Evolve your schemas in a better way!
A deep dive into Avro schema compatibility
and Schema Registry
Tim van Baarsen & Kosta Chuturkov
About the Speakers
The Netherlands - Amsterdam
Team Dora, Romania - Bucharest
ING
www.ing.jobs
• 60,000+ employees
• Serve 37+ million customers
• Corporate clients and financial institutions in over 40 countries
Kafka @ ING
Frontrunners in Kafka since 2014
Running in production:
• 9 years
• 7000+ topics
• Serving 1000+ Development teams
• Self service topic management
Kafka @ ING
Traffic is growing by 10%+ monthly
[Chart: messages produced per second (average), 2015–2024; y-axis from 0 to 1,800,000]
What are we going to cover today ?
• Why Schemas?
• What compatibility level to pick?
• What changes can I make when evolving my schemas?
• What options do I have when I need to introduce a breaking change?
• Should we automatically register schemas from our applications?
• How do you generate Java classes from your Avro schemas, and how do you build an
automated test suite (unit tests)?
Why schemas?
The only constant in life is change!
-Heraclitus (Greek philosopher)
Why schemas?
The only constant in life is change!
The same applies to your Kafka
events flowing through your
streaming applications.
Why schemas?
[Diagram: Producer Application (Kafka client + Serializer) sends to your-topic on the Kafka cluster (partition log: old 0 1 2 3 4 5 6 new); Consumer Application (Kafka client + Deserializer) polls from it; the arrows show the flow of data as raw bytes]
Producer responsibilities: send, serialization (key and value)
Consumer responsibilities: subscribe, deserialization (key and value), heartbeat
The Kafka cluster is not responsible for: type checking, schema validation*, other constraints
Data in a Kafka topic is just stored as bytes!
Why schemas?
[Same diagram; a second Consumer Application with its own Kafka client and Deserializer is added]
Consumers and producers are decoupled at runtime.
Why schemas?
[Same producer / topic / consumers diagram]
But they are indirectly coupled on the data format.
Why schemas?
[Same diagram; the consumers ask:]
What fields and types of data can I expect? Documentation of the fields?
Why schemas?
[Same diagram]
Producer: "Some requirements changed. We need to introduce a new field."
Consumers: "Don't cause inconsistency." "Keep it compatible!" "No disruption to my service."
Why schemas?
[Diagram: a Schema is introduced, shared between the producer's Serializer and the consumers' Deserializers]
Why schemas?
[Same diagram with the Schema]
We need the schema the data was written with to be able to read it.
Why schemas?
[Same diagram]
Don't send the schema each time we send data.
Why schemas?
[Diagram: the Confluent Schema Registry is added next to the Kafka cluster; the producer's Serializer and the consumers' Deserializers talk to it]
Don't send the schema each time we send data.
Why schemas?
[Diagram: the producer now uses the KafkaAvroSerializer; it registers the schema with the Confluent Schema Registry, gets back a schema id (id: 1), and sends that schema id with every message instead of the full schema]
Why schemas?
[Diagram: the consumers use the KafkaAvroDeserializer; they read the schema id (id: 1) from each message and resolve it via the Confluent Schema Registry]
Why schemas?
[Diagram: the KafkaAvroDeserializer loads the schema from the Confluent Schema Registry by its id]
Confluent Schema Registry = runtime dependency
Need high availability
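To make this concrete, a minimal Java configuration sketch of a producer and consumer wired to the Schema Registry (not taken from the demo code; it assumes the Kafka client and Confluent Avro serializer classes are imported, and the broker and registry URLs are illustrative placeholders):

// Producer side: KafkaAvroSerializer registers the schema (or looks up its id) and sends the id with each record
Properties producerProps = new Properties();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
producerProps.put("schema.registry.url", "http://localhost:8081");

// Consumer side: KafkaAvroDeserializer reads the schema id from each record and loads the schema from the registry
Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "customer-consumer");
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);
consumerProps.put("schema.registry.url", "http://localhost:8081");
consumerProps.put("specific.avro.reader", true); // deserialize into generated SpecificRecord classes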
Avro
§ At ING we prefer Avro
§ Apache Avro™ is a data serialization system offering rich data structures and a compact, binary format.
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "isJointAccountHolder", "type": "boolean" }
  ]
}
{
  "name": "Jack",
  "isJointAccountHolder": true
}
Avro field types
• primitive types (null, boolean, int, long, float, double, bytes, and string)
• complex types (record, enum, array, map, union, and fixed)
• logical types (decimal, uuid, date, …)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "isJointAccountHolder", "type": "boolean" },
    { "name": "country", "type": { "name": "Country", "type": "enum", "symbols": ["US", "UK", "NL"] } },
    { "name": "dateJoined", "type": "long", "logicalType": "timestamp-millis" }
  ]
}
{
  "name": "Jack",
  "isJointAccountHolder": true,
  "country": "UK",
  "dateJoined": 1708944593285
}
Maps
Note: the values type applies to the values in the map; the keys are always strings.
Example Java Map representation (see the sketch below):
Map<String, Long> customerPropertiesMap = new HashMap<>();
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "isJointAccountHolder", "type": "boolean" },
    { "name": "country", "type": { "name": "Country", "type": "enum", "symbols": ["US", "UK", "NL"] } },
    { "name": "dateJoined", "type": "long", "logicalType": "timestamp-millis" },
    { "name": "customerPropertiesMap", "type": { "type": "map", "values": "long" } }
  ]
}
{
  "name": "Jack",
  "isJointAccountHolder": true,
  "country": "UK",
  "dateJoined": 1708944593285,
  "customerPropertiesMap": { "key1": 1708, "key2": 1709 }
}
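For illustration, a sketch of populating this record through the builder of the generated Customer class (assuming Customer and Country are generated from the schema above by the avro-maven-plugin; the setter names follow the usual Avro codegen conventions):

Map<String, Long> customerPropertiesMap = new HashMap<>();
customerPropertiesMap.put("key1", 1708L);
customerPropertiesMap.put("key2", 1709L);

Customer customer = Customer.newBuilder()
    .setName("Jack")
    .setIsJointAccountHolder(true)
    .setCountry(Country.UK)
    .setDateJoined(1708944593285L)                   // long, logicalType timestamp-millis
    .setCustomerPropertiesMap(customerPropertiesMap) // Avro map: string keys, long values
    .build();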
Fixed
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "isJointAccountHolder", "type": "boolean" },
    { "name": "country", "type": { "name": "Country", "type": "enum", "symbols": ["US", "UK", "NL"] } },
    { "name": "dateJoined", "type": "long", "logicalType": "timestamp-millis" },
    { "name": "customerPropertiesMap", "type": { "type": "map", "values": "long" }, "doc": "Customer properties" },
    { "name": "annualIncome", "type": ["null", { "name": "AnnualIncome", "type": "fixed", "size": 32 }], "doc": "Annual income of the Customer.", "default": null }
  ]
}
{
  "name": "Jack",
  "isJointAccountHolder": true,
  "country": "UK",
  "dateJoined": 1708944593285,
  "customerPropertiesMap": { "key1": 1708, "key2": 1709 },
  "annualIncome": [64, -9, 92, …]
}
Unions
• Unions are represented using JSON arrays
• For example, ["null", "string"] declares a schema which may be either a
null or string.
• Question: Who thinks this is a valid definition?
{
  ….
  "fields": [
    {
      "name": "firstName",
      "type": ["null", "string"],
      "doc": "The first name of the Customer."
    }
  ,…]
{
  ….
  "fields": [
    {
      "name": "firstName",
      "type": ["null", "string", "int"],
      "doc": "The first name of the Customer."
    }
  ,…]
org.apache.kafka.common.errors.SerializationException: Error
serializing Avro message…
…
Caused by: org.apache.avro.UnresolvedUnionException: Not in union
["null","string","int"]: true
Aliases
Consumer (reader) schema:
{
  ….
  "fields": [
    {
      "name": "customerName",
      "aliases": ["name"],
      "type": "string",
      "doc": "The name of the Customer.",
      "default": null
    }
  ,…]
Producer (writer) schema:
{
  ….
  "fields": [
    {
      "name": "name",
      "type": "string",
      "doc": "The name of the Customer.",
      "default": null
    }
  ,…]
• Named types and fields may have aliases
• Aliases function by re-writing the writer's schema using aliases from the reader's schema (see the sketch below).
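A sketch of that rewriting with Avro's Schema.applyAliases, using trimmed-down versions of the two schemas above:

// Writer (producer) schema still calls the field "name"
Schema writer = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.example\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"}]}");

// Reader (consumer) schema renamed it to "customerName" and added an alias for the old name
Schema reader = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.example\",\"fields\":["
        + "{\"name\":\"customerName\",\"aliases\":[\"name\"],\"type\":\"string\"}]}");

// The writer's schema is rewritten using the reader's aliases, so old data can still be read
Schema resolved = Schema.applyAliases(writer, reader);
System.out.println(resolved.getField("customerName") != null); // true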
Compatibility Modes
BACKWARD
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}
Producer 1: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}
Consumer 1 read: V1
Consumer
Producer
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}
Consumer 2 read: V2 (Delete field)
BACKWARD
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}
Producer 1: V2
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}
Consumer 1 read: V2
Consumer
Producer
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}
Consumer 2 read: V2 (Delete field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null }
  ]
}
Consumer 3 read: V3 (Add optional field)
BACKWARD TRANSITIVE
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}
Producer: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}
Consumer read: V2 (Delete field)
Consumer
Producer
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null }
  ]
}
Consumer read: V3 (Add optional field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "…n",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null }
  ]
}
Consumer read: V…n (Delete field)
Compatible
FORWARD
Consumer
Producer
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "annualIncome", "type": ["null", "double"], "default": null }
  ]
}
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "annualIncome", "type": ["null", "double"], "default": null }
  ]
}
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "annualIncome", "type": ["null", "double"], "default": null }
  ]
}
Producer write: V2 (Delete Optional Field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "annualIncome", "type": ["null", "double"], "default": null },
    { "name": "dateOfBirth", "type": "string", "doc": "The date of birth for the Customer." }
  ]
}
Producer write: V3 (Add Required Field)
Consumer read: V1
Producer write: V1
FORWARD TRANSITIVE
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null }
  ]
}
Producer: V2 (Delete Optional Field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null },
    { "name": "dateOfBirth", "type": "string" }
  ]
}
Producer: V3 (Add Field)
Consumer
Producer
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null }
  ]
}
Consumer read: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "…n",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null },
    { "name": "dateOfBirth", "type": "string" },
    { "name": "phoneNumber", "type": "string" }
  ]
}
Producer: V..n (Add Field)
Compatible
FULL
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": ["null", "string"], "default": null },
    { "name": "occupation", "type": ["null", "string"], "default": null },
    { "name": "annualIncome", "type": ["null", "int"], "default": null },
    { "name": "dateOfBirth", "type": ["null", "string"], "default": null }
  ]
}
Producer: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": ["null", "string"], "default": null },
    { "name": "occupation", "type": ["null", "string"], "default": null },
    { "name": "annualIncome", "type": ["null", "int"], "default": null },
    { "name": "dateOfBirth", "type": ["null", "string"], "default": null },
    { "name": "phoneNumber", "type": ["null", "string"], "default": null }
  ]
}
Consumer read: V2
Consumer
Producer
NOTE:
• The default values apply only on the consumer side.
• On the producer side you need to set a value for the field
Available compatibility types
From: Confluent Schema Registry documentation
• BACKWARD (and BACKWARD_TRANSITIVE): new schema can be used to read old data
• FORWARD (and FORWARD_TRANSITIVE): old schema can be used to read new data
• FULL (and FULL_TRANSITIVE): both backward and forward
• NONE: no compatibility enforced
What compatibility to use ?
• If you are the topic owner and the producer, in control of evolving the schema, and you don't want to break existing consumers, use FORWARD
• If you are the topic owner and a consumer, use BACKWARD, so you can upgrade first and then ask the producer to evolve its schema with the fields you need
Demo
Backward Compatibility Demo: Components
kafka-producer-one (schema v1): send to customers-topic-backward (Kafka cluster, partition log: old 0 1 2 3 4 5 6 new)
kafka-consumer-first (schema v2): poll
kafka-consumer-second (schema v3): poll
kafka-consumer-third (schema v4): poll
Fields involved across the schema versions: occupation (required), annualIncome (optional), age (required)
Adding a required field is not a backward compatible change!
Forward Compatibility Demo: Components
kafka-producer-one (schema v2): send
kafka-producer-two (schema v3): send
kafka-producer-three (schema v4): send
kafka-consumer-first (schema v1): poll from customers-topic-forward (Kafka cluster, partition log: old 0 1 2 3 4 5 6 new)
Fields involved across the schema versions: annualIncome (optional), dateOfBirth (required), phoneNumber (required)
Removing a required field is not a forward compatible change!
Plugins: Avro Schema to Java Class
Avro Schema (.avsc) -> avro-maven-plugin -> .java -> maven-compiler-plugin -> .class -> maven-jar-plugin -> .jar
Plugins: Avro Schema to Java Class
Avro Schema (.avsc) -> avro-maven-plugin -> .java -> maven-compiler-plugin -> .class -> maven-jar-plugin -> .jar
• Validation of Avro syntax
• No validation on compatibility!
package com.example.avro.customer;
/** Avro schema for our customer. */
@org.apache.avro.specific.AvroGenerated
public class Customer extends
org.apache.avro.specific.SpecificRecordBase implements
org.apache.avro.specific.SpecificRecord {
private static final long serialVersionUID = 1600536469030327220L;
public static final org.apache.avro.Schema SCHEMA$ = new
org.apache.avro.Schema.Parser().parse("{"type":"record","name":"
CustomerBackwardDemo","namespace":"com.example.avro.custom
er","doc":"Avro schema for our
customer.","fields":[{"name":"name","type":{"type":"string","
avro.java.string":"String"},"doc":"The name of the
Customer."},{"name":"occupation","type":{"type":"string","avro
.java.string":"String"},"doc":"The occupation of the
Customer."}],"version":1}");
…
}
{
"namespace": "com.example.avro.customer",
"type": "record",
"name": "Customer",
"version": 1,
"doc": "Avro schema for our customer.",
"fields": [
{
"name": "name",
"type": "string",
"doc": "The name of the Customer."
},
{
"name": "occupation",
"type": "string",
"doc": "The occupation of the Customer."
}
]
}
Plugins: Avro Schema to Java Class
Customer.avsc Customer.java
Plugins: Avro Schema to Java Class
The resulting .jar with the generated specific record (Customer.class) is shared by the Producer, Consumer, and Kafka Streams applications (each a Kafka client).
Test Avro compatibility
Integration test style
Your Java project holds the local schema versions you are evolving; the Confluent Schema Registry holds the registered versions (subject: customer-value, compatibility: BACKWARD).
Validate compatibility against the Schema Registry REST API with:
• curl
• Confluent CLI
• Confluent Schema Registry Maven Plugin (automate in your Maven build)
Test Avro compatibility: Unit tests
Unit test style
Your Java project holds the schema versions (V1, V2, V3); validate compatibility in your Maven build with:
• curl
• Confluent CLI
• Confluent Schema Registry Maven Plugin
• Unit tests (see the sketch below)
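One way to wire such a unit test, sketched with Avro's SchemaValidatorBuilder in a JUnit 5 test; the .avsc resource names are illustrative and assumed to be on the test classpath:

class CustomerSchemaCompatibilityTest {

    @Test
    void newSchemaIsBackwardCompatibleWithPreviousVersion() throws Exception {
        Schema v1 = load("avro/customer-v1.avsc");
        Schema v2 = load("avro/customer-v2.avsc");

        // BACKWARD: the new schema must be able to read data written with the old one
        SchemaValidator backward = new SchemaValidatorBuilder().canReadStrategy().validateAll();

        assertDoesNotThrow(() -> backward.validate(v2, List.of(v1)));
    }

    private Schema load(String resource) throws Exception {
        return new Schema.Parser().parse(getClass().getClassLoader().getResourceAsStream(resource));
    }
}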
Demo
Should we auto register schemas?
• By default, client applications automatically register new schemas
• Auto registration is performed by the producers only
• For development environments you can use auto schema registration
• For production environments the best practice is:
  • to register schemas outside the client application
  • to control when schemas are registered with Schema Registry and how they evolve
• You can disable auto schema registration on the producer: auto.register.schemas: false (see the sketch below)
• Schema Registry: Schema Registry security plugin
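A minimal sketch of that producer setting (string property keys as named above; the registry URL is an illustrative placeholder):

Properties props = new Properties();
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://localhost:8081");
// Fail at produce time if the schema is not already registered, instead of registering it
props.put("auto.register.schemas", false);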
package com.example.avro.customer;
/** Avro schema for our customer. */
@org.apache.avro.specific.AvroGenerated
public class Customer extends org.apache.avro.specific.SpecificRecordBase implements
org.apache.avro.specific.SpecificRecord {
private static final long serialVersionUID = 1600536469030327220L;
public static final org.apache.avro.Schema SCHEMA$ = new
org.apache.avro.Schema.Parser().parse("{"type":"record","name":"CustomerBackwardDemo
","namespace":"com.example.avro.customer","doc":"Avro schema for our
customer.","fields":[{"name":"name","type":{"type":"string","avro.java.string":"String"
},"doc":"The name of the
Customer."},{"name":"occupation","type":{"type":"string","avro.java.string":"String"},
"doc":"The occupation of the Customer."}],"version":1}");
…
}
Auto register schema lessons learned
• Maven Avro plugin: additional information appended to the schema in Java code
• Producer (KafkaAvroSerializer): auto.register.schemas: false
• When serializing, the Avro schema is derived from the Customer Java object
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    {
      "name": "name",
      "type": "string",
      "doc": "The name of the Customer."
    }
  ]
}
Mismatch in schema comparison
Avro Schema (avsc) registered
in Schema Registry
Avro Schema (Java) in producer
Auto register schema lessons learned
• If you are using the Avro Maven plugin, it is recommended to set this property on the producer (KafkaAvroSerializer):
avro.remove.java.properties: true
Note:
There is an open issue for the Avro Maven Plugin for this: AVRO-2838
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    {
      "name": "name",
      "type": "string",
      "doc": "The name of the Customer."
    }
  ]
}
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    {
      "name": "name",
      "type": "string",
      "doc": "The name of the Customer."
    }
  ]
}
No mismatch in schema comparison
Avro Schema (avsc) Avro Schema (as Java String)
Schema Evolution Guidelines
Rules of the Road for Modifying Schemas
If you want to make your schema evolvable, then follow these guidelines.
§ Provide a default value for fields in your schema, as this allows you to delete the field later.
§ Don’t change a field’s data type.
§ Don’t rename an existing field (use aliases instead).
Breaking changes. How to move forward?
What can you do?
• "Force push" schema (see the sketch after this list)
• BACKWARD -> NONE -> BACKWARD
• Allow for downtime?
• Both producers and consumer under your control?
• Last resort
• “Produce to multiple topics”
• V1 topic
• V2 topic
• Migrate consumers
• Transaction atomic operation
• Data Contracts for Schema Registry
• Field level transformations
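As an illustration of the "force push" route, the compatibility level of a single subject can be switched through the Schema Registry REST API (PUT /config/<subject>); a sketch with Java's built-in HTTP client, where the registry URL and subject name are placeholders and the calls would live in a method declaring throws IOException, InterruptedException:

HttpClient client = HttpClient.newHttpClient();

// 1. Temporarily drop the compatibility check for this subject (BACKWARD -> NONE)
HttpRequest toNone = HttpRequest.newBuilder()
    .uri(URI.create("http://localhost:8081/config/customer-value"))
    .header("Content-Type", "application/vnd.schemaregistry.v1+json")
    .PUT(HttpRequest.BodyPublishers.ofString("{\"compatibility\": \"NONE\"}"))
    .build();
client.send(toNone, HttpResponse.BodyHandlers.ofString());

// 2. Register the breaking schema version (e.g. via the Schema Registry Maven Plugin or REST API)

// 3. Restore the original compatibility level (NONE -> BACKWARD)
HttpRequest restore = HttpRequest.newBuilder()
    .uri(URI.create("http://localhost:8081/config/customer-value"))
    .header("Content-Type", "application/vnd.schemaregistry.v1+json")
    .PUT(HttpRequest.BodyPublishers.ofString("{\"compatibility\": \"BACKWARD\"}"))
    .build();
client.send(restore, HttpResponse.BodyHandlers.ofString());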
Wrap up
Communication
§ Important to communicate changes between producing and consuming
teams
Gain more confidence
§ Add unit/integration tests to make sure your changes are compatible
Wrap up
Schema registration
§ Don't allow applications to register schemas automatically
§ Don't assume applications will set auto.register.schemas=false
§ Make sure to have security measures in place
Be aware of pitfalls
§ Avro Maven plugin adds: "avro.java.string"
§ Deserialization exceptions on the consumer side
Questions?
🤔
❔
Demo codebase:
https://github.com/j-tim/kafka-summit-london-2024
Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibility and Schema Registry

More Related Content

Similar to Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibility and Schema Registry

Redesigning the Netflix API - OSCON
Redesigning the Netflix API - OSCONRedesigning the Netflix API - OSCON
Redesigning the Netflix API - OSCONDaniel Jacobson
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with StreamsBen Stopford
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLScyllaDB
 
What I did in My Internship @ WSO2
What I did in My Internship @ WSO2What I did in My Internship @ WSO2
What I did in My Internship @ WSO2Andun Sameera
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIAnimesh Singh
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020Maheedhar Gunturu
 
Building a Streaming Platform with Kafka
Building a Streaming Platform with KafkaBuilding a Streaming Platform with Kafka
Building a Streaming Platform with Kafkaconfluent
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020confluent
 
MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021amesar0
 
New Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQLNew Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQLconfluent
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드confluent
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshopconfluent
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaQAware GmbH
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model ServingDatabricks
 
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...HostedbyConfluent
 

Similar to Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibility and Schema Registry (20)

Redesigning the Netflix API - OSCON
Redesigning the Netflix API - OSCONRedesigning the Netflix API - OSCON
Redesigning the Netflix API - OSCON
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 
Spark streaming + kafka 0.10
Spark streaming + kafka 0.10Spark streaming + kafka 0.10
Spark streaming + kafka 0.10
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
 
What I did in My Internship @ WSO2
What I did in My Internship @ WSO2What I did in My Internship @ WSO2
What I did in My Internship @ WSO2
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AI
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020
 
Building a Streaming Platform with Kafka
Building a Streaming Platform with KafkaBuilding a Streaming Platform with Kafka
Building a Streaming Platform with Kafka
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
 
MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
New Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQLNew Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQL
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshop
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with Kafka
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
 
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

Evolve Your Schemas in a Better Way! A Deep Dive into Avro Schema Compatibility and Schema Registry

  • 1. Evolve your schemas in a better way! A deep dive into Avro schema compatibility and Schema Registry Tim van Baarsen & Kosta Chuturkov
  • 2. About the Speakers The Netherlands - Amsterdam Team Dora Romania - Bucharest
  • 3. ING www.ing.jobs • 60,000+ employees • Serve 37+ million customers • Corporate clients and financial institutions in over 40 countries
  • 4. Kafka @ ING Frontrunners in Kafka since 2014 Running in production: • 9 years • 7000+ topics • Serving 1000+ Development teams • Self service topic management
  • 5. Kafka @ ING Traffic is growing with +10% monthly 0 200.000 400.000 600.000 800.000 1.000.000 1.200.000 1.400.000 1.600.000 1.800.000 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 Messages produced per second (average) Messages produced per second (average)
  • 6. What are we going to cover today ? • Why Schemas? • What compatibility level to pick? • What changes can I make when evolving my schemas? • What options do I have when I need to introduce a breaking change? • Should we automatically register schemas from our applications? • How do you generate Java classes from your Avro schemas and you build an automated test suite (unit tests)
  • 7. Why schemas? The only constant in life is change! -Heraclitus (Greek philosopher)
  • 8. Why schemas? The only constant in life is change! The same applies to your Kafka events flowing through your streaming applications.
  • 9. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Responsibilities: - subscribe - deserialization • key • value - heartbeat Responsibilities: - send - serialization • key • value Not responsible for: • Type checking • Schema validation * • Other constraints Data in a Kafka topic are just stored as bytes! Deserializer Serializer = flow of data 0100101001101 your-topic Kafka client Kafka client
  • 10. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumers and producers are decoupled at runtime Kafka client Kafka client
  • 11. Why schemas? Consumers and producers are decoupled at runtime Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Deserializer Kafka client Kafka client Kafka client
  • 12. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Deserializer Indirectly coupled on the data format Kafka client Kafka client Kafka client
  • 13. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Deserializer Indirectly coupled on the data format What fields and types of data can I expect? Documentation of the fields? Kafka client Kafka client Kafka client
  • 14. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Deserializer Indirectly coupled on the data format Some requirements changed. We need to introduce a new field Don’t cause inconsistency Keep It compatible! No disruption my service Kafka client Kafka client Kafka client
  • 15. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Deserializer Schema Kafka client Kafka client Kafka client
  • 16. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Deserializer Schema Kafka client Kafka client Kafka client
  • 17. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Deserializer Schema We need the schema the data was written with to be able to read it Kafka client Kafka client Kafka client
  • 18. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Deserializer Schema Don’t send the schema each time we send data Kafka client Kafka client Kafka client
  • 19. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new Deserializer Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Deserializer Confluent Schema Registry Kafka client Kafka client Kafka client Don’t send the schema each time we send data
  • 20. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Confluent Schema Registry Kafka client Kafka client Kafka client Serializer Deserializer Deserializer
  • 21. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new KafkaAvro Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Confluent Schema Registry Register Schema Kafka client Kafka client Kafka client Deserializer Deserializer
  • 22. Why schemas? Producer Application Kafka cluster Consumer Application 0100101001101 0100101001101 poll send old 0 1 2 3 4 5 6 new KafkaAvro Serializer your-topic Consumer Application 0 1 0 0 1 0 1 0 0 1 1 0 1 Confluent Schema Registry id: 1 id: 1 id: 1 Register Schema id: 1 Schema id Kafka client Kafka client Kafka client Deserializer Deserializer
• 23. Why schemas? [Diagram] The consumers use the KafkaAvroDeserializer; the schema id (id: 1) travels with every record from the producer to the consumers
• 24. Why schemas? [Diagram] The KafkaAvroDeserializer loads the schema from the Confluent Schema Registry by its id. The Confluent Schema Registry is a runtime dependency, so it needs high availability.
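A minimal configuration sketch showing where the Schema Registry enters the picture on the producer side. The bootstrap servers, registry URL, and topic name are placeholders, not values from the talk.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class AvroProducerConfigSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // The serializer registers/looks up schemas here at serialization time
        props.put("schema.registry.url", "http://localhost:8081");              // placeholder

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            // Avro records are then sent with producer.send(new ProducerRecord<>("your-topic", key, value));
        }
    }
}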
• 25. Avro § At ING we prefer Avro. § Apache Avro™ is a data serialization system that offers rich data structures and a compact binary format. Schema: { "type": "record", "namespace": "com.example", "name": "Customer", "fields": [ { "name": "name", "type": "string" }, { "name": "isJointAccountHolder", "type": "boolean" } ] } Example record: { "name": "Jack", "isJointAccountHolder": true }
• 26. Avro field types • primitive types (null, boolean, int, long, float, double, bytes, and string) • complex types (record, enum, array, map, union, and fixed) • logical types (decimal, uuid, date, ...) Schema: { "type": "record", "namespace": "com.example", "name": "Customer", "fields": [ { "name": "name", "type": "string" }, { "name": "isJointAccountHolder", "type": "boolean" }, { "name": "country", "type": { "name": "Country", "type": "enum", "symbols": ["US", "UK", "NL"] } }, { "name": "dateJoined", "type": { "type": "long", "logicalType": "timestamp-millis" } } ] } Example record: { "name": "Jack", "isJointAccountHolder": true, "country": "UK", "dateJoined": 1708944593285 }
• 27. Maps Note: the values type applies to the values in the map; the keys are always strings. Example Java representation: Map<String, Long> customerPropertiesMap = new HashMap<>(); Schema: { "type": "record", "namespace": "com.example", "name": "Customer", "fields": [ { "name": "name", "type": "string" }, { "name": "isJointAccountHolder", "type": "boolean" }, { "name": "country", "type": { "name": "Country", "type": "enum", "symbols": ["US", "UK", "NL"] } }, { "name": "dateJoined", "type": { "type": "long", "logicalType": "timestamp-millis" } }, { "name": "customerPropertiesMap", "type": { "type": "map", "values": "long" } } ] } Example record: { "name": "Jack", "isJointAccountHolder": true, "country": "UK", "dateJoined": 1708944593285, "customerPropertiesMap": { "key1": 1708, "key2": 1709 } }
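A small sketch of how the map field maps onto a Java Map when building a GenericRecord. The schema is trimmed down to the map field only, and the keys and values are illustrative, not from the talk.

import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class CustomerMapSketch {

    public static void main(String[] args) {
        // Trimmed-down schema containing only the map field from the slide
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.example\","
          + "\"fields\":[{\"name\":\"customerPropertiesMap\",\"type\":{\"type\":\"map\",\"values\":\"long\"}}]}");

        GenericRecord customer = new GenericData.Record(schema);
        // Keys are always strings; the values follow the declared "values" type (long here)
        customer.put("customerPropertiesMap", Map.of("key1", 1708L, "key2", 1709L));
        System.out.println(customer);
    }
}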
• 28. Fixed Schema: { "type": "record", "namespace": "com.example", "name": "Customer", "fields": [ { "name": "name", "type": "string" }, { "name": "isJointAccountHolder", "type": "boolean" }, { "name": "country", "type": { "name": "Country", "type": "enum", "symbols": ["US", "UK", "NL"] } }, { "name": "dateJoined", "type": { "type": "long", "logicalType": "timestamp-millis" } }, { "name": "customerPropertiesMap", "type": { "type": "map", "values": "long" }, "doc": "Customer properties" }, { "name": "annualIncome", "type": ["null", { "name": "AnnualIncome", "type": "fixed", "size": 32 }], "doc": "Annual income of the Customer.", "default": null } ] } Example record: { "name": "Jack", "isJointAccountHolder": true, "country": "UK", "dateJoined": 1708944593285, "customerPropertiesMap": { "key1": 1708, "key2": 1709 }, "annualIncome": [64, -9, 92, …] }
• 29. Unions • Unions are represented using JSON arrays. • For example, ["null", "string"] declares a schema which may be either a null or a string. • Question: who thinks this is a valid definition? { … "fields": [ { "name": "firstName", "type": ["null", "string"], "doc": "The first name of the Customer." }, … ] } { … "fields": [ { "name": "firstName", "type": ["null", "string", "int"], "doc": "The first name of the Customer." }, … ] } org.apache.kafka.common.errors.SerializationException: Error serializing Avro message… … Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null","string","int"]: true
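A minimal sketch (hypothetical class, illustrative values) that reproduces the exception from the slide: the value written for a union field must match one of the union branches, and a boolean matches neither null, string, nor int.

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class UnionMismatchSketch {

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.example\","
          + "\"fields\":[{\"name\":\"firstName\",\"type\":[\"null\",\"string\",\"int\"],\"default\":null}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("firstName", true); // boolean is not a branch of the union

        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(new ByteArrayOutputStream(), null);
        writer.write(record, encoder); // throws org.apache.avro.UnresolvedUnionException
    }
}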
• 30. Aliases • Named types and fields may have aliases. • Aliases function by rewriting the writer's schema using the aliases from the reader's schema. Producer (writer) schema: { … "fields": [ { "name": "name", "type": "string", "doc": "The name of the Customer.", "default": null }, … ] } Consumer (reader) schema: { … "fields": [ { "name": "customerName", "aliases": [ "name" ], "type": "string", "doc": "The name of the Customer.", "default": null }, … ] }
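A sketch of the alias mechanism in plain Avro (field values are illustrative): data written with the old field name "name" is read into the consumer's renamed field "customerName" because the reader schema lists "name" as an alias.

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AliasResolutionSketch {

    public static void main(String[] args) throws Exception {
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.example\","
          + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}");
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.example\","
          + "\"fields\":[{\"name\":\"customerName\",\"aliases\":[\"name\"],\"type\":\"string\"}]}");

        // Write with the producer's (writer's) schema
        GenericRecord written = new GenericData.Record(writerSchema);
        written.put("name", "Jack");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(written, encoder);
        encoder.flush();

        // Read with the consumer's (reader's) schema: the alias maps "name" -> "customerName"
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord read = new GenericDatumReader<GenericRecord>(writerSchema, readerSchema).read(null, decoder);
        System.out.println(read.get("customerName")); // prints "Jack"
    }
}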
• 32. BACKWARD Producer 1 writes V1: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "1", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" } ] } Consumer 1 reads V1: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "1", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" } ] } Consumer 2 reads V2 (delete field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "2", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" } ] }
• 33. BACKWARD Producer 1 writes V2: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "2", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" } ] } Consumer 1 reads V2: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "2", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" } ] } Consumer 2 reads V2 (delete field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "2", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" } ] } Consumer 3 reads V3 (add optional field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "3", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" }, { "name": "annualIncome", "type": ["null","int"], "default": null } ] }
• 34. BACKWARD TRANSITIVE Producer writes V1: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "1", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" } ] } Consumer reads V2 (delete field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "2", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" } ] } Consumer reads V3 (add optional field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "3", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" }, { "name": "annualIncome", "type": ["null","int"], "default": null } ] } Consumer reads V…n (delete field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "…n", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" }, { "name": "annualIncome", "type": ["null","int"], "default": null } ] } Compatible
• 35. FORWARD Producer writes V1: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "1", "fields": [ { "name": "name", "type": "string" }, { "name": "annualIncome", "type": ["null","double"], "default": null } ] } Consumer reads V1: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "1", "fields": [ { "name": "name", "type": "string" }, { "name": "annualIncome", "type": ["null","double"], "default": null } ] } Producer writes V2 (delete optional field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "2", "fields": [ { "name": "name", "type": "string" }, { "name": "annualIncome", "type": ["null","double"], "default": null } ] } Producer writes V3 (add required field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "3", "fields": [ { "name": "name", "type": "string" }, { "name": "annualIncome", "type": ["null","double"], "default": null }, { "name": "dateOfBirth", "type": "string", "doc": "The date of birth for the Customer." } ] }
• 36. FORWARD TRANSITIVE Consumer reads V1: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "1", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" }, { "name": "annualIncome", "type": ["null","int"], "default": null } ] } Producer writes V2 (delete optional field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "2", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" }, { "name": "annualIncome", "type": ["null","int"], "default": null } ] } Producer writes V3 (add field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "3", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" }, { "name": "annualIncome", "type": ["null","int"], "default": null }, { "name": "dateOfBirth", "type": "string" } ] } Producer writes V…n (add field): { "type": "record", "namespace": "com.example", "name": "Customer", "version": "…n", "fields": [ { "name": "name", "type": "string" }, { "name": "occupation", "type": "string" }, { "name": "annualIncome", "type": ["null","int"], "default": null }, { "name": "dateOfBirth", "type": "string" }, { "name": "phoneNumber", "type": "string" } ] } Compatible
• 37. FULL Producer writes V1: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "1", "fields": [ { "name": "name", "type": ["null","string"], "default": null }, { "name": "occupation", "type": ["null","string"], "default": null }, { "name": "annualIncome", "type": ["null","int"], "default": null }, { "name": "dateOfBirth", "type": ["null","string"], "default": null } ] } Consumer reads V2: { "type": "record", "namespace": "com.example", "name": "Customer", "version": "2", "fields": [ { "name": "name", "type": ["null","string"], "default": null }, { "name": "occupation", "type": ["null","string"], "default": null }, { "name": "annualIncome", "type": ["null","int"], "default": null }, { "name": "dateOfBirth", "type": ["null","string"], "default": null }, { "name": "phoneNumber", "type": ["null","string"], "default": null } ] } NOTE: • The default values apply only on the consumer (reader) side. • On the producer side you still need to set a value for the field.
• 38. Available compatibility types (from the Confluent Schema Registry documentation): BACKWARD: the new schema can be used to read old data; FORWARD: the old schema can be used to read new data; FULL: both backward and forward; NONE: no compatibility enforced.
• 39. What compatibility to use? • If you are the topic owner and the producer, and you are in control of evolving the schema and don't want to break existing consumers, use FORWARD. • If you are the topic owner and a consumer, use BACKWARD, so you can upgrade first and then ask the producer to evolve its schema with the fields you need.
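The per-subject compatibility level is configured on the Schema Registry itself, via its REST config endpoint. A sketch using Java's built-in HTTP client; the registry URL and the subject name are placeholders, not values from the talk.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SetSubjectCompatibility {

    public static void main(String[] args) throws Exception {
        // PUT /config/<subject> sets the compatibility level for that subject only
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8081/config/customers-value"))   // placeholders
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .PUT(HttpRequest.BodyPublishers.ofString("{\"compatibility\": \"FORWARD\"}"))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}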
  • 40. Demo
• 41. Backward Compatibility Demo: Components [Diagram] kafka-producer-one (schema v1) sends to the customers-topic-backward topic; kafka-consumer-first (schema v2), kafka-consumer-second (schema v3) and kafka-consumer-third (schema v4) poll from it. Schema changes across versions: occupation (required), annualIncome (optional), age (required). Adding a required field is not a backward compatible change!
• 42. Forward Compatibility Demo: Components [Diagram] kafka-producer-one (schema v2), kafka-producer-two (schema v3) and kafka-producer-three (schema v4) send to the customers-topic-forward topic; kafka-consumer-first (schema v1) polls from it. Schema changes across versions: annualIncome (optional), dateOfBirth (required), phoneNumber (required). Removing a required field is not a forward compatible change!
• 43. Plugins: Avro Schema to Java Class [Diagram] Avro schema (.avsc) → avro-maven-plugin → .java → maven-compiler-plugin → .class → maven-jar-plugin → .jar
• 44. Plugins: Avro Schema to Java Class [Diagram] Avro schema (.avsc) → avro-maven-plugin → .java → maven-compiler-plugin → .class → maven-jar-plugin → .jar • The avro-maven-plugin validates the Avro syntax • No validation of compatibility!
• 45. Plugins: Avro Schema to Java Class Customer.avsc: { "namespace": "com.example.avro.customer", "type": "record", "name": "Customer", "version": 1, "doc": "Avro schema for our customer.", "fields": [ { "name": "name", "type": "string", "doc": "The name of the Customer." }, { "name": "occupation", "type": "string", "doc": "The occupation of the Customer." } ] } Generated Customer.java: package com.example.avro.customer; /** Avro schema for our customer. */ @org.apache.avro.specific.AvroGenerated public class Customer extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord { private static final long serialVersionUID = 1600536469030327220L; public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"CustomerBackwardDemo\",\"namespace\":\"com.example.avro.customer\",\"doc\":\"Avro schema for our customer.\",\"fields\":[{\"name\":\"name\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The name of the Customer.\"},{\"name\":\"occupation\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The occupation of the Customer.\"}],\"version\":1}"); … }
• 46. Plugins: Avro Schema to Java Class [Diagram] The .jar containing the generated specific record (Customer.class) is shared by the producer, the consumer, and the Kafka Streams application (all Kafka clients).
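A short sketch of using the generated class. It assumes the Customer builder generated by the avro-maven-plugin (with the name and occupation fields from the schema above), a producer already configured with the KafkaAvroSerializer as in the earlier sketch, and a placeholder topic name.

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import com.example.avro.customer.Customer;

public class CustomerPublisher {

    // The Producer is assumed to be configured with the KafkaAvroSerializer (see the earlier sketch)
    public static void publish(Producer<String, Customer> producer) {
        Customer customer = Customer.newBuilder()
                .setName("Jack")
                .setOccupation("Software Engineer")
                .build();
        producer.send(new ProducerRecord<>("customers-topic", "Jack", customer));
    }
}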
• 47. Test Avro compatibility: integration test style [Diagram] Your Java project holds the next schema version (V3); versions V1 and V2 are already registered in the Confluent Schema Registry under subject customer-value with compatibility BACKWARD. Validate compatibility against the registry's REST API with curl, the Confluent CLI, or the Confluent Maven Plugin.
• 48. Test Avro compatibility: integration test style [Diagram] Same setup; automate the compatibility check in your Maven build with the Confluent Schema Registry Maven Plugin.
• 49. Test Avro compatibility: integration test style [Diagram] Same setup; validate compatibility with curl, the Confluent CLI, or the Confluent Schema Registry Maven Plugin, automated in your Maven build.
• 50. Test Avro compatibility: unit test style [Diagram] Keep all schema versions (V1, V2, V3) in your Java project and validate compatibility between them in unit tests, automated in your Maven build, in addition to curl, the Confluent CLI, or the Confluent Schema Registry Maven Plugin.
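A unit-test sketch of this idea using Avro's own SchemaCompatibility API; JUnit 5 is assumed, and the inline schema strings stand in for the .avsc files kept in the project. For a backward-compatibility check, the new schema is the reader and the old schema is the writer, so a green test means the new schema can read data written with the old one.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;
import org.junit.jupiter.api.Test;

class CustomerSchemaCompatibilityTest {

    private final Schema v1 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.example\","
      + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"occupation\",\"type\":\"string\"}]}");

    private final Schema v2 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.example\","
      + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"occupation\",\"type\":\"string\"},"
      + "{\"name\":\"annualIncome\",\"type\":[\"null\",\"int\"],\"default\":null}]}");

    @Test
    void newSchemaCanReadOldData() {
        SchemaCompatibility.SchemaPairCompatibility result =
            SchemaCompatibility.checkReaderWriterCompatibility(v2, v1); // reader = new, writer = old
        assertEquals(SchemaCompatibilityType.COMPATIBLE, result.getType());
    }
}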
  • 51. Demo
• 52. Should we auto register schemas? • By default, client applications automatically register new schemas. • Auto registration is performed by the producers only. • For development environments you can use auto schema registration. • For production environments the best practice is: register schemas outside the client application, and control when schemas are registered with Schema Registry and how they evolve. • You can disable auto schema registration on the producer: auto.register.schemas: false • Schema Registry: Schema Registry security plugin
• 53. Auto register schema lessons learned • Maven Avro plugin: additional information is appended to the schema embedded in the generated Java code • Producer (KafkaAvroSerializer) with auto.register.schemas: false • When serializing, the Avro schema is derived from the Customer Java object Avro schema (.avsc) registered in the Schema Registry: { "type": "record", "namespace": "com.example", "name": "Customer", "fields": [ { "name": "name", "type": "string", "doc": "The name of the Customer." } ] } Avro schema (Java) in the producer: package com.example.avro.customer; /** Avro schema for our customer. */ @org.apache.avro.specific.AvroGenerated public class Customer extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord { private static final long serialVersionUID = 1600536469030327220L; public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"CustomerBackwardDemo\",\"namespace\":\"com.example.avro.customer\",\"doc\":\"Avro schema for our customer.\",\"fields\":[{\"name\":\"name\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The name of the Customer.\"},{\"name\":\"occupation\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The occupation of the Customer.\"}],\"version\":1}"); … } → Mismatch in schema comparison
• 54. Auto register schema lessons learned • If you are using this setup (auto.register.schemas: false with classes generated by the Avro Maven plugin), it is recommended to set this KafkaAvroSerializer property: avro.remove.java.properties: true Note: there is an open issue for the Avro Maven Plugin for this (AVRO-2838) Avro schema (.avsc): { "type": "record", "namespace": "com.example", "name": "Customer", "fields": [ { "name": "name", "type": "string", "doc": "The name of the Customer." } ] } Avro schema (as Java String): { "type": "record", "namespace": "com.example", "name": "Customer", "fields": [ { "name": "name", "type": "string", "doc": "The name of the Customer." } ] } → No mismatch in schema comparison
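A configuration sketch pulling the last two slides together: the serializer settings on the producer side when schemas are registered outside the application. The property names are the KafkaAvroSerializer ones discussed above; the registry URL is a placeholder.

import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class ProducerSerializerSettingsSketch {

    public static Properties serializerSettings() {
        Properties props = new Properties();
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");       // placeholder
        props.put("auto.register.schemas", false);                       // don't let the producer register schemas in prod
        props.put("avro.remove.java.properties", true);                  // strip avro.java.string before comparing against the registered schema
        return props;
    }
}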
  • 55. Schema Evolution Guidelines Rules of the Road for Modifying Schemas If you want to make your schema evolvable, then follow these guidelines. § Provide a default value for fields in your schema, as this allows you to delete the field later. § Don’t change a field’s data type. § Don’t rename an existing field (use aliases instead).
• 56. Breaking changes. How to move forward? What can you do? • "Force push" the schema: BACKWARD -> NONE -> BACKWARD • Can you allow for downtime? • Are both producers and consumers under your control? • Last resort • "Produce to multiple topics": a V1 topic and a V2 topic • Migrate consumers • Use transactions (atomic operation) • Data Contracts for Schema Registry • Field-level transformations
  • 57. Wrap up Communication § Important to communicate changes between producing and consuming teams Gain more confidence § Add unit/integration tests to make sure your changes are compatible
• 58. Wrap up Schema registration § Don't allow applications to register schemas automatically § Don't assume applications will set auto.register.schemas=false § Make sure to have security measures in place Be aware of pitfalls § The Avro Maven plugin adds "avro.java.string" § Deserialization exceptions on the consumer side