Evolve your schemas in a better way!
A deep dive into Avro schema compatibility
and Schema Registry
Tim van Baarsen & Kosta Chuturkov
About the Speakers
The Netherlands, Amsterdam
Team Dora, Romania, Bucharest
ING
www.ing.jobs
• 60,000+ employees
• Serve 37+ million customers
• Corporate clients and financial institutions in over 40 countries
Kafka @ ING
Frontrunners in Kafka since 2014
Running in production:
• 9 years
• 7000+ topics
• Serving 1000+ Development teams
• Self service topic management
Kafka @ ING
Traffic is growing by +10% monthly
[Chart: messages produced per second (average), 2015-2024, growing from near zero to roughly 1.800.000]
What are we going to cover today?
• Why Schemas?
• What compatibility level to pick?
• What changes can I make when evolving my schemas?
• What options do I have when I need to introduce a breaking change?
• Should we automatically register schemas from our applications?
• How do you generate Java classes from your Avro schemas, and how do you build an
automated test suite (unit tests)?
Why schemas?
The only constant in life is change!
-Heraclitus (Greek philosopher)
Why schemas?
The only constant in life is change!
The same applies to your Kafka
events flowing through your
streaming applications.
Why schemas?
[Diagram: a producer application uses its Kafka client and a serializer to send records to your-topic on the Kafka cluster; a consumer application uses its Kafka client and a deserializer to poll them. The payload travels as bytes (0100101001101) through the topic's offsets (old 0 1 2 3 4 5 6 new).]
Producer responsibilities: send, serialization (key and value)
Consumer responsibilities: subscribe, deserialization (key and value), heartbeat
The Kafka cluster is not responsible for: type checking, schema validation*, other constraints
Data in a Kafka topic is just stored as bytes!

Why schemas?
Consumers and producers are decoupled at runtime.
[Diagram: a second consumer application, with its own Kafka client and deserializer, polls the same topic.]

Why schemas?
Decoupled at runtime, but indirectly coupled on the data format:
• What fields and types of data can I expect?
• Where is the documentation of the fields?

Why schemas?
Some requirements changed. We need to introduce a new field.
• Don't cause inconsistency
• Keep it compatible!
• No disruption to my service

Why schemas?
[Diagram: a schema describes the data format shared by the producer and all consumers.]
We need the schema the data was written with to be able to read it.
But don't send the schema each time we send data.

Why schemas?
[Diagram: the Confluent Schema Registry sits next to the cluster. The KafkaAvroSerializer registers the schema once (Register Schema) and embeds only the schema id (id: 1) in every message; the KafkaAvroDeserializer reads the schema id from each message and loads the matching schema (Load Schema) from the registry.]
Confluent Schema Registry = runtime dependency
Need high availability
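For reference, a minimal consumer configuration sketch wired to a Schema Registry might look like the following; the endpoints and group id are assumptions for illustration, not values from the talk:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CustomerConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed endpoint
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "customer-consumer");       // illustrative group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081"); // assumed endpoint
        props.put("specific.avro.reader", true); // deserialize into generated classes, not GenericRecord

        try (KafkaConsumer<String, Object> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(java.util.List.of("your-topic"));
            // poll loop omitted
        }
    }
}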
Avro
§ At ING we prefer Avro
§ Apache Avro™ is a data serialization system offering rich data structures and a compact binary format.
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "isJointAccountHolder", "type": "boolean" }
  ]
}

{
  "name": "Jack",
  "isJointAccountHolder": true
}
Avro field types
• primitive types (null, boolean, int, long, float, double, bytes, and string)
• complex types (record, enum, array, map, union, and fixed)
• logical types (decimal, uuid, date, …)

{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "isJointAccountHolder", "type": "boolean" },
    { "name": "country", "type": { "name": "Country", "type": "enum", "symbols": ["US", "UK", "NL"] } },
    { "name": "dateJoined", "type": { "type": "long", "logicalType": "timestamp-millis" } }
  ]
}

{
  "name": "Jack",
  "isJointAccountHolder": true,
  "country": "UK",
  "dateJoined": 1708944593285
}
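To connect the schema to code, here is a minimal sketch (plain Avro, no Kafka) that parses the schema above and builds a matching record; the class and variable names are illustrative:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class CustomerRecordExample {
    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"namespace\":\"com.example\",\"name\":\"Customer\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"isJointAccountHolder\",\"type\":\"boolean\"},"
            + "{\"name\":\"country\",\"type\":{\"name\":\"Country\",\"type\":\"enum\",\"symbols\":[\"US\",\"UK\",\"NL\"]}},"
            + "{\"name\":\"dateJoined\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}}]}");

        // Enum values must be wrapped in an EnumSymbol of the enum's schema
        Schema countrySchema = schema.getField("country").schema();
        GenericRecord customer = new GenericRecordBuilder(schema)
                .set("name", "Jack")
                .set("isJointAccountHolder", true)
                .set("country", new GenericData.EnumSymbol(countrySchema, "UK"))
                .set("dateJoined", 1708944593285L)
                .build();
        System.out.println(customer);
    }
}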
Maps
Note: the "values" type applies to the values in the map; the keys are always strings.
Example Java map representation:
Map<String, Long> customerPropertiesMap = new HashMap<>();

{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "isJointAccountHolder", "type": "boolean" },
    { "name": "country", "type": { "name": "Country", "type": "enum", "symbols": ["US", "UK", "NL"] } },
    { "name": "dateJoined", "type": { "type": "long", "logicalType": "timestamp-millis" } },
    { "name": "customerPropertiesMap", "type": { "type": "map", "values": "long" } }
  ]
}

{
  "name": "Jack",
  "isJointAccountHolder": true,
  "country": "UK",
  "dateJoined": 1708944593285,
  "customerPropertiesMap": { "key1": 1708, "key2": 1709 }
}
Fixed
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "isJointAccountHolder", "type": "boolean" },
    { "name": "country", "type": { "name": "Country", "type": "enum", "symbols": ["US", "UK", "NL"] } },
    { "name": "dateJoined", "type": { "type": "long", "logicalType": "timestamp-millis" } },
    { "name": "customerPropertiesMap", "type": { "type": "map", "values": "long" }, "doc": "Customer properties" },
    { "name": "annualIncome", "type": ["null", { "name": "AnnualIncome", "type": "fixed", "size": 32 }], "doc": "Annual income of the Customer.", "default": null }
  ]
}

{
  "name": "Jack",
  "isJointAccountHolder": true,
  "country": "UK",
  "dateJoined": 1708944593285,
  "customerPropertiesMap": { "key1": 1708, "key2": 1709 },
  "annualIncome": [64, -9, 92, …]
}
Unions
• Unions are represented using JSON arrays
• For example, ["null", "string"] declares a schema which may be either null or a string.
• Question: who thinks this is a valid definition?

{
  ….
  "fields": [
    {
      "name": "firstName",
      "type": ["null", "string"],
      "doc": "The first name of the Customer."
    }
  ,…]

{
  ….
  "fields": [
    {
      "name": "firstName",
      "type": ["null", "string", "int"],
      "doc": "The first name of the Customer."
    }
  ,…]

Both definitions are valid Avro. The failure only shows up at serialization time, when a value matches none of the union's branches:

org.apache.kafka.common.errors.SerializationException: Error serializing Avro message…
…
Caused by: org.apache.avro.UnresolvedUnionException: Not in union
["null","string","int"]: true
Aliases
Consumer (reader) schema:
{
  ….
  "fields": [
    {
      "name": "customerName",
      "aliases": [ "name" ],
      "type": "string",
      "doc": "The name of the Customer.",
      "default": null
    }
  ,…]

Producer (writer) schema:
{
  ….
  "fields": [
    {
      "name": "name",
      "type": "string",
      "doc": "The name of the Customer.",
      "default": null
    }
  ,…]

• Named types and fields may have aliases
• Aliases function by re-writing the writer's schema using aliases from the reader's schema.
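A minimal sketch of alias resolution with plain Avro: a record is written with the producer's schema and read back with the consumer's schema, whose alias maps the old field name (the field names follow the slide; the setup itself is illustrative):

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AliasExample {
    public static void main(String[] args) throws Exception {
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"}]}");
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
            + "{\"name\":\"customerName\",\"aliases\":[\"name\"],\"type\":\"string\"}]}");

        // Producer side: write with the old field name
        GenericRecord written = new GenericData.Record(writerSchema);
        written.put("name", "Jack");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(written, encoder);
        encoder.flush();

        // Consumer side: the alias rewrites the writer's "name" to "customerName"
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord read = new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
                .read(null, decoder);
        System.out.println(read.get("customerName")); // Jack
    }
}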
Compatibility Modes
BACKWARD
Producer 1 writes: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}

Consumer 1 reads: V1
(same schema as above)

Consumer 2 reads: V2 (Delete field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" }
  ]
}
BACKWARD
Producer 1 writes: V2
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" }
  ]
}

Consumer 1 reads: V2
(same schema as above)

Consumer 2 reads: V2 (Delete field)
(same schema as above)

Consumer 3 reads: V3 (Add optional field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null }
  ]
}
BACKWARD TRANSITIVE
Producer writes: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}

Consumer reads: V2 (Delete field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" }
  ]
}

Consumer reads: V3 (Add optional field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null }
  ]
}

Consumer reads: V…n (Delete field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "…n",
  "fields": [
    { "name": "name", "type": "string" }
  ]
}

Compatible: with BACKWARD TRANSITIVE, every new schema must be able to read data written with all previous versions, not just the latest one.
FORWARD
Consumer reads: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "annualIncome", "type": ["null", "double"], "default": null }
  ]
}

Producer writes: V1
(same schema as above)

Producer writes: V2 (Delete optional field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" }
  ]
}

Producer writes: V3 (Add required field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "dateOfBirth", "type": "string", "doc": "The date of birth for the Customer." }
  ]
}
FORWARD TRANSITIVE
Consumer reads: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "annualIncome", "type": ["null", "int"], "default": null }
  ]
}

Producer writes: V2 (Delete optional field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" }
  ]
}

Producer writes: V3 (Add field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "3",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "dateOfBirth", "type": "string" }
  ]
}

Producer writes: V…n (Add field)
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "…n",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "occupation", "type": "string" },
    { "name": "dateOfBirth", "type": "string" },
    { "name": "phoneNumber", "type": "string" }
  ]
}

Compatible: data written with each newer schema can still be read with V1; added fields are ignored, and the deleted optional field falls back to its default.
FULL
Producer writes: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": ["null", "string"], "default": null },
    { "name": "occupation", "type": ["null", "string"], "default": null },
    { "name": "annualIncome", "type": ["null", "int"], "default": null },
    { "name": "dateOfBirth", "type": ["null", "string"], "default": null }
  ]
}

Consumer reads: V2
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": ["null", "string"], "default": null },
    { "name": "occupation", "type": ["null", "string"], "default": null },
    { "name": "annualIncome", "type": ["null", "int"], "default": null },
    { "name": "dateOfBirth", "type": ["null", "string"], "default": null },
    { "name": "phoneNumber", "type": ["null", "string"], "default": null }
  ]
}

NOTE:
• The default values apply only on the consumer side.
• On the producer side you still need to set a value for the field.
Available compatibility types
From: Confluent Schema Registry documentation
• BACKWARD: new schema can be used to read old data
• FORWARD: old schema can be used to read new data
• FULL: both backward and forward
• NONE: no compatibility enforced
What compatibility to use?
• If you are the topic owner and the producer, in control of evolving the schema, and you don't want to break existing consumers, use FORWARD.
• If you are the topic owner and a consumer, use BACKWARD, so you can upgrade first and then ask the producer to evolve its schema with the fields you need.
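To make the reader/writer direction concrete, here is a minimal sketch using Avro's built-in SchemaCompatibility check (plain Avro, no Schema Registry; the abbreviated Customer schemas are illustrative):

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

public class CompatibilityDirections {
    public static void main(String[] args) {
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"occupation\",\"type\":\"string\"}]}");
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"}]}");

        // BACKWARD: the new schema (reader = v2) can read data written with the old one (writer = v1)
        SchemaCompatibilityType backward = SchemaCompatibility
                .checkReaderWriterCompatibility(v2, v1).getType();

        // FORWARD: the old schema (reader = v1) can read data written with the new one (writer = v2)
        SchemaCompatibilityType forward = SchemaCompatibility
                .checkReaderWriterCompatibility(v1, v2).getType();

        System.out.println("backward: " + backward); // COMPATIBLE (deleting a field is backward compatible)
        System.out.println("forward: " + forward);   // INCOMPATIBLE (v1 needs occupation, which v2 data lacks)
    }
}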
Demo
Backward Compatibility Demo: Components
[Diagram: kafka-producer-one (schema v1) sends to the customers-topic-backward topic on the Kafka cluster; kafka-consumer-first (schema v2), kafka-consumer-second (schema v3), and kafka-consumer-third (schema v4) poll it. The schema changes across versions involve: occupation (required), annualIncome (optional), age (required).]
Adding a required field is not a backward compatible change!
Forward Compatibility Demo: Components
[Diagram: kafka-producer-one (schema v2), kafka-producer-two (schema v3), and kafka-producer-three (schema v4) send to the customers-topic-forward topic on the Kafka cluster; kafka-consumer-first (schema v1) polls it. The schema changes across versions involve: annualIncome (optional), dateOfBirth (required), phoneNumber (required).]
Removing a required field is not a forward compatible change!
Plugins: Avro Schema to Java Class

Avro schema (.avsc) → avro-maven-plugin → .java → maven-compiler-plugin → .class → maven-jar-plugin → .jar

• Validates the Avro syntax
• No validation of compatibility!
Plugins: Avro Schema to Java Class

Customer.avsc:
{
  "namespace": "com.example.avro.customer",
  "type": "record",
  "name": "Customer",
  "version": 1,
  "doc": "Avro schema for our customer.",
  "fields": [
    {
      "name": "name",
      "type": "string",
      "doc": "The name of the Customer."
    },
    {
      "name": "occupation",
      "type": "string",
      "doc": "The occupation of the Customer."
    }
  ]
}

Customer.java (generated):
package com.example.avro.customer;

/** Avro schema for our customer. */
@org.apache.avro.specific.AvroGenerated
public class Customer extends org.apache.avro.specific.SpecificRecordBase
    implements org.apache.avro.specific.SpecificRecord {
  private static final long serialVersionUID = 1600536469030327220L;
  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"CustomerBackwardDemo\",\"namespace\":\"com.example.avro.customer\","
      + "\"doc\":\"Avro schema for our customer.\",\"fields\":["
      + "{\"name\":\"name\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The name of the Customer.\"},"
      + "{\"name\":\"occupation\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The occupation of the Customer.\"}],\"version\":1}");
  …
}
Plugins: Avro Schema to Java Class
[Diagram: the .jar with the generated specific record (Customer.class) is shared by the producer, the consumer, and the Kafka Streams application, each using its own Kafka client.]
Test Avro compatibility
Integration test style

[Diagram: your Java project holds schema versions V1, V2, V3; V1 and V2 are already registered in the Confluent Schema Registry under subject customer-value with compatibility BACKWARD. The next version is validated against the registry over its REST API.]

Validate compatibility with:
• curl
• Confluent CLI
• Confluent Schema Registry Maven Plugin (automate in your Maven build)
Test Avro compatibility: Unit tests
Unit test style

[Diagram: all schema versions (V1, V2, V3) live in your Java project; compatibility is validated locally, without a running Schema Registry.]

Validate compatibility with:
• curl
• Confluent CLI
• Confluent Schema Registry Maven Plugin
• Unit tests (automate in your Maven build)
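A minimal sketch of such a unit test using Avro's SchemaValidatorBuilder (no running Schema Registry needed; the abbreviated Customer schemas and the plain main method are illustrative):

import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.SchemaValidationException;
import org.apache.avro.SchemaValidator;
import org.apache.avro.SchemaValidatorBuilder;

public class BackwardCompatibilityTest {
    public static void main(String[] args) throws SchemaValidationException {
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"occupation\",\"type\":\"string\"}]}");
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"}]}");

        // BACKWARD: the new schema must be able to read data written with the previous one
        SchemaValidator backward = new SchemaValidatorBuilder()
                .canReadStrategy()
                .validateLatest(); // use validateAll() for BACKWARD_TRANSITIVE
        backward.validate(v2, List.of(v1)); // throws SchemaValidationException if incompatible
        System.out.println("v2 is backward compatible with v1");
    }
}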
Demo
Should we auto register schemas?
• By default, client applications automatically register new schemas
• Auto registration is performed by producers only
• For development environments, auto schema registration is fine
• For production environments, the best practice is:
  • to register schemas outside the client application
  • to control when schemas are registered with Schema Registry and how they evolve
• You can disable auto schema registration on the producer: auto.register.schemas: false
• Schema Registry: enforce this with the Schema Registry security plugin
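As a concrete reference, here is a minimal producer configuration sketch with auto registration disabled; the endpoints are assumptions for illustration, not values from the talk:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class NoAutoRegisterProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed endpoint
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // assumed endpoint
        props.put("auto.register.schemas", false); // schemas are registered out-of-band
        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            // send records as usual; serialization fails fast if the schema
            // is not already registered under the topic's subject
        }
    }
}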
Auto register schema lessons learned
• Avro Maven plugin: additional information is appended to the schema embedded in the generated Java code
• Producer (KafkaAvroSerializer): auto.register.schemas: false
• When serializing, the Avro schema is derived from the Customer Java object

Avro schema (.avsc) registered in the Schema Registry:
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    {
      "name": "name",
      "type": "string",
      "doc": "The name of the Customer."
    }
  ]
}

Avro schema (Java) in the producer:
package com.example.avro.customer;

/** Avro schema for our customer. */
@org.apache.avro.specific.AvroGenerated
public class Customer extends org.apache.avro.specific.SpecificRecordBase
    implements org.apache.avro.specific.SpecificRecord {
  private static final long serialVersionUID = 1600536469030327220L;
  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"CustomerBackwardDemo\",\"namespace\":\"com.example.avro.customer\","
      + "\"doc\":\"Avro schema for our customer.\",\"fields\":["
      + "{\"name\":\"name\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The name of the Customer.\"},"
      + "{\"name\":\"occupation\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The occupation of the Customer.\"}],\"version\":1}");
  …
}

Result: mismatch in schema comparison (the avro.java.string properties make the derived schema differ from the registered one).
Auto register schema lessons learned
• If you are using the Avro Maven plugin, it is recommended to set this property on the serializer (KafkaAvroSerializer): avro.remove.java.properties: true
Note: there is an open issue for the Avro Maven plugin about this: AVRO-2838

Avro schema (.avsc):
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    {
      "name": "name",
      "type": "string",
      "doc": "The name of the Customer."
    }
  ]
}

Avro schema (as Java String):
(identical to the .avsc above)

No mismatch in schema comparison.
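Building on the producer sketch shown earlier, the two serializer settings from this slide and the previous one look like this (a fragment, assuming the same props object):

props.put("auto.register.schemas", false);      // don't let the producer register schemas
props.put("avro.remove.java.properties", true); // strip avro.java.string before schema lookup/comparison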
Schema Evolution Guidelines
Rules of the Road for Modifying Schemas
If you want to make your schema evolvable, then follow these guidelines.
§ Provide a default value for fields in your schema, as this allows you to delete the field later.
§ Don’t change a field’s data type.
§ Don’t rename an existing field (use aliases instead).
Breaking changes. How to move forward?
What can you do?
• "Force push" the schema
  • BACKWARD -> NONE -> BACKWARD
  • Can you allow for downtime?
  • Are both producers and consumers under your control?
  • Last resort
• Produce to multiple topics
  • V1 topic
  • V2 topic
  • Migrate consumers
  • Produce to both topics atomically (transactions)
• Data Contracts for Schema Registry
  • Field level transformations
Wrap up
Communication
§ Important to communicate changes between producing and consuming
teams
Gain more confidence
§ Add unit/integration tests to make sure your changes are compatible
Wrap up
Schema registration
§ Don’t allow applications to register schemas automatically
§ Don’t assume applications will set auto.register.schemas=false
§ Make sure to have security measures in place
Be aware of pitfalls
§ Avro Maven plugin adds: "avro.java.string"
§ Deserialization exceptions on the consumer side
Questions?
🤔
❔
Demo codebase:
https://github.com/j-tim/kafka-summit-london-2024