"The only constant in life is change! The same applies to your Kafka events flowing through your streaming applications.
The Confluent Schema Registry allows us to control how schemas can evolve over time without breaking the compatibility of our streaming applications. But when you start with Kafka and (Avro) schemas, this can be pretty overwhelming.
Join Kosta and Tim as we dive into the tricky world of backward and forward compatibility in schema design. During this deep dive talk, we are going to answer questions like:
* What compatibility level to pick?
* What changes can I make when evolving my schemas?
* What options do I have when I need to introduce a breaking change?
* Should we automatically register schemas from our applications? Or do we need a separate step in our deployment process to promote schemas to higher-level environments?
* What to promote first? My producer, consumer or schema?
* How do you generate Java classes from your Avro schemas using Maven or Gradle, and how do you integrate this into your project(s)?
* How do you build an automated test suite (unit tests) to gain more confidence and verify you are not breaking compatibility, even before deploying a new version of your schema or application?
With live demos, we'll show you how to make schema changes work seamlessly, emphasizing the crucial decisions, real-life examples, pitfalls, and best practices when promoting schemas on the consumer and producer sides.
Explore the ins and outs of Apache Avro and the Schema Registry with us at the Kafka Summit! Start evolving your schemas in a better way today, and join this talk!"
4. Kafka @ ING
Frontrunners in Kafka since 2014
Running in production:
• 9 years
• 7000+ topics
• Serving 1000+ Development teams
• Self-service topic management
5. Kafka @ ING
Traffic is growing by ~10% monthly
[Chart: messages produced per second (average), 2015–2024, growing from near zero to roughly 1,800,000]
6. What are we going to cover today?
• Why Schemas?
• What compatibility level to pick?
• What changes can I make when evolving my schemas?
• What options do I have when I need to introduce a breaking change?
• Should we automatically register schemas from our applications?
• How do you generate Java classes from your Avro schemas, and how do you build an
automated test suite (unit tests)?
12. Why schemas?
[Diagram: a Producer Application (Kafka client + Serializer) sends binary records to your-topic on the Kafka cluster; two Consumer Applications (Kafka client + Deserializer) poll the same bytes. Producer and consumers are indirectly coupled on the data format.]
13. Why schemas?
[Same diagram; the consumers wonder: "What fields and types of data can I expect?" "Documentation of the fields?"]
14. Why schemas?
[Same diagram; the producer: "Some requirements changed. We need to introduce a new field." The consumers: "Don't cause inconsistency." "Keep it compatible!" "No disruption to my service."]
17. Why schemas?
[Same diagram, with a Schema added: we need the schema the data was written with to be able to read it.]
18. Why schemas?
[Same diagram; but don't send the schema each time we send data.]
19. Why schemas?
[Same diagram, with the Confluent Schema Registry added: schemas are registered once in the registry, so the schema is not sent each time we send data.]
29. Unions
• Unions are represented using JSON arrays
• For example, ["null", "string"] declares a schema which may be either a
null or string.
• Question: Who thinks this is a valid definition?
{
  …
  "fields": [
    {
      "name": "firstName",
      "type": ["null", "string"],
      "doc": "The first name of the Customer."
    },
  …]

{
  …
  "fields": [
    {
      "name": "firstName",
      "type": ["null", "string", "int"],
      "doc": "The first name of the Customer."
    },
  …]
org.apache.kafka.common.errors.SerializationException: Error
serializing Avro message…
…
Caused by: org.apache.avro.UnresolvedUnionException: Not in union
["null","string","int"]: true
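For completeness: to make such a union field truly optional, it is usually paired with a null default (field name illustrative); note that the "null" branch comes first so the default value null is valid:

```json
{
  "name": "firstName",
  "type": ["null", "string"],
  "default": null,
  "doc": "The first name of the Customer."
}
```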
30. Aliases
Consumer (reader) schema:
{
  …
  "fields": [
    {
      "name": "customerName",
      "aliases": [ "name" ],
      "type": "string",
      "doc": "The name of the Customer.",
      "default": null
    },
  …]

Producer (writer) schema:
{
  …
  "fields": [
    {
      "name": "name",
      "type": "string",
      "doc": "The name of the Customer.",
      "default": null
    },
  …]

• Named types and fields may have aliases
• Aliases function by rewriting the writer's schema using aliases from the reader's schema.
37. FULL
Producer writes: V1
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "1",
  "fields": [
    { "name": "name", "type": ["null","string"], "default": null },
    { "name": "occupation", "type": ["null","string"], "default": null },
    { "name": "annualIncome", "type": ["null","int"], "default": null },
    { "name": "dateOfBirth", "type": ["null","string"], "default": null }
  ]
}
Consumer reads: V2
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "version": "2",
  "fields": [
    { "name": "name", "type": ["null","string"], "default": null },
    { "name": "occupation", "type": ["null","string"], "default": null },
    { "name": "annualIncome", "type": ["null","int"], "default": null },
    { "name": "dateOfBirth", "type": ["null","string"], "default": null },
    { "name": "phoneNumber", "type": ["null","string"], "default": null }
  ]
}
NOTE:
• The default values apply only on the consumer side.
• On the producer side you need to set a value for the field.
38. Available compatibility types
From: Confluent Schema Registry documentation
• BACKWARD: the new schema can be used to read old data
• FORWARD: the old schema can be used to read new data
• FULL: both backward and forward
• NONE: no compatibility enforced
39. What compatibility to use?
• If you are the topic owner and the producer, in control of evolving the
schema, and you don't want to break existing consumers, use
FORWARD
• If you are the topic owner and a consumer, use BACKWARD, so you can
upgrade first and then ask the producer to evolve its schema with the
fields you need
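For reference, the compatibility level is set per subject via the Schema Registry REST API; a sketch of the request (host and subject name are illustrative):

```
PUT http://localhost:8081/config/customers-value
Content-Type: application/vnd.schemaregistry.v1+json

{"compatibility": "FORWARD"}
```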
41. Backward Compatibility Demo: Components
[Diagram: kafka-producer-one (schema v1) sends to customers-topic-backward on the Kafka cluster; kafka-consumer-first (schema v2), kafka-consumer-second (schema v3), and kafka-consumer-third (schema v4) each poll from it. The evolving schema versions involve: occupation (required), annualIncome (optional), age (required).]
Adding a required field is not a backward compatible change!
42. Forward Compatibility Demo: Components
[Diagram: kafka-producer-one (schema v2), kafka-producer-two (schema v3), and kafka-producer-three (schema v4) send to customers-topic-forward on the Kafka cluster; kafka-consumer-first (schema v1) polls from it. The evolving schema versions involve: annualIncome (optional), dateOfBirth (required), phoneNumber (required).]
Removing a required field is not a forward compatible change!
43. Plugins: Avro Schema to Java Class
Avro Schema (.avsc) → avro-maven-plugin → .java → maven-compiler-plugin → .class → maven-jar-plugin → .jar
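A typical avro-maven-plugin setup looks roughly like this (plugin version and directory paths are illustrative):

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.11.3</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/target/generated-sources/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```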
44. Plugins: Avro Schema to Java Class
Avro Schema (.avsc) → avro-maven-plugin → .java → maven-compiler-plugin → .class → maven-jar-plugin → .jar
• Validation of Avro Syntax
• No validation on compatibility!
45. Plugins: Avro Schema to Java Class
Customer.avsc:
{
  "namespace": "com.example.avro.customer",
  "type": "record",
  "name": "Customer",
  "version": 1,
  "doc": "Avro schema for our customer.",
  "fields": [
    { "name": "name", "type": "string", "doc": "The name of the Customer." },
    { "name": "occupation", "type": "string", "doc": "The occupation of the Customer." }
  ]
}
Customer.java (generated):
package com.example.avro.customer;
/** Avro schema for our customer. */
@org.apache.avro.specific.AvroGenerated
public class Customer extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  private static final long serialVersionUID = 1600536469030327220L;
  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.example.avro.customer\",\"doc\":\"Avro schema for our customer.\",\"fields\":[{\"name\":\"name\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The name of the Customer.\"},{\"name\":\"occupation\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"doc\":\"The occupation of the Customer.\"}],\"version\":1}");
  …
}
46. Plugins: Avro Schema to Java Class
[Diagram: the .jar containing the specific record Customer.class is shared by the Producer, the Consumer, and the Kafka Streams App, each using a Kafka client.]
47. Test Avro compatibility
Integration test style
[Diagram: your Java project contains schema versions V1, V2, and V3; V1 and V2 are already registered in the Confluent Schema Registry under subject customer-value (compatibility: BACKWARD). Validate the compatibility of V3 against the registry via the REST API using:
• curl
• Confluent CLI
• Confluent Schema Registry Maven Plugin]
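Under the hood, all three options call the same Schema Registry compatibility endpoint; a sketch of the request (host, subject, and schema payload are illustrative):

```
POST http://localhost:8081/compatibility/subjects/customer-value/versions/latest
Content-Type: application/vnd.schemaregistry.v1+json

{"schema": "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":[…]}"}
```

The response reports {"is_compatible": true} or false for the candidate schema against the latest registered version.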
48. Test Avro compatibility
Integration test style
[Same diagram; automate the compatibility check in your Maven build.]
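A sketch of the Confluent Schema Registry Maven Plugin configured for the test-compatibility goal (plugin version, registry URL, and subject-to-file mapping are illustrative):

```xml
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>7.6.0</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://localhost:8081</param>
    </schemaRegistryUrls>
    <subjects>
      <customer-value>src/main/avro/Customer.avsc</customer-value>
    </subjects>
  </configuration>
  <goals>
    <goal>test-compatibility</goal>
  </goals>
</plugin>
```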
50. Test Avro compatibility: Unit tests
Unit test style
[Diagram: your Java project contains schema versions V1, V2, and V3; validate compatibility between them locally, automated in your Maven build, using:
• curl
• Confluent CLI
• Confluent Schema Registry Maven Plugin
• Unit tests]
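The idea behind such a unit test can be illustrated with a toy sketch. This is NOT the real Avro resolution algorithm (which also checks types, unions, and aliases); it only demonstrates the core rule from the demos: a new reader schema can read old data only if every field it adds has a default. Class and field names are illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompatCheck {

    // A field is reduced to a name plus whether it declares a default value.
    record Field(String name, boolean hasDefault) {}

    // Can a reader using newFields read data written with oldFields?
    static boolean isBackwardCompatible(List<Field> oldFields, List<Field> newFields) {
        Set<String> writtenNames = new HashSet<>();
        for (Field f : oldFields) writtenNames.add(f.name());
        for (Field f : newFields) {
            // A field the writer never wrote must have a default to fall back on.
            if (!writtenNames.contains(f.name()) && !f.hasDefault()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Field> v1 = List.of(new Field("name", false), new Field("occupation", false));
        // Adds an optional field (has a default): still backward compatible.
        List<Field> v2 = List.of(new Field("name", false), new Field("occupation", false),
                new Field("annualIncome", true));
        // Adds a required field (no default): breaks backward compatibility.
        List<Field> v3 = List.of(new Field("name", false), new Field("occupation", false),
                new Field("age", false));
        System.out.println(isBackwardCompatible(v1, v2)); // true
        System.out.println(isBackwardCompatible(v1, v3)); // false
    }
}
```

In a real project, a unit test would run the equivalent check using the Avro library's own compatibility API against the previous committed schema versions, failing the build before an incompatible schema ever reaches the registry.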
52. Should we auto register schemas?
• By default, client applications automatically register new schemas
• Auto registration is performed by the producers only
• For development environments you can use auto schema registration
• For production environments the best practice is
• to register schemas outside the client application
• to control when schemas are registered with Schema Registry and how they evolve
• You can disable auto schema registration on the producer: auto.register.schemas=false
• Schema Registry: Schema Registry security plugin
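A sketch of the relevant producer settings (broker and registry URLs are placeholders; use.latest.version is commonly paired with disabled auto registration so the producer uses the schema already promoted to the registry):

```
bootstrap.servers=localhost:9092
schema.registry.url=http://localhost:8081
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
auto.register.schemas=false
use.latest.version=true
```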
53. Auto register schema lessons learned
• Maven Avro plugin: additional information is appended to the schema embedded in the generated Java code (the SCHEMA$ string from slide 45, which carries extra "avro.java.string":"String" properties)
• Producer (KafkaAvroSerializer): auto.register.schemas=false
• When serializing, the Avro schema is derived from the Customer Java object
Avro Schema (.avsc) registered in the Schema Registry:
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string", "doc": "The name of the Customer." }
  ]
}
Avro Schema (Java) in the producer: derived from the generated class at serialization time
→ Mismatch in schema comparison
54. Auto register schema lessons learned
• When using the KafkaAvroSerializer, it is recommended to set this property: avro.remove.java.properties=true
Note:
There is an open issue for the Avro Maven Plugin for this: AVRO-2838
Avro Schema (.avsc):
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string", "doc": "The name of the Customer." }
  ]
}
Avro Schema (as Java String):
{
  "type": "record",
  "namespace": "com.example",
  "name": "Customer",
  "fields": [
    { "name": "name", "type": "string", "doc": "The name of the Customer." }
  ]
}
→ No mismatch in schema comparison
55. Schema Evolution Guidelines
Rules of the Road for Modifying Schemas
If you want to make your schema evolvable, then follow these guidelines.
§ Provide a default value for fields in your schema, as this allows you to delete the field later.
§ Don’t change a field’s data type.
§ Don’t rename an existing field (use aliases instead).
56. Breaking changes. How to move forward?
What can you do?
• "Force push" schema
  • BACKWARD -> NONE -> BACKWARD
  • Allow for downtime?
  • Both producers and consumers under your control?
  • Last resort
• Produce to multiple topics
  • V1 topic
  • V2 topic
  • Migrate consumers
  • Transaction for an atomic operation
• Data Contracts for Schema Registry
  • Field-level transformations
57. Wrap up
Communication
§ Important to communicate changes between producing and consuming
teams
Gain more confidence
§ Add unit/integration tests to make sure your changes are compatible
58. Wrap up
Schema registration
§ Don’t allow applications to register schemas automatically
§ Don’t assume applications will set auto.register.schemas=false
§ Make sure to have security measures in place
Be aware of pitfalls
§ Avro Maven plugin adds: "avro.java.string"
§ Deserialization exceptions on the consumer side