Cloudurable provides Cassandra and Kafka Support on AWS/EC2
Kafka Tutorial (Part 2)
Java Producer and Consumer examples
Schema Registry
Tutorial on Kafka Streaming Platform, Part 2
Lab: StockPrice Producer Java Example
StockPrice App to demo Advanced Producer
❖ StockPrice - holds a stock price; has a name, dollars, and cents
❖ StockPriceKafkaProducer - configures and creates KafkaProducer<String, StockPrice>, creates the StockSender list and a thread pool (ExecutorService), and starts each StockSender runnable in the thread pool
❖ StockAppConstants - holds the topic and broker list
❖ StockPriceSerializer - can serialize a StockPrice into byte[]
❖ StockSender - generates somewhat random stock prices for a given StockPrice name; Runnable; 1 thread per StockSender
❖ Shows using KafkaProducer from many threads
StockPrice domain object
❖ has a name
❖ dollars
❖ cents
❖ converts itself to JSON (see the sketch below)
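A minimal sketch of what a StockPrice domain object like this could look like; the exact field and method names beyond name, dollars, cents and the JSON conversion are assumptions, not the course code.

public class StockPrice {
    private final String name;
    private final int dollars;
    private final int cents;

    public StockPrice(final String name, final int dollars, final int cents) {
        this.name = name;
        this.dollars = dollars;
        this.cents = cents;
    }

    public String getName()  { return name; }
    public int getDollars()  { return dollars; }
    public int getCents()    { return cents; }

    // Converts itself to JSON, e.g. {"name":"IBM","dollars":100,"cents":99}
    public String toJson() {
        return "{\"name\":\"" + name + "\",\"dollars\":" + dollars
                + ",\"cents\":" + cents + "}";
    }
}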
StockPriceKafkaProducer
❖ Import classes and set up the logger
❖ Create a createProducer method to create the KafkaProducer instance
❖ Create setupBootstrapAndSerializers to initialize bootstrap servers, client id, key serializer and custom serializer (StockPriceSerializer)
❖ Write the main() method - creates the producer, creates the StockSender list passing each instance the producer, creates a thread pool so every StockSender gets its own thread, and runs each StockSender in its own thread
StockPriceKafkaProducer imports, createProducer
❖ Import classes and set up the logger
❖ createProducer is used to create a KafkaProducer
❖ createProducer() calls setupBootstrapAndSerializers() (see the sketch below)
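A sketch of the imports, logger and createProducer under these assumptions; the course code may differ in detail.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Properties;

public class StockPriceKafkaProducer {

    private static final Logger logger =
            LoggerFactory.getLogger(StockPriceKafkaProducer.class);

    private static Producer<String, StockPrice> createProducer() {
        final Properties props = new Properties();
        setupBootstrapAndSerializers(props);   // bootstrap servers, client id, serializers
        return new KafkaProducer<>(props);
    }
    // ...
}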
Configure Producer Bootstrap and
Serializer
❖ Create setupBootstrapAndSerializers to initialize bootstrap
servers, client id, key serializer and custom serializer
(StockPriceSerializer)
❖ StockPriceSerializer will serialize StockPrice into bytes
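A sketch of setupBootstrapAndSerializers, assuming a BOOTSTRAP_SERVERS constant in StockAppConstants (shown below) and the standard ProducerConfig / StringSerializer classes from the Kafka client jar.

private static void setupBootstrapAndSerializers(final Properties props) {
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
            StockAppConstants.BOOTSTRAP_SERVERS);
    props.put(ProducerConfig.CLIENT_ID_CONFIG, "StockPriceKafkaProducer");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            StringSerializer.class.getName());
    // Custom serializer turns a StockPrice into byte[]
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            StockPriceSerializer.class.getName());
}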
StockPriceKafkaProducer.main()
❖ main method - creates the producer
❖ creates the StockSender list, passing each instance the producer
❖ creates a thread pool (executorService)
❖ every StockSender runs in its own thread (see the sketch below)
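A sketch of main() assuming the helpers above; StockSender implements Runnable, so each one can be submitted straight to the pool.

public static void main(final String... args) throws Exception {
    // Create the producer and hand it to every StockSender
    final Producer<String, StockPrice> producer = createProducer();
    final List<StockSender> stockSenders = getStockSenderList(producer);

    // One thread per StockSender
    final ExecutorService executorService =
            Executors.newFixedThreadPool(stockSenders.size());
    stockSenders.forEach(executorService::submit);
}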
StockAppConstants
❖ topic name for Producer example
❖ List of bootstrap servers
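A sketch of StockAppConstants; the topic name stock-prices comes from the create-topic.sh slide later, while the three broker ports are an assumption for a local three-broker setup.

public class StockAppConstants {
    public final static String TOPIC = "stock-prices";
    // Assumed local ports for the three brokers started by the run scripts
    public final static String BOOTSTRAP_SERVERS =
            "localhost:9092,localhost:9093,localhost:9094";
}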
StockPriceKafkaProducer.getStockSenderList
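The slide's code is not reproduced here; a sketch using the StockSender constructor described later (topic, high and low StockPrice, producer, delay min and max). The ticker symbols, price ranges and delays are made up.

private static List<StockSender> getStockSenderList(
        final Producer<String, StockPrice> producer) {
    return Arrays.asList(
            new StockSender(StockAppConstants.TOPIC,
                    new StockPrice("IBM", 100, 99), new StockPrice("IBM", 50, 10),
                    producer, 1, 10),
            new StockSender(StockAppConstants.TOPIC,
                    new StockPrice("UBER", 1000, 99), new StockPrice("UBER", 50, 0),
                    producer, 1, 10)
    );
}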
StockPriceSerializer
❖ Converts StockPrice into byte array
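A sketch of the custom serializer; it assumes StockPrice.toJson() from the domain-object sketch and simply encodes that JSON as UTF-8 bytes.

import org.apache.kafka.common.serialization.Serializer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class StockPriceSerializer implements Serializer<StockPrice> {

    @Override
    public byte[] serialize(final String topic, final StockPrice data) {
        // Serialize the StockPrice as its JSON form, encoded as UTF-8 bytes
        return data.toJson().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void configure(final Map<String, ?> configs, final boolean isKey) { }

    @Override
    public void close() { }
}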
StockSender
❖ Generates random stock prices for a given StockPrice name
❖ StockSender is Runnable
❖ 1 thread per StockSender
❖ Shows using KafkaProducer from many threads
❖ Delays a random time between delayMin and delayMax
❖ then sends a random StockPrice between stockPriceHigh and stockPriceLow
StockSender imports, Runnable
❖ Imports Kafka Producer, ProducerRecord, RecordMetadata, StockPrice
❖ Implements Runnable, so it can be submitted to an ExecutorService
StockSender constructor
❖ takes a topic, high & low stockPrice, producer, delay min & max
StockSender run()
❖ In a loop: creates a random record, sends the record, waits a random time (see the sketch below)
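A sketch of run(), assuming fields set by the constructor on the previous slide (producer, topic, price bounds, delayMin, delayMax) plus an SLF4J logger; it is not the course's exact code.

@Override
public void run() {
    int sentCount = 0;
    while (!Thread.currentThread().isInterrupted()) {
        try {
            // Build a random StockPrice wrapped in a ProducerRecord and send it asynchronously
            final ProducerRecord<String, StockPrice> record = createRandomRecord();
            final Future<RecordMetadata> future = producer.send(record);
            if (++sentCount % 100 == 0) {
                displayRecordMetaData(record, future);     // every 100 records
            }
            // Wait a random time between delayMin and delayMax milliseconds
            Thread.sleep(randomIntBetween(delayMax, delayMin));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();            // orderly exit
        } catch (Exception e) {
            logger.error("problem sending record to producer", e);
        }
    }
}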
StockSender createRandomRecord
❖ createRandomRecord uses randomIntBetween
❖ creates a StockPrice and then wraps the StockPrice in a ProducerRecord
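A sketch under the same assumptions: stockPriceHigh, stockPriceLow and topic are constructor fields and random is a java.util.Random field.

private ProducerRecord<String, StockPrice> createRandomRecord() {
    // Pick a random dollar amount between the low and high stock prices, plus random cents
    final int dollarAmount = randomIntBetween(
            stockPriceHigh.getDollars(), stockPriceLow.getDollars());
    final int centAmount = randomIntBetween(99, 0);
    final StockPrice stockPrice =
            new StockPrice(stockPriceHigh.getName(), dollarAmount, centAmount);
    // Key is the stock name, value is the StockPrice itself
    return new ProducerRecord<>(topic, stockPrice.getName(), stockPrice);
}

private int randomIntBetween(final int max, final int min) {
    return random.nextInt(max - min + 1) + min;
}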
StockSender displayRecordMetaData
❖ Every 100 records displayRecordMetaData gets called
❖ Prints out record info and RecordMetadata info:
❖ key, JSON value, topic, partition, offset, time
❖ uses the Future from the call to producer.send()
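A sketch of displayRecordMetaData; it blocks on the Future returned by producer.send() to read the RecordMetadata.

private void displayRecordMetaData(final ProducerRecord<String, StockPrice> record,
                                   final Future<RecordMetadata> future)
        throws InterruptedException, ExecutionException {
    final RecordMetadata recordMetadata = future.get();   // waits for the send to complete
    logger.info(String.format(
            "key=%s, JSON value=%s, topic=%s, partition=%d, offset=%d, time=%s",
            record.key(), record.value().toJson(),
            recordMetadata.topic(), recordMetadata.partition(),
            recordMetadata.offset(), new Date(recordMetadata.timestamp())));
}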
Run it
❖ Run ZooKeeper
❖ Run three Brokers
❖ run create-topic.sh
❖ Run StockPriceKafkaProducer
Run scripts
❖ run ZooKeeper from ~/kafka-training
❖ use bin/create-topic.sh to create the topic
❖ use bin/delete-topic.sh to delete the topic
❖ use bin/start-1st-server.sh to run Kafka Broker 0
❖ use bin/start-2nd-server.sh to run Kafka Broker 1
❖ use bin/start-3rd-server.sh to run Kafka Broker 2
Config is under the directory called config:
server-0.properties is for Kafka Broker 0
server-1.properties is for Kafka Broker 1
server-2.properties is for Kafka Broker 2
Run All 3 Brokers
Run create-topic.sh script
Name of the topic is stock-prices
Three partitions
Replication factor of three
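The script body is not shown on the slide; a sketch of what a create-topic.sh like this typically contains. The Kafka install path and ZooKeeper address are assumptions.

#!/usr/bin/env bash
cd ~/kafka-training

# stock-prices topic: 3 partitions, replication factor 3
kafka/bin/kafka-topics.sh --create \
    --zookeeper localhost:2181 \
    --replication-factor 3 \
    --partitions 3 \
    --topic stock-prices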
Run StockPriceKafkaProducer
❖ Run StockPriceKafkaProducer from the IDE
❖ You should see log messages from StockSender(s)
with StockPrice name, JSON value, partition, offset,
and time
Lab: Adding an orderly shutdown - flush and close
Using flush and close
Shutdown Producer nicely
❖ Handle Ctrl-C shutdown from Java (shutdown-hook sketch below)
❖ Shut down the thread pool and wait
❖ Flush the producer to send any outstanding batches if using batches (producer.flush())
❖ Close the Producer (producer.close()) and wait
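A sketch of the shutdown hook, assuming the producer and executorService created in main() are reachable from the hook; producer.close(long, TimeUnit) is the pre-2.0 close-with-timeout API.

Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    executorService.shutdown();                         // stop the StockSender threads
    try {
        executorService.awaitTermination(200, TimeUnit.MILLISECONDS);
        logger.info("Flushing and closing producer");
        producer.flush();                               // push out any outstanding batches
        producer.close(10_000, TimeUnit.MILLISECONDS);  // wait up to 10 seconds for close
    } catch (InterruptedException e) {
        logger.warn("shutting down", e);
        Thread.currentThread().interrupt();
    }
}));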
Nice Shutdown
producer.close()
Restart Producer then shut it
down
❖ Add shutdown hook
❖ Start StockPriceKafkaProducer
❖ Now stop it (CTRL-C or hit stop button in IDE)
Lab Configuring
Producer Durability
Lab Batching
Records
Lab Adding Retries
and Timeouts
Kafka Advanced Consumers
Currently not added to the sample
Working with Avro Serialization and the Schema Registry
Schema Management in
Kafka
Understanding Avro and the
Schema Registry for Kafka
Overview
❖ First we cover Avro
❖ Avro Schema
❖ Using a Schema to write data
❖ Using a Schema to read data
❖ Then we cover the Schema Registry
❖ Managing Schemas
❖ Kafka Avro Serialization
Avro
Introduction to Avro
for Big Data
and Streaming
Architecture
Apache Avro Data
Serialization for Hadoop and
Kafka
Avro Overview
❖ Why Avro?
❖ What is Avro?
❖ Why are schemas important?
❖ Examples of reading and writing with Avro
❖ Working with code generation
❖ Working with Avro GenericRecord
Why Avro?
❖ Direct mapping to JSON
❖ Compact binary format
❖ Very Fast
❖ Polyglot bindings
❖ Code generation for static languages
❖ Code generation not needed
❖ Supports Schema
❖ Supports compatibility checks
❖ Allows evolving your data over time
Apache Avro
❖ Data serialization system
❖ Data structures
❖ Binary data format
❖ Container file format to store persistent data
❖ RPC capabilities
❖ Does not require code generation to use
Avro Schemas
❖ Supports schemas for defining data structure
❖ Serializing and deserializing data uses the schema
❖ File schema
❖ Avro files store data with its schema
❖ RPC Schema
❖ RPC protocol exchanges schemas as part of the handshake
❖ Schemas are written in JSON
Avro compared to…
❖ Similar to Thrift, Protocol Buffers, JSON, etc.
❖ Does not require code generation
❖ Avro needs less encoding as part of the data since it
stores names and types in the schema
❖ It supports evolution of schemas.
Why schemas?
❖ Multiple Producers and Consumers evolve over time
❖ Schemas keep your data clean
❖ NoSQL was schema-less, but not so much anymore: Cassandra has schemas
❖ REST/JSON was schema-less and IDL-less, but not anymore: Swagger and RAML
Schema provides Robustness
❖ Streaming architecture support
❖ Challenging: Consumers and Producers evolve on different timelines
❖ Producers send streams of records that 0 to * Consumers read
❖ Not all consumers or use cases are known ahead of time
❖ Data could go to Hadoop or some store and be used years later in use cases you can't imagine
❖ Schemas help future-proof data and make data more robust
❖ Avro Schema with support for evolution is important
❖ Schema provides robustness: it provides metadata about the data, stored with the data
Future Compatibility
❖ Data record format compatibility is a hard problem to solve
❖ Schemas capture a point in time of what your data looked like when it was stored
❖ Data will evolve. New fields are added.
❖ Kafka streams are often recorded in data lakes - these need to be less rigid than the schema for operational data
❖ Producers can have Consumers that they never know about
❖ You can't test a Consumer that you don't know about
❖ Avro Schemas support agility
❖ 80% of data science is data cleanup; Schemas help
Avro Schema
Avro schemas are stored in src/main/avro by default.
Every field and record has a "doc" attribute which you should use to document the purpose of the field.
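The schema shown on the slide is not reproduced here; a small hypothetical Employee schema in that style (the field names are illustrative, and the namespace matches the one seen later in the Schema Registry examples).

{
  "namespace": "com.cloudurable.phonebook",
  "type": "record",
  "name": "Employee",
  "doc": "Represents an Employee at a company",
  "fields": [
    {"name": "firstName", "type": "string", "doc": "The person's given name"},
    {"name": "lastName",  "type": "string", "doc": "The person's family name"},
    {"name": "age",       "type": "int",    "default": -1, "doc": "Age of the employee"}
  ]
}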
Code Generation
Employee Code Generation
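The build setup from the slide is not reproduced; one way to generate the Employee class from a schema is the avro-tools jar (the version number and schema file name are assumptions), and Gradle/Maven Avro plugins can do the same during the build.

# Generate com.cloudurable.phonebook.Employee from the schema file
java -jar avro-tools-1.8.1.jar compile schema src/main/avro/employee.avsc src/main/java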
Using Generated Avro class
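A sketch of using the generated class; the builder methods assume the hypothetical Employee schema above.

final Employee bob = Employee.newBuilder()
        .setFirstName("Bob")
        .setLastName("Jones")
        .setAge(35)
        .build();

System.out.println(bob.getFirstName() + " " + bob.getLastName());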
Writing employees to an
Avro File
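A sketch using Avro's SpecificDatumWriter and DataFileWriter; the method shape is an assumption, but the API calls are standard Avro.

void writeEmployeesToFile(final List<Employee> employeeList, final File file) throws IOException {
    final DatumWriter<Employee> datumWriter = new SpecificDatumWriter<>(Employee.class);
    try (DataFileWriter<Employee> dataFileWriter = new DataFileWriter<>(datumWriter)) {
        // The schema is written into the file header, then each record is appended
        dataFileWriter.create(employeeList.get(0).getSchema(), file);
        for (final Employee employee : employeeList) {
            dataFileWriter.append(employee);
        }
    }
}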
Reading employees From a
File
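The matching read side, sketched with SpecificDatumReader and DataFileReader.

void readEmployeesFromFile(final File file) throws IOException {
    final DatumReader<Employee> datumReader = new SpecificDatumReader<>(Employee.class);
    try (DataFileReader<Employee> dataFileReader = new DataFileReader<>(file, datumReader)) {
        while (dataFileReader.hasNext()) {
            final Employee employee = dataFileReader.next();
            System.out.println(employee);
        }
    }
}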
Using GenericRecord
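A sketch of building a record without generated code; the schema file name is an assumption.

final Schema schema = new Schema.Parser()
        .parse(new File("src/main/avro/employee.avsc"));

final GenericRecord bob = new GenericData.Record(schema);
bob.put("firstName", "Bob");
bob.put("lastName", "Smith");
bob.put("age", 35);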
Writing Generic Records
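A sketch of writing generic records to an Avro container file with GenericDatumWriter; the output file name is an assumption.

final DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
    dataFileWriter.create(schema, new File("employees.avro"));
    dataFileWriter.append(bob);   // any GenericRecord that conforms to the schema
}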
Reading using Generic
Records
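And the generic read side, sketched with GenericDatumReader.

final DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
try (DataFileReader<GenericRecord> dataFileReader =
             new DataFileReader<>(new File("employees.avro"), datumReader)) {
    while (dataFileReader.hasNext()) {
        final GenericRecord record = dataFileReader.next();
        System.out.println(record.get("firstName") + " " + record.get("lastName"));
    }
}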
Avro Schema Validation
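A sketch of validation under the same assumptions: generated builders check required fields at build time, and GenericData can validate a GenericRecord against a schema.

try {
    // lastName is never set and has no default, so build() throws
    final Employee broken = Employee.newBuilder().setFirstName("Bob").setAge(35).build();
} catch (org.apache.avro.AvroRuntimeException e) {
    System.out.println("Invalid Employee: " + e.getMessage());
}

// For GenericRecord, validate explicitly against the schema
final boolean valid = GenericData.get().validate(schema, bob);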
Avro supported types
❖ Records
❖ Arrays
❖ Enums
❖ Unions
❖ Maps
❖ String, Int, Boolean, Decimal, Timestamp, Date
Avro Types 1
avro type | json type | example | description
null | null | null | no value
boolean | boolean | true | true or false
int | integer | 1 | 32-bit signed int
long | | | 64-bit signed int
float | number | 1.1 | single precision (32-bit) floating-point number
double | number | 1.2 | double precision (64-bit) floating-point number
bytes | string | "\u00FF" |
string | string | "foo" |
record | object | {"a": 1} | complex type / object
enum | string | "Hi Mom" | enumerated type
array | array | [1] | array of any value
union | | | can be used to define nullable fields
map | object | {"a": 1} |
fixed | string | "\u00ff" | fixed length string
Avro Types 2
avro type | json type | example | description
date | int | 2000 | stores the number of days from the unix epoch, 1 January 1970
time-millis | int | 10800000 | stores the number of milliseconds after midnight
time-micros | long | 1000000 | microseconds after midnight
timestamp-millis | long | 1494437796880 | milliseconds since epoch
decimal | complex/object | 22.555 | specifies scale and precision like BigDecimal in Java
duration | complex/object | | specifies duration in months, days and milliseconds
Fuller example Avro Schema
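The fuller schema on the slide is not reproduced; a hypothetical fuller version of the Employee schema showing a nullable union and an enum with defaults (all field choices are illustrative).

{
  "namespace": "com.cloudurable.phonebook",
  "type": "record",
  "name": "Employee",
  "doc": "Represents an Employee at a company",
  "fields": [
    {"name": "firstName",   "type": "string", "doc": "The person's given name"},
    {"name": "lastName",    "type": "string", "doc": "The person's family name"},
    {"name": "age",         "type": "int",    "default": -1},
    {"name": "phoneNumber", "type": ["null", "string"], "default": null,
     "doc": "Union with null makes this field nullable"},
    {"name": "status",
     "type": {"type": "enum", "name": "Status",
              "symbols": ["RETIRED", "SALARY", "HOURLY", "PART_TIME"]},
     "default": "SALARY"}
  ]
}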
Avro Record Attributes
❖ name: name of the record (required)
❖ namespace: equates to packages or modules
❖ doc: documentation for future users of this schema
❖ aliases: array of aliases (alternate names)
❖ fields: an array of fields
Avro Field Attributes
❖ name: name of the field (required)
❖ doc: description of field (important for future usage)
❖ type: JSON object defining a schema, or a JSON string
naming a record definition (required)
❖ default: Default value for this field
❖ order: specifies sort ordering of record (optional).
❖ "ascending" (the default), "descending", "ignore"
❖ aliases: array of alternate names
Tips for Using Avro
❖ Avoid advanced features which are not supported by polyglot language mappings
❖ Think simple data objects
❖ Instead of magic strings use enums (better validation)
❖ Document all records and fields in the schema
❖ Avoid complex union types (use unions for nullable fields only)
❖ Avoid recursive types
❖ Use reasonable field names
❖ employee_id instead of id, and then use employee_id in all other records that have a field that refers to the employee_id
Avro
❖ Fast data serialization
❖ Supports data structures
❖ Supports Records, Maps, Array, and basic types
❖ You can use it directly or use code generation
❖ Read more.
Avro
Confluent Schema
Registry
and
Kafka Avro Serialization
Managing Record Schema in
Kafka
Overview
❖ What is the Schema Registry?
❖ Why do you want to use it?
❖ Understanding Avro Schema Evolution
❖ Setting up and using Schema Registry
❖ Managing Schemas with REST interface
❖ Writing Avro Serializer based Producers
❖ Writing Avro Deserializer based Consumers
Confluent Schema Registry
❖ Confluent Schema Registry manages Avro Schemas for Kafka clients
❖ Provides a REST interface for putting and getting Avro schemas
❖ Stores a history of schemas
❖ versioned
❖ allows you to configure compatibility settings
❖ supports evolution of schemas
❖ Provides serializers used by Kafka clients which handle schema storage and serialization of records using Avro
❖ You don't have to send the schema, just the schema id; the schema can be looked up in the Registry. Helps with speed!
Avro Producer Schema Process
❖ Producer creates a record/message, which is an Avro record
❖ Record contains the schema and the data
❖ The Schema Registry Avro Serializer serializes the data and the schema id (just the id, not the full schema)
❖ It keeps a cache of registered schemas from the Schema Registry, mapped to ids
❖ Consumer receives the payload and deserializes it with the Kafka Avro Deserializer, which uses the Schema Registry
❖ The Deserializer looks up the full schema from its cache or the Schema Registry based on the id
Why Schema Registry?
❖ The Consumer has its own schema, one it is expecting the record/message to conform to
❖ A compatibility check is performed on the two schemas
❖ if they do not match but are compatible, then a payload transformation happens, aka Schema Evolution
❖ if not, it fails
❖ Kafka records have a Key and a Value, and schemas can be used for both
Schema Registry Actions
❖ Register schemas for key and values of Kafka records
❖ List schemas (subjects); List all versions of a subject (schema)
❖ Retrieve a schema by version or id; get latest version of
schema
❖ Check to see if schema is compatible with a certain version
❖ Get the compatibility level setting of the Schema Registry
❖ BACKWARD, FORWARD, FULL, NONE
❖ Add compatibility settings to a subject/schema
Schema Compatibility
❖ Backward compatibility
❖ data written with older schema can be read with newer schema
❖ Forward compatibility
❖ data written with newer schema can be read with an older schema
❖ Full compatibility
❖ New version of a schema is backward and forward compatible
❖ None
❖ Schema will not be validated for compatibility at all
Schema Registry Config
❖ Compatibility can be configured globally or per schema
❖ Options are:
❖ NONE - don't check for schema compatibility
❖ FORWARD - check to make sure the last schema version is forward compatible with new schemas
❖ BACKWARD (default) - make sure the new schema is backward compatible with the latest
❖ FULL - make sure the new schema is forward and backward compatible from latest to new and from new to latest
Schema Evolution
❖ If an Avro schema is changed after data has been written to the store using an older version of that schema, then Avro might do a Schema Evolution
❖ Schema evolution happens only during deserialization, at the Consumer
❖ If the Consumer's schema is different from the Producer's schema, then the value or key is automatically modified during deserialization to conform to the Consumer's reader schema
Schema Evolution 2
❖ Schema evolution is the automatic transformation of Avro records
❖ the transformation is between the version of the consumer schema and what the producer put into the Kafka log
❖ When the Consumer schema is not identical to the Producer schema used to serialize the Kafka record, then a data transformation is performed on the Kafka record (key or value)
❖ If the schemas match, then there is no need to do a transformation
Allowed Schema Modifications
❖ Add a field with a default
❖ Remove a field that had a default value
❖ Change a field’s order attribute
❖ Change a field’s default value
❖ Remove or add a field alias
❖ Remove or add a type alias
❖ Change a type to a union that contains original type
Rules of the Road for modifying
Schema
❖ Provide a default value for fields in your schema
❖ Allows you to delete the field later
❖ Don’t change a field's data type
❖ When adding a new field to your schema, you have to
provide a default value for the field
❖ Don’t rename an existing field
❖ You can add an alias
Remember our example Employee
Employee Avro schema covered in the Avro/Kafka Tutorial
Let’s say
❖ Employee did not have an age in version 1 of the
schema
❖ Later we decided to add an age field with a default value
of -1
❖ Now let’s say we have a Producer using version 2, and
a Consumer using version 1
Scenario: adding new field age w/ default value
❖ Producer uses version 2 of the Employee schema, creates a com.cloudurable.Employee record, sets the age field to 42, then sends it to the Kafka topic new-employees
❖ Consumer consumes records from new-employees using version 1 of the Employee schema
❖ Since the Consumer is using version 1 of the schema, the age field is removed during deserialization
Scenario: adding new field age w/ default value 2
❖ The same consumer modifies some fields, then writes the record to a NoSQL store
❖ When it does this, the age field is missing from the record that it writes to the NoSQL store
❖ Another client using version 2 reads the record from the NoSQL store
❖ The age field is missing from the record (because the Consumer wrote it with version 1), so age is set to the default value of -1
Schema Registry Actions
REST
❖ Register schemas for key and values of Kafka records
❖ List schemas (subjects)
❖ List all versions of a subject (schema)
❖ Retrieve a schema by version or id
❖ get latest version of schema
Schema Registry Actions 2
❖ Check to see if schema is compatible with a certain
version
❖ Get the compatibility level setting of the Schema
Registry
❖ BACKWARD, FORWARD, FULL, NONE
❖ Add compatibility settings to a subject/schema
Register a Schema
Register a Schema
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data '{"schema": "{\"type\": …}"}' \
    http://localhost:8081/subjects/Employee/versions
Response: {"id":2}
List All Schemas
curl -X GET http://localhost:8081/subjects
Response: ["Employee","Employee2","FooBar"]
Working with versions
[1,2,3,4,5]
{"subject":"Employee","version":2,"id":4,"schema":"{"type":"record","name":"Employee","namespace":"com.cloudurable.phonebook", …
{"subject":"Employee","version":1,"id":3,"schema":"{"type":"record","name":"Employee","namespace":"com.cloudurable.phonebook", …
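The curl commands behind those responses are not shown on the slide; these are the standard Schema Registry REST calls that return them.

# List all versions of the Employee subject -> [1,2,3,4,5]
curl -X GET http://localhost:8081/subjects/Employee/versions

# Fetch a specific version, or the latest
curl -X GET http://localhost:8081/subjects/Employee/versions/2
curl -X GET http://localhost:8081/subjects/Employee/versions/latest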
Working with Schemas
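The slide's example is not reproduced; these standard Schema Registry REST calls fetch a schema by its global id or by subject.

# Get a schema by its global id
curl -X GET http://localhost:8081/schemas/ids/3

# Get the latest schema registered under the Employee subject
curl -X GET http://localhost:8081/subjects/Employee/versions/latest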
Changing Compatibility
Checks
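A sketch of changing the compatibility check for a single subject via REST; the level chosen here is just an example.

curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data '{"compatibility": "NONE"}' \
    http://localhost:8081/config/Employee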
Incompatible Change
{"error_code":409,"message":"Schema being registered is incompatible with an e
Incompatible Change
{"is_compatible":false}
Use Schema Registry
❖ Start up Schema Registry server pointing to Zookeeper
cluster
❖ Import Kafka Avro Serializer and Avro Jars
❖ Configure Producer to use Schema Registry
❖ Use KafkaAvroSerializer from Producer
❖ Configure Consumer to use Schema Registry
❖ Use KafkaAvroDeserializer from Consumer
Start up Schema Registry
Server
cat ~/tools/confluent-3.2.1/etc/schema-registry/schema-registry.properties
listeners=http://0.0.0.0:8081
kafkastore.connection.url=localhost:2181
kafkastore.topic=_schemas
debug=false
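To actually start the server with that config, a sketch assuming the Confluent 3.2.1 layout shown above:

~/tools/confluent-3.2.1/bin/schema-registry-start \
    ~/tools/confluent-3.2.1/etc/schema-registry/schema-registry.properties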
Import Kafka Avro Serializer &
Avro Jars
Configure Producer to use Schema
Registry
Use KafkaAvroSerializer from
Producer
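A sketch combining the two slides above: producer properties pointing at the Schema Registry plus a send using the generated Employee class. The broker address, registry URL and topic name are assumptions consistent with the rest of the deck.

final Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.CLIENT_ID_CONFIG, "EmployeeProducer");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        org.apache.kafka.common.serialization.StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        io.confluent.kafka.serializers.KafkaAvroSerializer.class.getName());
// Tell the serializer where the Schema Registry lives
props.put("schema.registry.url", "http://localhost:8081");

try (Producer<String, Employee> producer = new KafkaProducer<>(props)) {
    final Employee bob = Employee.newBuilder()
            .setFirstName("Bob").setLastName("Jones").setAge(35).build();
    producer.send(new ProducerRecord<>("new-employees", bob.getLastName().toString(), bob));
    producer.flush();
}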
Configure Consumer to use Schema
Registry
Use KafkaAvroDeserializer from
Consumer
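The consumer side, sketched under the same assumptions; specific.avro.reader tells the KafkaAvroDeserializer to return the generated Employee class instead of a GenericRecord.

final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "employee-consumer-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        org.apache.kafka.common.serialization.StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        io.confluent.kafka.serializers.KafkaAvroDeserializer.class.getName());
props.put("schema.registry.url", "http://localhost:8081");
// Deserialize into the generated Employee class instead of a GenericRecord
props.put("specific.avro.reader", "true");

try (Consumer<String, Employee> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("new-employees"));
    final ConsumerRecords<String, Employee> records = consumer.poll(1000);  // old poll(long) API
    records.forEach(record ->
            System.out.println(record.key() + " -> " + record.value().getFirstName()));
}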
Schema Registry
❖ Confluent provides Schema Registry to manage Avro
Schemas for Kafka Consumers and Producers
❖ Avro provides Schema Migration
❖ Confluent uses Schema compatibility checks to see if
Producer schema and Consumer schemas are
compatible and to do Schema evolution if needed
❖ Use KafkaAvroSerializer from Producer
❖ Use KafkaAvroDeserializer from Consumer

More Related Content

What's hot

What's hot (20)

Brief introduction to Kafka Streaming Platform
Brief introduction to Kafka Streaming PlatformBrief introduction to Kafka Streaming Platform
Brief introduction to Kafka Streaming Platform
 
Kafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka ConsumersKafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka Consumers
 
Amazon AWS basics needed to run a Cassandra Cluster in AWS
Amazon AWS basics needed to run a Cassandra Cluster in AWSAmazon AWS basics needed to run a Cassandra Cluster in AWS
Amazon AWS basics needed to run a Cassandra Cluster in AWS
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
 
Kafka as a message queue
Kafka as a message queueKafka as a message queue
Kafka as a message queue
 
Kafka Tutorial: Streaming Data Architecture
Kafka Tutorial: Streaming Data ArchitectureKafka Tutorial: Streaming Data Architecture
Kafka Tutorial: Streaming Data Architecture
 
Kafka Tutorial: Kafka Security
Kafka Tutorial: Kafka SecurityKafka Tutorial: Kafka Security
Kafka Tutorial: Kafka Security
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
 
Schema Evolution for Resilient Data microservices
Schema Evolution for Resilient Data microservicesSchema Evolution for Resilient Data microservices
Schema Evolution for Resilient Data microservices
 
Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker Containers
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Event Hub & Kafka
Event Hub & KafkaEvent Hub & Kafka
Event Hub & Kafka
 
ES & Kafka
ES & KafkaES & Kafka
ES & Kafka
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
 
Efficient Schemas in Motion with Kafka and Schema Registry
Efficient Schemas in Motion with Kafka and Schema RegistryEfficient Schemas in Motion with Kafka and Schema Registry
Efficient Schemas in Motion with Kafka and Schema Registry
 
Javaeeconf 2016 how to cook apache kafka with camel and spring boot
Javaeeconf 2016 how to cook apache kafka with camel and spring bootJavaeeconf 2016 how to cook apache kafka with camel and spring boot
Javaeeconf 2016 how to cook apache kafka with camel and spring boot
 
Microservices - modern software architecture
Microservices - modern software architectureMicroservices - modern software architecture
Microservices - modern software architecture
 
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
 
AWS Database Services-Philadelphia AWS User Group-4-17-2018
AWS Database Services-Philadelphia AWS User Group-4-17-2018AWS Database Services-Philadelphia AWS User Group-4-17-2018
AWS Database Services-Philadelphia AWS User Group-4-17-2018
 

Similar to Kafka Tutorial - Introduction to Apache Kafka (Part 2)

Similar to Kafka Tutorial - Introduction to Apache Kafka (Part 2) (20)

Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced Producers
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Streaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsStreaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka Streams
 
kafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdfkafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdf
 
Apache cassandra
Apache cassandraApache cassandra
Apache cassandra
 
spark-kafka_mod
spark-kafka_modspark-kafka_mod
spark-kafka_mod
 
Akka Streams And Kafka Streams: Where Microservices Meet Fast Data
Akka Streams And Kafka Streams: Where Microservices Meet Fast DataAkka Streams And Kafka Streams: Where Microservices Meet Fast Data
Akka Streams And Kafka Streams: Where Microservices Meet Fast Data
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
What's new with Apache Camel 3? | DevNation Tech Talk
What's new with Apache Camel 3? | DevNation Tech TalkWhat's new with Apache Camel 3? | DevNation Tech Talk
What's new with Apache Camel 3? | DevNation Tech Talk
 
DevNation Live 2020 - What's new with Apache Camel 3
DevNation Live 2020 - What's new with Apache Camel 3DevNation Live 2020 - What's new with Apache Camel 3
DevNation Live 2020 - What's new with Apache Camel 3
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
 
Training
TrainingTraining
Training
 
Connecting Apache Kafka With Mule ESB
Connecting Apache Kafka With Mule ESBConnecting Apache Kafka With Mule ESB
Connecting Apache Kafka With Mule ESB
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)
 
Cassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache CassandraCassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache Cassandra
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQL
 

Recently uploaded

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

Kafka Tutorial - Introduction to Apache Kafka (Part 2)

  • 1. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Cloudurable provides Cassandra and Kafka Support on AWS/EC2 Kafka Tutorial (Part 2) Java Producer and Consumer examples Schema Registry Tutorial on Kafka Streaming Platform Part 2
  • 2. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting
  • 3. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting StockPrice Producer Java Example Lab StockPrice Producer Java Example
  • 4. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockPrice App to demo Advanced Producer ❖ StockPrice - holds a stock price has a name, dollar, and cents ❖ StockPriceKafkaProducer - Configures and creates KafkaProducer<String, StockPrice>, StockSender list, ThreadPool (ExecutorService), starts StockSender runnable into thread pool ❖ StockAppConstants - holds topic and broker list ❖ StockPriceSerializer - can serialize a StockPrice into byte[] ❖ StockSender - generates somewhat random stock prices for a given StockPrice name, Runnable, 1 thread per StockSender ❖ Shows using KafkaProducer from many threads
  • 5. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockPrice domain object ❖ has name ❖ dollars ❖ cents ❖ converts itself to JSON
  • 6. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockPriceKafkaProducer ❖ Import classes and setup logger ❖ Create createProducer method to create KafkaProducer instance ❖ Create setupBootstrapAndSerializers to initialize bootstrap servers, client id, key serializer and custom serializer (StockPriceSerializer) ❖ Write main() method - creates producer, create StockSender list passing each instance a producer, creates a thread pool so every stock sender gets it own thread, runs each stockSender in its own thread
  • 7. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockPriceKafkaProducer imports, createProducer ❖ Import classes and setup logger ❖ createProducer used to create a KafkaProducer ❖ createProducer() calls setupBoostrapAnd Serializers()
  • 8. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Configure Producer Bootstrap and Serializer ❖ Create setupBootstrapAndSerializers to initialize bootstrap servers, client id, key serializer and custom serializer (StockPriceSerializer) ❖ StockPriceSerializer will serialize StockPrice into bytes
  • 9. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockPriceKafkaProducer.mai n() ❖ main method - creates producer, ❖ create StockSender list passing each instance a producer ❖ creates a thread pool (executorService) ❖ every StockSender runs in its own thread
  • 10. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockAppConstants ❖ topic name for Producer example ❖ List of bootstrap servers
  • 11. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockPriceKafkaProducer.getStockSen derList
  • 12. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockPriceSerializer ❖ Converts StockPrice into byte array
  • 13. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockSender ❖ Generates random stock prices for a given StockPrice name, ❖ StockSender is Runnable ❖ 1 thread per StockSender ❖ Shows using KafkaProducer from many threads ❖ Delays random time between delayMin and delayMax, ❖ then sends random StockPrice between stockPriceHigh and stockPriceLow
  • 14. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockSender imports, Runnable ❖ Imports Kafka Producer, ProducerRecord, RecordMetadata, StockPrice ❖ Implements Runnable, can be submitted to ExecutionService
  • 15. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockSender constructor ❖ takes a topic, high & low stockPrice, producer, delay min & max
  • 16. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockSender run() ❖ In loop, creates random record, send record, waits random time
  • 17. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockSender createRandomRecord ❖ createRandomRecord uses randomIntBetween ❖ creates StockPrice and then wraps StockPrice in ProducerRecord
  • 18. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ StockSender displayRecordMetaData ❖ Every 100 records displayRecordMetaData gets called ❖ Prints out record info, and recordMetadata info: ❖ key, JSON value, topic, partition, offset, time ❖ uses Future from call to producer.send()
  • 19. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run it ❖ Run ZooKeeper ❖ Run three Brokers ❖ run create-topic.sh ❖ Run StockPriceKafkaProducer
  • 20. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run scripts ❖ run ZooKeeper from ~/kafka-training ❖ use bin/create-topic.sh to create topic ❖ use bin/delete-topic.sh to delete topic ❖ use bin/start-1st-server.sh to run Kafka Broker 0 ❖ use bin/start-2nd-server.sh to run Kafka Broker 1 ❖ use bin/start-3rd-server.sh to run Kafka Broker 2 Config is under directory called config server-0.properties is for Kafka Broker 0 server-1.properties is for Kafka Broker 1 server-2.properties is for Kafka Broker 2
  • 21. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run All 3 Brokers
  • 22. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run create-topic.sh script Name of the topic is stock-prices Three partitions Replication factor of three
  • 23. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run StockPriceKafkaProducer ❖ Run StockPriceKafkaProducer from the IDE ❖ You should see log messages from StockSender(s) with StockPrice name, JSON value, partition, offset, and time
  • 24. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Lab Adding an orderly shutdown flush and close Using flush and close
  • 25. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Shutdown Producer nicely ❖ Handle ctrl-C shutdown from Java ❖ Shutdown thread pool and wait ❖ Flush producer to send any outstanding batches if using batches (producer.flush()) ❖ Close Producer (producer.close()) and wait
  • 26. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Nice Shutdown producer.close()
  • 27. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Restart Producer then shut it down ❖ Add shutdown hook ❖ Start StockPriceKafkaProducer ❖ Now stop it (CTRL-C or hit stop button in IDE)
  • 28. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Lab Configuring Producer Durability
  • 29. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Lab Batching Records
  • 30. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Lab Adding Retries and Timeouts
  • 31. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Kafka Advanced Consumers Currently Not added to sample
  • 32. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Working with Avro Serialization and the Schema Registry Schema Management in Kafka Understanding Avro and the Schema Registry for Kafka
  • 33. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Overview ❖ First we cover Avro ❖ Avro Schema ❖ Using a Schema to write data ❖ Using a Schema to read data ❖ Then we cover the Schema Registry ❖ Managing Schemas ❖ Kafka Avro Serialization
  • 34. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Avro Introduction to Avro for Big Data and Streaming Architecture Apache Avro Data Serialization for Hadoop and Kafka
  • 35. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro Overview ❖ Why Avro? ❖ What is Avro? ❖ Why are schemas important ❖ Examples reading and writing with Avro ❖ Working with Code generation ❖ Working with Avro Generic Record
  • 36. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Why Avro? ❖ Direct mapping to JSON ❖ Compact binary format ❖ Very Fast ❖ Polyglot bindings ❖ Code generation for static languages ❖ Code generation not needed ❖ Supports Schema ❖ Supports compatibility checks ❖ Allows evolving your data over time
  • 37. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Apache Avro ❖ Data serialization system ❖ Data structures ❖ Binary data format ❖ Container file format to store persistent data ❖ RPC capabilities ❖ Does not require code generation to use
  • 38. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro Schemas ❖ Supports schemas for defining data structure ❖ Serializing and deserializing data, uses schema ❖ File schema ❖ Avro files store data with its schema ❖ RPC Schema ❖ RPC protocol exchanges schemas as part of the handshake ❖ Schemas written in JSON
  • 39. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro compared to… ❖ Similar to Thrift, Protocol Buffers, JSON, etc. ❖ Does not require code generation ❖ Avro needs less encoding as part of the data since it stores names and types in the schema ❖ It supports evolution of schemas.
  • 40. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Why schemas? ❖ Multiple Producers and Consumers evolve over time ❖ Schemas keep your data clean ❖ NoSQL was no schema but not as much: Cassandra has schemas ❖ REST/JSON was schema-less and IDL-less but not anymore Swagger and RAML
  • 41. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Schema provides Robustness ❖ Streaming architecture support ❖ Challenging: Consumers and Producers evolve on different timelines. ❖ Producers send streams of records that 0 to * Consumers read ❖ All consumers or use cases are not known ahead of time ❖ Data could go to Hadoop or some store used years later in use cases you can’t imagine ❖ Schemas help future proof data; makes data more robust. ❖ Avro Schema with support for evolution is important ❖ Schema provides robustness: provides meta-data about data stored with data
  • 42. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Future Compatibility ❖ Data record format compatibility is a hard problem to solve ❖ Schema capture a point in time of what your data when it was stored ❖ Data will evolve. New fields are added. ❖ Kafka streams are often recorded in data lakes - needs to be less rigid than schema for operational data ❖ Producers can have Consumers that they never know about ❖ You can’t test a Consumer that you don’t know about ❖ Avro Schemas support agility sakes ❖ 80% of data science is data cleanup, Schemas help
  • 43. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro Schema Avro schema stored in src/main/avro by default. very field and record has a “doc” attribute which you should use to document the purpose of the fie
  • 44. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Code Generation
  • 45. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Employee Code Generation
  • 46. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Using Generated Avro class
  • 47. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Writing employees to an Avro File
  • 48. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Reading employees From a File
  • 49. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Using GenericRecord
  • 50. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Writing Generic Records
  • 51. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Reading using Generic Records
  • 52. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro Schema Validation
  • 53. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro supported types ❖ Records ❖ Arrays ❖ Enums ❖ Unions ❖ Maps ❖ String, Int, Boolean, Decimal, Timestamp, Date
  • 54. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro Types 1 avro type json type example description null null null no value boolean boolean truetrue or false int integer 132 bit signed int long 64 bit signed int float number 1.1single precision (32-bit) floating-point number double number 1.2double precision (64-bit) floating-point number bytes string "u00FF" string string "foo" record object {"a": 1} complex type / object enum string “Hi Mom" Enumerated type array array [1] array of any value union can be used to define nullable fields map object {"a": 1} fixed string "u00ff" fixed length string
  • 55. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro Types 2 avro type json type example description date int 2000 stores the number of days from the unix epoch, 1 January 1970 time-millis int 10800000 stores the number of milliseconds after midnight time-micros long 1000000 microseconds after midnight timestamp- milis long 1494437796880 milliseconds since epoch decimal complex/object 22.555 specifies scale and precision like BigDecimal in Java duration complex/object Specifies duration in months, days and milliseconds
  • 56. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Fuller example Avro Schema
  • 57. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro Record Attributes ❖ name: name of the record (required). ❖ namespace: equates to packages or modules ❖ doc: documentation for future user of this schema ❖ aliases: array aliases (alias names) ❖ fields: an array of fields
  • 58. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro Field Attributes ❖ name: name of the field (required) ❖ doc: description of field (important for future usage) ❖ type: JSON object defining a schema, or a JSON string naming a record definition (required) ❖ default: Default value for this field ❖ order: specifies sort ordering of record (optional). ❖ "ascending" (the default), "descending", "ignore" ❖ aliases: array of alternate names
  • 59. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Tips for Using Avro ❖ Avoid advance features which are not supported by polyglot language mappings ❖ Think simple data objects ❖ Instead of magic strings use enums (better validation) ❖ Document all records and fields in the schema ❖ Avoid complex union types (use Unions for nullable fields only) ❖ Avoid recursive types ❖ Use reasonable field names ❖ employee_id instead of id and then use employee_id in all other records that have a field that refer to the employee_id
  • 60. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro ❖ Fast data serialization ❖ Supports data structures ❖ Supports Records, Maps, Array, and basic types ❖ You can use it direct or use Code Generation ❖ Read more.
  • 61. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Avro Confluent Schema Registry and Kafka Avro Serialization Managing Record Schema in Kafka
  • 62. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Overview ❖ What is the Schema Registry? ❖ Why do you want to use it? ❖ Understanding Avro Schema Evolution ❖ Setting up and using Schema Registry ❖ Managing Schemas with REST interface ❖ Writing Avro Serializer based Producers ❖ Writing Avro Deserializer based Consumers
  • 63. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Confluent Schema Registry ❖ Confluent Schema Registry manage Avro Schemas for Kafka clients ❖ Provides REST interface for putting and getting Avro schemas ❖ Stores a history of schemas ❖ versioned ❖ allows you to configure compatibility setting ❖ supports evolution of schemas ❖ Provides serializers used by Kafka clients which handles schema storage and serialization of records using Avro ❖ Don’t have to send schema just the schema uuid, schema can be looked up in Registry. Helps with Speed!
  • 64. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Avro Producer Schema Process ❖ Producer creates a record/message, which is an Avro record ❖ Record contains schema and data ❖ Schema Registry Avro Serializer serializes the data and schema id (just id) ❖ Keeps a cache of registered schemas from Schema Registry to ids ❖ Consumer receives payload and deserializes it with Kafka Avro Deserializers which use Schema Registry ❖ Deserializer looks up the full schema from cache or Schema Registry based on id
  • 65. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Why Schema Registry? ❖ Consumer has its schema, one it is expecting record/message to conform to ❖ Compatibility check is performed or two schemas ❖ if no match, but are compatible, then payload transformation happens aka Schema Evolution ❖ if not failure ❖ Kafka records have Key and Value and schema can be done on both
  • 66. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Schema Registry Actions ❖ Register schemas for key and values of Kafka records ❖ List schemas (subjects); List all versions of a subject (schema) ❖ Retrieve a schema by version or id; get latest version of schema ❖ Check to see if schema is compatible with a certain version ❖ Get the compatibility level setting of the Schema Registry ❖ BACKWARDS, FORWARDS, FULL, NONE ❖ Add compatibility settings to a subject/schema
  • 67. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Schema Compatibility ❖ Backward compatibility ❖ data written with older schema can be read with newer schema ❖ Forward compatibility ❖ data written with newer schema can be read with an older schema ❖ Full compatibility ❖ New version of a schema is backward and forward compatible ❖ None ❖ Schema will not be validated for compatibility at all
  • 68. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Schema Registry Config ❖ Compatibility can be configured globally or per schema ❖ Options are: ❖ NONE - don’t check for schema compatibility ❖ FORWARD - check to make sure last schema version is forward compatible with new schemas ❖ BACKWARDS (default) - make sure new schema is backwards compatible with latest ❖ FULL - make sure new schema is forwards and backwards compatible from latest to new and from new to latest
  • 69. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Schema Evolution ❖ Avro schema is changed after data has been written to store using an older version of that schema, then Avro might do a Schema Evolution ❖ Schema evolution happens only during deserialization at Consumer ❖ If Consumer’s schema is different from Producer’s schema, then value or key is automatically modified during deserialization to conform to consumers reader schema
  • 70. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Schema Evolution 2 ❖ Schema evolution is automatic transformation of Avro schema ❖ the transformation is between the consumer’s schema version and the schema the producer used to write to the Kafka log ❖ When the Consumer schema is not identical to the Producer schema used to serialize the Kafka Record, then a data transformation is performed on the Kafka record (key or value) ❖ If the schemas match, then there is no need for a transformation
  • 71. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Allowed Schema Modifications ❖ Add a field with a default ❖ Remove a field that had a default value ❖ Change a field’s order attribute ❖ Change a field’s default value ❖ Remove or add a field alias ❖ Remove or add a type alias ❖ Change a type to a union that contains the original type
  • 72. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Rules of the Road for modifying Schema ❖ Provide a default value for fields in your schema ❖ Allows you to delete the field later ❖ Don’t change a field's data type ❖ When adding a new field to your schema, you have to provide a default value for the field ❖ Don’t rename an existing field ❖ You can add an alias
  • 73. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Remember our example Employee Avro covered in Avro/Kafka Tutorial
  • 74. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Let’s say ❖ Employee did not have an age in version 1 of the schema ❖ Later we decided to add an age field with a default value of -1 ❖ Now let’s say we have a Producer using version 2, and a Consumer using version 1
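To make the scenario concrete, here is a sketch of what version 2 of the schema could look like, reusing the Employee schema from the Avro tutorial (version 1 is the same schema without the age field; the exact field list here is illustrative):

{"namespace": "com.cloudurable.phonebook",
 "type": "record",
 "name": "Employee",
 "fields": [
   {"name": "firstName", "type": "string"},
   {"name": "lastName", "type": "string"},
   {"name": "age", "type": "int", "default": -1},
   {"name": "phoneNumber", "type": "string"}
 ]
}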
  • 75. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Scenario adding new field age w/ default value ❖ Producer uses version 2 of the Employee schema and creates a com.cloudurable.Employee record, sets the age field to 42, then sends it to Kafka topic new-employees ❖ Consumer consumes records from new-employees using version 1 of the Employee Schema ❖ Since the Consumer is using version 1 of the schema, the age field is removed during deserialization
  • 76. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Scenario adding new field age w/ default value 2 ❖ Same consumer modifies some fields, then writes the record to a NoSQL store ❖ When it does this, the age field is missing from the record that it writes to the NoSQL store ❖ Another client using version 2 reads the record from the NoSQL store ❖ The age field is missing from the record (because the Consumer wrote it with version 1), so age is set to the default value of -1
  • 77. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Schema Registry Actions REST ❖ Register schemas for key and values of Kafka records ❖ List schemas (subjects) ❖ List all versions of a subject (schema) ❖ Retrieve a schema by version or id ❖ get latest version of schema
  • 78. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Schema Registry Actions 2 ❖ Check to see if a schema is compatible with a certain version ❖ Get the compatibility level setting of the Schema Registry ❖ BACKWARD, FORWARD, FULL, NONE ❖ Add compatibility settings to a subject/schema
  • 79. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Register a Schema
  • 80. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Register a Schema {"id":2} curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\": …}"}' http://localhost:8081/subjects/Employee/versions
  • 81. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ List All Schema ["Employee","Employee2","FooBar"] curl -X GET http://localhost:8081/subjects
  • 82. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Working with versions [1,2,3,4,5] {"subject":"Employee","version":2,"id":4,"schema":"{"type":"record","name":"Employee","namespace":"com.cloudurable.phonebook", … {"subject":"Employee","version":1,"id":3,"schema":"{"type":"record","name":"Employee","namespace":"com.cloudurable.phonebook", …
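Output like the above would typically come from version calls along these lines (assuming the same local Schema Registry on port 8081 as in the register example):

curl -X GET http://localhost:8081/subjects/Employee/versions
curl -X GET http://localhost:8081/subjects/Employee/versions/2
curl -X GET http://localhost:8081/subjects/Employee/versions/1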
  • 83. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Working with Schemas
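The code for this slide was an image; as a rough stand-in, here is a minimal Java sketch of talking to the Schema Registry REST interface with the OkHttp client mentioned in the notes (class name, base URL, and subject are illustrative assumptions):

import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class SchemaRegistryRestExample {
    public static void main(String... args) throws Exception {
        final String baseUrl = "http://localhost:8081"; // assumed local Schema Registry
        final OkHttpClient client = new OkHttpClient();
        // GET /subjects lists the registered subjects, e.g. ["Employee","Employee2","FooBar"]
        final Request listSubjects = new Request.Builder().url(baseUrl + "/subjects").build();
        try (Response response = client.newCall(listSubjects).execute()) {
            System.out.println(response.body().string());
        }
        // GET /subjects/Employee/versions/latest fetches the latest registered Employee schema
        final Request latest = new Request.Builder().url(baseUrl + "/subjects/Employee/versions/latest").build();
        try (Response response = client.newCall(latest).execute()) {
            System.out.println(response.body().string());
        }
    }
}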
  • 84. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Changing Compatibility Checks
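The example on this slide was an image; a typical way to change the compatibility check for a subject through the REST interface looks roughly like this (the subject name and level are just examples):

curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"compatibility": "NONE"}' http://localhost:8081/config/Employee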
  • 85. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Incompatible Change {"error_code":409,"message":"Schema being registered is incompatible with an e…
  • 86. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Incompatible Change {"is_compatible":false}
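A response like the one above typically comes from a compatibility test call against an existing version, roughly along these lines (the schema payload is elided just as in the register example):

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\": …}"}' http://localhost:8081/compatibility/subjects/Employee/versions/latest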
  • 87. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Use Schema Registry ❖ Start up Schema Registry server pointing to Zookeeper cluster ❖ Import Kafka Avro Serializer and Avro Jars ❖ Configure Producer to use Schema Registry ❖ Use KafkaAvroSerializer from Producer ❖ Configure Consumer to use Schema Registry ❖ Use KafkaAvroDeserializer from Consumer
  • 88. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Start up Schema Registry Server cat ~/tools/confluent-3.2.1/etc/schema-registry/schema-registry.properties listeners=http://0.0.0.0:8081 kafkastore.connection.url=localhost:2181 kafkastore.topic=_schemas debug=false
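To start the server with that properties file, the Confluent distribution ships a start script; assuming the same install path as above, the command would look roughly like:

~/tools/confluent-3.2.1/bin/schema-registry-start ~/tools/confluent-3.2.1/etc/schema-registry/schema-registry.properties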
  • 89. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Import Kafka Avro Serializer & Avro Jars
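The build file on this slide was an image; per the notes, the dependencies are the Kafka Avro Serializer lib and the Avro lib, which in a Gradle build of that era might be declared roughly like this (the Confluent artifacts also require adding the Confluent Maven repository):

dependencies {
    compile 'io.confluent:kafka-avro-serializer:3.2.1'
    compile 'org.apache.avro:avro:1.8.1'
}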
  • 90. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Configure Producer to use Schema Registry
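The configuration on this slide was an image; here is a minimal sketch of the idea, assuming a local broker and Schema Registry, with the generated Employee class from the Avro tutorial as the value type (client id, class name, and addresses are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import com.cloudurable.phonebook.Employee; // generated earlier from the Avro schema

public class AvroProducerConfigExample {
    static Producer<String, Employee> createProducer() {
        final Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "EmployeeAvroProducer");
        // Keys are plain strings; values are Avro records handled by the Confluent serializer
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class.getName());
        // Tell the serializer where the Schema Registry lives so it can register and look up schemas
        props.put("schema.registry.url", "http://localhost:8081");
        return new KafkaProducer<>(props);
    }
}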
  • 91. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Use KafkaAvroSerializer from Producer
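The producer code on this slide was an image; per the notes it is like the earlier producer examples, roughly along these lines (it reuses the hypothetical createProducer() helper from the sketch above; topic name and field values are illustrative):

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import com.cloudurable.phonebook.Employee; // generated earlier from the Avro schema

public class AvroProducerSendExample {
    public static void main(String... args) {
        final Producer<String, Employee> producer = AvroProducerConfigExample.createProducer();
        final Employee employee = Employee.newBuilder()
                .setFirstName("Bob")
                .setLastName("Jones")
                .setAge(42)                  // version 2 of the schema has the age field
                .setPhoneNumber("555-1212")
                .build();
        // The KafkaAvroSerializer registers the schema if needed and sends only the schema id with the data
        producer.send(new ProducerRecord<>("new-employees", employee.getFirstName().toString(), employee));
        producer.flush();
        producer.close();
    }
}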
  • 92. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Configure Consumer to use Schema Registry
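The configuration on this slide was an image; here is a minimal sketch of the idea, mirroring the producer setup (group id and property values are illustrative assumptions):

import java.util.Properties;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import com.cloudurable.phonebook.Employee; // generated earlier from the Avro schema

public class AvroConsumerConfigExample {
    static Consumer<String, Employee> createConsumer() {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "EmployeeAvroConsumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class.getName());
        // Point the deserializer at the Schema Registry so it can fetch the writer schema by id
        props.put("schema.registry.url", "http://localhost:8081");
        // Ask for the generated Employee class instead of a GenericRecord
        props.put("specific.avro.reader", true);
        return new KafkaConsumer<>(props);
    }
}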
  • 93. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Use KafkaAvroDeserializer from Consumer
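The consumer code on this slide was an image; per the notes it is like the earlier consumer examples, roughly along these lines (it reuses the hypothetical createConsumer() helper from the sketch above):

import java.util.Collections;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import com.cloudurable.phonebook.Employee; // generated earlier from the Avro schema

public class AvroConsumerPollExample {
    public static void main(String... args) {
        final Consumer<String, Employee> consumer = AvroConsumerConfigExample.createConsumer();
        consumer.subscribe(Collections.singletonList("new-employees"));
        try {
            while (true) {
                // The KafkaAvroDeserializer uses the schema id in each record to fetch the writer schema
                // from the Schema Registry (or its cache) and evolves the data to the reader schema if needed
                final ConsumerRecords<String, Employee> records = consumer.poll(1000);
                records.forEach(record ->
                        System.out.printf("key=%s value=%s%n", record.key(), record.value()));
            }
        } finally {
            consumer.close();
        }
    }
}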
  • 94. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Schema Registry ❖ Confluent provides Schema Registry to manage Avro Schemas for Kafka Consumers and Producers ❖ Avro provides Schema Migration ❖ Confluent uses Schema compatibility checks to see if Producer schema and Consumer schemas are compatible and to do Schema evolution if needed ❖ Use KafkaAvroSerializer from Producer ❖ Use KafkaAvroDeserializer from Consumer

Editor's Notes

  1. Kafka Tutorial - Introduction to Apache Kafka Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. This slide deck covers Kafka Architecture with some small examples from the command line. Then we expand on this with a multi-server example to demonstrate failover of brokers as well as consumers. Then it goes through some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have also expanded on the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advanced Kafka Producers.
  2. Avro supports direct mapping to JSON as well as a compact binary format. It is a very fast serialization format. Avro is widely used in the Hadoop ecosystem. Avro supports polyglot bindings to many programming languages and code generation for statically typed languages. For dynamically typed languages, code generation is not needed. Another key advantage of Avro is its support of evolutionary schemas which supports compatibility checks, and allows evolving your data over time.
  3. Apache Avro™ is a data serialization system. Avro provides data structures, binary data format, container file format to store persistent data and RPC capabilities. Avro does not require code generation to use. Integrates well with JavaScript, Python, Ruby and Java.
  4. Avro data format is defined by Avro schemas. When deserializing data, the schema is used. Data is serialized based on the schema, and the schema is sent with the data. Avro data plus schema is fully self-describing. Avro files store data together with its schema. Avro RPC is also based on schema. Part of the RPC protocol exchanges schemas as part of the handshake. When Avro is used in RPC, the client and server exchange schemas in the connection handshake. Avro schemas are written in JSON.
  5. Avro is similar to Thrift, Protocol Buffers, JSON, etc. Avro does not require code generation. Avro needs less encoding as part of the data since it stores names and types in the schema. It supports evolution of schemas.
  6. There was a trend towards schema-less as part of the NoSQL movement, but that pendulum has swung back a bit; e.g., Cassandra has schemas. REST/JSON was schema-less and IDL-less, but not anymore with Swagger, API gateways, and RAML. Now the trend is more towards schemas that can evolve, and Avro fits well in this space.
  7. Streaming architecture like Kafka supports decoupling by sending data in streams to an unknown number of consumers. Streaming architecture is challenging as Consumers and Producers evolve on different timelines. Producers send a stream of records that zero to many Consumers read. Not only are there multiple consumers, but data might end up in Hadoop or some other store and be used for use cases you did not even imagine. Schemas help future-proof your data and make it more robust. Supporting future (Big Data), past (older Consumers), and current use cases is not easy without a schema. Avro schema with its support for evolution is essential for making the data robust for streaming architectures like Kafka, and with the metadata that a schema provides, you can reason about the data. A schema provides robustness by supplying metadata about the data stored in Avro records, making the data self-documenting.
  8. Data record format compatibility is a hard problem to solve with streaming architecture and Big Data. Avro schemas are not a cure-all, but essential for documenting and modeling your data. Avro Schema definitions capture a point in time of what your data looked like when it was recorded, since the schema is saved with the data. Data will evolve. New fields are added. Since streams often get recorded in data lakes like Hadoop and those records can represent historical data, not operational data, it makes sense that data streams and data lakes have a less rigid, more evolving schema than the schema of the operational relational database or Cassandra cluster. It makes sense to have a rigid schema for operational data, but not data that ends up in a data lake. With a streaming platform, consumers and producers can change all of the time and evolve quite a bit. Producers can have Consumers that they never know. You can’t test a Consumer that you don’t know. For agility’s sake, you don’t want to update every Consumer every time a Producer adds a field to a Record. These types of updates are not feasible without support for a Schema.
  9. Example Schema: {"namespace": "com.cloudurable.phonebook", "type": "record", "name": "Employee", "fields": [ {"name": "firstName", "type": "string"}, {"name": "lastName", "type": "string"}, {"name": "age", "type": "int"}, {"name": "phoneNumber", "type": "string"} ] } Avro schema is just JSON.
  10. There are plugins for Maven and Gradle to generate code based on Avro schemas. This gradle-avro-plugin is a Gradle plugin that uses Avro tools to do Java code generation for Apache Avro. This plugin supports Avro schema files (avsc), and Avro RPC IDL (avdl). For Kafka you only need avsc. Notice that we did not generate setter methods. This makes the instances somewhat immutable.
  11. The plugin generates the files and puts them under build/generated-main-avro-java.
  12. The Employee class has a constructor and has a builder.
  13. The above shows serializing an Employee list to disk. In Kafka, we will not be writing to disk directly. We are just showing how so you have a way to test Avro serialization, which is helpful when debugging schema incompatibilities. Note we create a DatumWriter, which converts a Java instance into an in-memory serialized format. SpecificDatumWriter is used with generated classes like Employee. DataFileWriter writes the serialized records to the employee.avro file.
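A minimal sketch of the pattern this note describes, assuming the generated Employee class (file name and field values are illustrative):

import java.io.File;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;
import com.cloudurable.phonebook.Employee;

public class AvroFileWriteExample {
    public static void main(String... args) throws Exception {
        final Employee bob = Employee.newBuilder()
                .setFirstName("Bob").setLastName("Jones")
                .setAge(35).setPhoneNumber("555-1212").build();
        // SpecificDatumWriter converts generated Employee instances into Avro's serialized form
        final DatumWriter<Employee> datumWriter = new SpecificDatumWriter<>(Employee.class);
        // DataFileWriter writes the serialized records, along with the schema, to employee.avro
        try (DataFileWriter<Employee> fileWriter = new DataFileWriter<>(datumWriter)) {
            fileWriter.create(bob.getSchema(), new File("employee.avro"));
            fileWriter.append(bob);
        }
    }
}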
  14. The above deserializes employees from the employees.avro file. Deserializing is similar to serializing but in reverse. We create a SpecificDatumReader to convert in-memory serialized items into instances of our generated Employee class. The DatumReader reads records from the file by calling next. Another way to read is using forEach as follows: final DataFileReader<Employee> dataFileReader = new DataFileReader<>(file, empReader); dataFileReader.forEach(employeeList::add);
  15. You can use an Avro GenericRecord instead of using generated code.
  16. You can write to Avro files using GenericRecords as well.
  17. You can read from Avro files using generic records as well.
  18. Avro will validate the data types when it serializes and deserializes the data.
  19. The document https://avro.apache.org/docs/current/spec.html#Protocol+Declaration describes all of the supported types.
  20. Decimal “byte array must contain the two's-complement representation of the unscaled integer value in big-endian byte order”.
  21. Decimal “byte array must contain the two's-complement representation of the unscaled integer value in big-endian byte order”.
  22. The above has examples of default values, arrays, primitive types, Records within records, enums, and more. Notice that the nickName is an optional field and that we modeled that with a union.
  23. The doc attribute is imperative for future usage as it documents what the fields and records are supposed to represent. Remember that this data can outlive systems that produced it. A self-documenting schema is critical for a robust system.
  24. Avoid advanced Avro features which are not supported by polyglot language mappings. Think simple data transfer objects or structs. Don't use magic strings, use enums instead as they provide better validation. Document all records and fields in the schema. Documentation is imperative for future usage. Documents what the fields and records represent. A self-documenting schema is critical for a robust streaming system and Big Data. Don't use complex union types. Use Unions for nullable fields only and avoid using recursive types at all costs. Use reasonable field names and use them consistently with other records. Example, `employee_id` instead of `id` and then use `employee_id` in all other records that have a field that refer to the `employee_id` from Employee.
  25. Confluent Schema Registry stores Avro Schemas for Kafka producers and consumers. The Schema Registry provides a RESTful interface for managing Avro schemas. It allows the storage of a history of schemas, which are versioned. The Confluent Schema Registry supports checking schema compatibility for Kafka. You can configure compatibility settings which support the evolution of schemas using Avro. The Kafka Avro serialization project provides serializers. Kafka Producers and Consumers that use Kafka Avro serialization handle schema management and serialization of records using Avro and the Schema Registry. When using the Confluent Schema Registry, Producers don’t have to send the schema, just the schema id, which is unique. The consumer uses the schema id to look up the full schema from the Confluent Schema Registry if not already cached. Since you don’t have to send the schema with each set of records, this saves time. Not sending the schema with each record or batch of records speeds up the serialization, as only the id of the schema is sent.
  26. The Kafka Producer creates a record/message, which is an Avro record. The record contains a schema id and data. With the Kafka Avro Serializer, the schema is registered if needed and then it serializes the data and schema id. The Kafka Avro Serializer keeps a cache of registered schemas from the Schema Registry to their schema ids. Consumers receive payloads and deserialize them with Kafka Avro Deserializers, which use the Confluent Schema Registry. The Deserializer looks up the full schema from the cache or Schema Registry based on the id.
  27. The Consumer has its schema, which could be different from the producer’s. The consumer schema is the schema the consumer is expecting the record/message to conform to. With the Schema Registry a compatibility check is performed, and if the two schemas don’t match but are compatible, then the payload transformation happens via Avro Schema Evolution. Kafka records can have a Key and a Value and both can have a schema.
  28. The Schema Registry can store schemas for keys and values of Kafka records. It can also list schemas by subject. It can list all versions of a subject (schema). It can retrieve a schema by version or id. It can get the latest version of a schema. Importantly, the Schema Registry can check to see if a schema is compatible with a certain version. There is a compatibility level (BACKWARD, FORWARD, FULL, NONE) setting for the Schema Registry and an individual subject. You can manage schemas via a REST API with the Schema Registry.
  29. Backward compatibility means data written with an old schema is readable with a newer schema. Forward compatibility means data written with a newer schema is readable with old schemas. Full compatibility means a new version of a schema is backward and forward compatible. None disables schema validation and is not recommended. If you set the level to none then the Schema Registry just stores the schema and the schema will not be validated for compatibility at all.
  30. Schema Registry Config The schema compatibility check can be configured globally or per subject. The compatibility value is one of the following: • NONE - don’t check for schema compatibility • FORWARD - check to make sure the last schema version is forward compatible with new schemas • BACKWARD (default) - make sure the new schema is backwards compatible with the latest • FULL - make sure the new schema is forwards and backwards compatible from latest to new and from new to latest
  31. Schema Evolution If an Avro schema is changed after data has been written to store using an older version of that schema, then Avro might do a Schema Evolution when you try to read that data. From the Kafka perspective, Schema evolution happens only during deserialization at the Consumer (read). If the Consumer’s schema is different from the Producer’s schema, then the value or key is automatically modified during deserialization to conform to the consumer’s reader schema if possible.
  32. Avro schema evolution is an automatic transformation of Avro schemas between the consumer’s schema version and the schema the producer used when writing to the Kafka log. When the Consumer schema is not identical to the Producer schema used to serialize the Kafka Record, then a data transformation is performed on the Kafka record’s key or value. If the schemas match, then there is no need to do a transformation.
  33. Allowed Modification During Schema Evolution You can add a field with a default to a schema. You can remove a field that had a default value. You can change a field’s order attribute. You can change a field’s default value to another value or add a default value to a field that did not have one. You can remove or add a field alias (keep in mind that this could break some consumers that depend on the alias). You can change a type to a union that contains original type. If you do any of the above, then your schema can use Avro’s schema evolution when reading with an old schema.
  34. Rules of the Road for modifying Schema If you want to make your schema evolvable, then follow these guidelines. Provide a default value for fields in your schema, as this allows you to delete the field later. Never change a field’s data type. When adding a new field to your schema, you have to provide a default value for the field. Don’t rename an existing field (use aliases instead). You can add an alias.
  35. Let’s use an example to talk about this. The following example is from our Avro tutorial.
  36. Let’s say our Employee record did not have an age in version 1 of the schema and then later we decided to add an age field with a default value of -1. Now let’s say we have a Producer using version 2 of the schema with age, and a Consumer using version 1 with no age.
  37. The Producer uses version 2 of the Employee schema and creates a com.cloudurable.Employee record, and sets age field to 42, then sends it to Kafka topic new-employees. The Consumer consumes records from new-employees using version 1 of the Employee Schema. Since Consumer is using version 1 of the schema, the age field gets removed during deserialization.
  38. The same consumer modifies some records and then writes the record to a NoSQL store. When the Consumer does this, the age field is missing from the record that it writes to the NoSQL store. Another client using version 2 of the schema, which has the age, reads the record from the NoSQL store. The age field is missing from the record because the Consumer wrote it with version 1; thus the client reads the record and the age is set to the default value of -1. If you added the age and it was not optional, i.e., the age field did not have a default, then the Schema Registry could reject the schema, and the Producer could never add it to the Kafka log.
  39. Using the Schema Registry REST API Recall that the Schema Registry allows you to manage schemas using the following operations: • Store schemas for keys and values of Kafka records • List schemas by subject • List all versions of a subject (schema) • Retrieve a schema by version • Retrieve a schema by id • Retrieve the latest version of a schema • Perform compatibility checks • Set the compatibility level globally • Set the compatibility level per subject Recall that all of this is available via a REST API with the Schema Registry.
  40. If you have a good HTTP client, you can basically perform all of the above operations via the REST interface for the Schema Registry. I wrote a little example to do this so I could understand the Schema registry a little better using the OkHttp client from Square (com.squareup.okhttp3:okhttp:3.7.0+).
  41. I suggest running the example and trying to force incompatible schemas to the Schema Registry and note the behavior for the various compatibility settings.
  42. I suggest running the example and trying to force incompatible schemas to the Schema Registry and note the behavior for the various compatibility settings.
  43. Now let’s cover writing consumers and producers that use Kafka Avro Serializers which in turn use the Schema Registry and Avro. We will need to start up the Schema Registry server pointing to our Zookeeper cluster. Then we will need to import the Kafka Avro Serializer and Avro Jars into our gradle project. You will then need to configure the Producer to use Schema Registry and the KafkaAvroSerializer. To write the consumer, you will need to configure it to use Schema Registry and to use the KafkaAvroDeserializer.
  44. Notice that we include the Kafka Avro Serializer lib (io.confluent:kafka-avro-serializer:3.2.1) and the Avro lib (org.apache.avro:avro:1.8.1). To learn more about the Gradle Avro plugin, please read this article on using Avro. http://cloudurable.com/blog/avro/index.html
  45. Notice that we configure the schema registry and the KafkaAvroSerializer as part of the Producer setup.
  46. The producer code is like it was before in other examples.
  47. Notice just like the producer we have to tell the consumer where to find the Schema Registry, and we have to configure the Kafka Avro Deserializer.
  48. The consumer code is like it was before in other examples.
  49. Confluent provides Schema Registry to manage Avro Schemas for Kafka Consumers and Producers. Avro provides Schema Migration which is necessary for streaming and big data architectures. Confluent uses Schema compatibility checks to see if the Producer’s schema and Consumer’s schemas are compatible and to do Schema evolution if needed. You use KafkaAvroSerializer from the Producer and point to the Schema Registry. You use KafkaAvroDeserializer from Consumer and point to the Schema Registry.