1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema Registry
Satish Duggana, Hortonworks
DataWorks Summit 2017, Munich
Introduction
• What is Schema Registry?
  – A shared repository of schemas that allows applications to interact with each other flexibly
• What value does Schema Registry provide?
  – Data governance
    • Provides reusable schemas
    • Defines relationships between schemas
    • Enables generic format conversion and generic routing
  – Operational efficiency
    • Avoids attaching a schema to every piece of data
    • Lets producers and consumers evolve at different rates
• Example use
  – Register schemas for Kafka topics so that consumers of those topics (e.g. NiFi, StreamLine) can use them
Schema Registry Concepts
• Schema Group
  A logical grouping/container for schemas of a similar type, or for schemas grouped by any criteria the customer uses to manage them.
• Schema Metadata
  Metadata associated with a named schema.
• Schema Version
  The actual versioned schema associated with a schema metadata definition.
[Diagram: Schema Group (Group Name); Schema Metadata 1 (Schema Name, Schema Type, Description, Compatibility Policy, Serializers, Deserializers); Schema Versions 1, 2 and 3 (version, text, fingerprint).]
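The three concepts can be pictured as a small data model. The sketch below is illustrative only: the class names, the `add_version` helper and the use of SHA-256 over the schema text as the fingerprint are assumptions, not the registry's actual implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SchemaVersion:
    version: int
    text: str

    @property
    def fingerprint(self) -> str:
        # Assumption: the fingerprint is a hash of the schema text.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()


@dataclass
class SchemaMetadata:
    name: str
    schema_type: str
    schema_group: str
    description: str = ""
    compatibility: str = "BACKWARD"
    versions: list = field(default_factory=list)

    def add_version(self, text: str) -> SchemaVersion:
        # Registering text that is already known returns the existing version.
        candidate = SchemaVersion(len(self.versions) + 1, text)
        for existing in self.versions:
            if existing.fingerprint == candidate.fingerprint:
                return existing
        self.versions.append(candidate)
        return candidate


meta = SchemaMetadata(name="book", schema_type="avro", schema_group="kafka")
v1 = meta.add_version('{"type": "string"}')
v2 = meta.add_version('{"type": "int"}')
again = meta.add_version('{"type": "string"}')  # same text as v1
```

Here the fingerprint is what lets the model detect that `again` is the same schema as `v1` instead of creating a third version.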
Schema Registry Component Architecture
[Diagram: SR Web Server (Schema Registry web app, REST API); Schema Registry Client (Java client); Integrations (NiFi processors, Kafka ser/des, StreamLine); pluggable schema storage (MySQL, Postgres, in-memory); serializer/deserializer jar storage (local file system, HDFS).]
Writer/Reader schemas
• Writer schema
  – Senders/producers serialize payloads according to this schema, i.e. the writer's schema
• Reader/projection schema
  – Receivers use this schema to project a received payload that was written with a writer schema
[Diagram: Sender (writer schema); Receiver (writer schema, projection schema).]
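A minimal sketch of projection, assuming the payload has already been decoded with the writer schema into a dictionary. The `project` function and the sample record are hypothetical; real Avro schema resolution also handles type promotion, aliases and nested types.

```python
def project(record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader (projection) schema."""
    projected = {}
    for f in reader_schema["fields"]:
        name = f["name"]
        if name in record:
            # The writer supplied this field: keep its value.
            projected[name] = record[name]
        elif "default" in f:
            # The writer did not supply it: fall back to the reader's default.
            projected[name] = f["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return projected


# Record as written by the sender (writer schema has id, color, pages).
writer_record = {"id": "b-1", "color": "red", "pages": 320}

# The receiver only cares about id and title.
reader_schema = {
    "type": "record",
    "name": "book",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "title", "type": "string", "default": ""},
    ],
}

projected = project(writer_record, reader_schema)
```

Fields the reader does not ask for (`color`, `pages`) are dropped, and the field the writer never wrote (`title`) takes the reader's default.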
Schema evolution
[Diagram: producers on schema versions v2, v1, v4 and v1 alongside consumers on versions v2, v5 and v7.]
Schema Compatibility Policies
• What is a compatibility policy?
  – Defines the rules for how a schema can evolve
  – Subsequent version updates have to honor the schema's original compatibility policy
• Policies supported
  – Backward
  – Forward
  – Both
  – None
Backward compatibility
• A new version of a schema is compatible with earlier versions of that schema.
• Data written with an earlier version of the schema can be read with a new version of the schema.
V1
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "color", "type": "string", "default": "blue"}
  ]
}
V2
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "color", "type": "string", "default": "blue"},
    {"name": "pages", "type": "int", "default": -1}
  ]
}
Forward compatibility
• An existing schema is compatible with future versions of the schema.
• Data written with a new version of the schema can still be read with an old version of the schema.
V1
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "color", "type": "string", "default": "blue"}
  ]
}
V2
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "color", "type": "string", "default": "blue"},
    {"name": "pages", "type": "int"}
  ]
}
Both/Full compatibility
• A new version of the schema is both backward and forward compatible.
V1
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "color", "type": "string", "default": "blue"}
  ]
}
V2
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "color", "type": "string", "default": "blue"},
    {"name": "pages", "type": "int", "default": -1},
    {"name": "title", "type": "string", "default": ""}
  ]
}
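The backward, forward and both/full policies can be checked with one field-level rule: a reader can read a writer's data if every reader field is either written or has a default. The sketch below is a simplification under that assumption; the names `Compatibility`, `can_read`, `is_allowed` and `book` are hypothetical, and a real Avro check also covers type promotion, aliases, unions and nested records.

```python
from enum import Enum


class Compatibility(Enum):
    BACKWARD = "BACKWARD"
    FORWARD = "FORWARD"
    BOTH = "BOTH"
    NONE = "NONE"


def can_read(reader: dict, writer: dict) -> bool:
    """True if every reader field is written by the writer or has a default."""
    written = {f["name"] for f in writer["fields"]}
    return all(f["name"] in written or "default" in f for f in reader["fields"])


def is_allowed(policy: Compatibility, old: dict, new: dict) -> bool:
    backward = can_read(reader=new, writer=old)  # new schema reads old data
    forward = can_read(reader=old, writer=new)   # old schema reads new data
    if policy is Compatibility.BACKWARD:
        return backward
    if policy is Compatibility.FORWARD:
        return forward
    if policy is Compatibility.BOTH:
        return backward and forward
    return True  # NONE: any change is accepted


def book(*extra_fields: dict) -> dict:
    """Build the 'book' record from the slides, plus optional extra fields."""
    fields = [
        {"name": "id", "type": "string"},
        {"name": "color", "type": "string", "default": "blue"},
    ]
    return {
        "type": "record",
        "name": "book",
        "namespace": "registry.example",
        "fields": fields + list(extra_fields),
    }


v1 = book()
v2_with_default = book({"name": "pages", "type": "int", "default": -1})
v2_without_default = book({"name": "pages", "type": "int"})
```

Under this rule, adding `pages` with a default passes every policy, while adding it without a default passes only `FORWARD` (and `NONE`), which matches the V2 schemas shown for each policy.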
Schema composition
• Schemas can be shared and reused within existing schemas
• Built-in support in the default serializer/deserializer for building effective schemas
{
  "name": "account",
  "namespace": "com.hortonworks.example.types",
  "includeSchemas": [
    {"name": "utils"}
  ],
  "type": "record",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "id", "type": "com.hortonworks.datatypes.uuid"}
  ]
}
{
  "name": "uuid",
  "type": "record",
  "namespace": "com.hortonworks.datatypes",
  "doc": "A Universally Unique Identifier, in canonical form in lowercase. This is generated from java.util.UUID Example: de305d54-75b4-431b-adb2-eb6b9e546014",
  "fields": [
    {"name": "value", "type": "string", "default": ""}
  ]
}
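One way to picture how an effective schema could be built from `includeSchemas` is to inline each included named type at its first use, since Avro expects a named type to be defined before it is referenced by name. This is a sketch under assumptions: `effective_schema` is a hypothetical helper, the included schema is assumed to be registered under the name "utils" and to contain the `uuid` record, and nested includes or union types are not handled. The default serializer/deserializer may work differently.

```python
import copy


def effective_schema(schema: dict, registry: dict) -> dict:
    """Return a copy of `schema` with included named types inlined at first use."""
    schema = copy.deepcopy(schema)

    # Index every included record by its fully qualified name.
    included = {}
    for ref in schema.pop("includeSchemas", []):
        dep = registry[ref["name"]]
        included[f'{dep["namespace"]}.{dep["name"]}'] = dep

    # Replace the first by-name reference to each included type with its definition.
    seen = set()
    for f in schema["fields"]:
        type_name = f["type"]
        if isinstance(type_name, str) and type_name in included and type_name not in seen:
            f["type"] = copy.deepcopy(included[type_name])
            seen.add(type_name)
    return schema


uuid_schema = {
    "name": "uuid",
    "type": "record",
    "namespace": "com.hortonworks.datatypes",
    "fields": [{"name": "value", "type": "string", "default": ""}],
}

account_schema = {
    "name": "account",
    "namespace": "com.hortonworks.example.types",
    "includeSchemas": [{"name": "utils"}],
    "type": "record",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "id", "type": "com.hortonworks.datatypes.uuid"},
    ],
}

# Assumption: the registry stores the uuid record under the schema name "utils".
registry = {"utils": uuid_schema}
resolved = effective_schema(account_schema, registry)
```

The result is a plain Avro record with no registry-specific keys, which a standard Avro parser could then consume.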
Sender/Receiver flow
[Diagram: the Sender's serializer and the Receiver's deserializer each use a Schema Registry client with a local schema/serdes cache; (version, payload) messages pass through the message store; the clients talk to SchemaRegistry instances backed by schema storage and serdes storage.]
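The flow can be sketched end to end with an in-memory stand-in for the registry. Everything here is hypothetical (`InMemoryRegistry`, `CachingClient`, `send`, `receive`), and JSON stands in for the real binary encoding. The point is that each message carries only a version and a payload, and each side resolves the schema through its local cache.

```python
import json


class InMemoryRegistry:
    """Stand-in for the remote Schema Registry service."""

    def __init__(self):
        self._schemas = {}
        self.lookups = 0  # counts remote fetches, to show the cache working

    def register(self, name: str, schema: dict) -> int:
        versions = self._schemas.setdefault(name, [])
        versions.append(schema)
        return len(versions)

    def get(self, name: str, version: int) -> dict:
        self.lookups += 1
        return self._schemas[name][version - 1]


class CachingClient:
    """Client that hits the registry only on a cache miss."""

    def __init__(self, registry: InMemoryRegistry):
        self._registry = registry
        self._cache = {}

    def schema(self, name: str, version: int) -> dict:
        key = (name, version)
        if key not in self._cache:
            self._cache[key] = self._registry.get(name, version)
        return self._cache[key]


def send(record: dict, name: str, version: int, client: CachingClient) -> dict:
    fields = [f["name"] for f in client.schema(name, version)["fields"]]
    # Only the values are serialized; the schema itself is not attached.
    return {"version": version, "payload": json.dumps([record[n] for n in fields])}


def receive(message: dict, name: str, client: CachingClient) -> dict:
    fields = [f["name"] for f in client.schema(name, message["version"])["fields"]]
    return dict(zip(fields, json.loads(message["payload"])))


registry = InMemoryRegistry()
version = registry.register("book", {"fields": [{"name": "id"}, {"name": "color"}]})

sender_client = CachingClient(registry)
receiver_client = CachingClient(registry)

records = [{"id": "b-1", "color": "red"}, {"id": "b-2", "color": "blue"}]
messages = [send(r, "book", version, sender_client) for r in records]
received = [receive(m, "book", receiver_client) for m in messages]
```

After two messages the registry has been queried only twice, once per client, because later lookups are served from each local cache.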
Serializers/Deserializers
• Snapshot-based serializer/deserializer
  – Serializes the complete payload
  – Deserializes the payload to the respective type
• Pull-based serializer/deserializer
  – Serializes only the required elements and ignores the others
  – Pulls out only the elements required to build the desired object
• Push-based deserializer
  – Invokes callbacks with parsing events for the respective fields in the schema
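The three deserializer styles differ mainly in who drives the reading. The sketch below contrasts them with hypothetical function names and JSON as a stand-in encoding. Note that it decodes the whole payload in every case for brevity; a real pull- or push-based implementation would avoid materializing fields nobody asked for.

```python
import json


def snapshot_deserialize(payload: bytes) -> dict:
    """Snapshot: return the complete object in one step."""
    return json.loads(payload)


def pull_deserialize(payload: bytes, wanted: list) -> dict:
    """Pull: the caller names the elements it needs; the rest are ignored."""
    decoded = json.loads(payload)
    return {name: decoded[name] for name in wanted if name in decoded}


def push_deserialize(payload: bytes, schema: dict, on_field) -> None:
    """Push: the deserializer calls back once per schema field it encounters."""
    decoded = json.loads(payload)
    for f in schema["fields"]:
        if f["name"] in decoded:
            on_field(f["name"], decoded[f["name"]])


payload = json.dumps({"id": "b-1", "color": "red", "pages": 320}).encode("utf-8")
schema = {"fields": [{"name": "id"}, {"name": "pages"}]}

whole = snapshot_deserialize(payload)
partial = pull_deserialize(payload, ["id"])

events = []
push_deserialize(payload, schema, lambda name, value: events.append((name, value)))
```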
Schema registry client
• REST-based client
• Caching
  – Metadata
  – Schema versions
  – Ser/des libraries and class loaders
• URL selectors
  – Round robin
  – Failover
  – Custom
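A URL selector decides which registry instance the client calls next. The two built-in strategies can be sketched as follows; the class names and the `select`/`url_failed` methods are assumptions for illustration, not the client's actual interface. A custom selector would implement the same two methods.

```python
import itertools


class RoundRobinSelector:
    """Rotate through the URLs on every call."""

    def __init__(self, urls):
        self._cycle = itertools.cycle(list(urls))

    def select(self) -> str:
        return next(self._cycle)

    def url_failed(self, url: str) -> None:
        pass  # nothing to do: the next call moves on anyway


class FailoverSelector:
    """Stick to one URL until it is reported as failed, then move to the next."""

    def __init__(self, urls):
        self._urls = list(urls)
        self._current = 0

    def select(self) -> str:
        return self._urls[self._current]

    def url_failed(self, url: str) -> None:
        # Only advance if the failure is for the URL currently in use.
        if url == self._urls[self._current]:
            self._current = (self._current + 1) % len(self._urls)


urls = ["http://sr-1:9090", "http://sr-2:9090"]  # hypothetical hosts

round_robin = RoundRobinSelector(urls)
rr_picks = [round_robin.select() for _ in range(3)]

failover = FailoverSelector(urls)
before = [failover.select(), failover.select()]
failover.url_failed("http://sr-1:9090")
after = failover.select()
```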
HA
• Storage provider
  – Depends on the transactional support of the underlying SQL stores
  – Spin up as many schema registry instances as required
• Supports HA at the SchemaRegistry level
  – Using ZK/Curator
  – Automatic failover of the master
  – Master gets all writes
  – Slaves receive only reads
[Diagrams: multiple SchemaRegistry instances in front of storage.]
Integration of Schema Registry
• Kafka
  – Uses the producer/consumer API for the serializer/deserializer
• NiFi processors for Schema Registry
  – Fetch schema
  – Serialize/deserialize with schema
• StreamLine
  – Look up the schema of a Kafka topic
Kafka integration
[Diagram: the Producer's KafkaAvroSerializer and the Consumer's KafkaAvroDeserializer each use a Schema Registry client with a local schema/serdes cache; (version, payload) messages pass through Kafka; the clients talk to SchemaRegistry instances.]
Kafka Avro ser/des protocol
• Ser/des can be implemented with different protocols
• The default ser/des sends the protocol and schema versions as part of the binary payload of Kafka messages
  – Can be enhanced to use headers/metadata instead of the message payload
  – Custom ser/des can be registered for schemas
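Carrying the protocol and schema versions inside the payload amounts to prefixing the encoded record with a small header. The layout below (a 1-byte protocol id followed by a 4-byte big-endian schema version) is an assumption chosen for illustration; the default ser/des may use a different layout, and `PROTOCOL_ID`, `frame` and `unframe` are hypothetical names.

```python
import struct

PROTOCOL_ID = 1  # hypothetical protocol identifier

# ">BI": big-endian, 1-byte unsigned protocol id, 4-byte unsigned schema version.
_HEADER = struct.Struct(">BI")


def frame(schema_version: int, body: bytes) -> bytes:
    """Prefix the encoded record with the protocol id and schema version."""
    return _HEADER.pack(PROTOCOL_ID, schema_version) + body


def unframe(message: bytes):
    """Split a framed message back into (schema_version, body)."""
    protocol_id, schema_version = _HEADER.unpack_from(message)
    if protocol_id != PROTOCOL_ID:
        raise ValueError(f"unsupported protocol id {protocol_id}")
    return schema_version, message[_HEADER.size:]


framed = frame(3, b"abc")
schema_version, body = unframe(framed)
```

The header adds five bytes per message in this layout, compared with shipping the full schema text alongside every record.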
NiFi integration
• NiFi controller service
• NiFi processors
  – Transforms
    • Avro – CSV
    • Avro – JSON
    • JSON – CSV
  – Extracting Avro fields
Schema Registry UI
WIP/Future enhancements
• Security
  – Kerberos support
  – Default authorizers and Apache Ranger support
• Archiving schemas
• Notifications
  – New versions
  – Archiving
• Converters
Try it out!
• https://github.com/hortonworks/registry
• https://groups.google.com/forum/#!forum/registry
• Open sourced under the Apache license
• Apache incubation soon
• Contributions are welcome
Q & A

Schema Registry - Set your Data Free


Editor's Notes

  • #20 Exposes operations to serialize and deserialize the contents of the FlowFile as well as the operations to query the actual schema. NOTE: at the moment only AVRO schema type is supported. UpdateAttributeViaSchemaRegistry Transform processors