© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Schema Registry
Sriharsha Chintalapani, Hortonworks
Satish Duggana, Hortonworks
DataWorks Summit 2017, San Jose
Agenda
▪ Introduction
▪ Concepts
▪ Architecture
▪ Integration
▪ Security
▪ Roadmap
What is Schema Registry? What Value Does it Provide?
▪ What is Schema Registry?
• A shared repository of schemas that lets applications interact with each other flexibly
▪ What value does Schema Registry provide?
– Data governance
• Provides reusable schemas
• Defines relationships between schemas
• Enables generic format conversion and generic routing
– Operational efficiency
• Avoids attaching a schema to every piece of data
• Lets producers and consumers evolve at different rates
▪ Example use
– Register schemas for Kafka topics so that consumers of those topics (e.g., NiFi, StreamLine) can use them
Schema Registry Concepts
• Schema Group
A logical grouping (container) for schemas of a similar type, or for schemas grouped by any criteria the customer uses to manage them
• Schema Metadata
Metadata associated with a named schema
• Schema Version
The actual versioned schema associated with a schema metadata definition
[Diagram: a Schema Group (Group Name) holds Schema Metadata 1 (Schema Name, Schema Type, Description, Compatibility Policy, Serializers, Deserializers, Schema Group), which has Schema Versions 1, 2, and 3; each version carries a version, text, and fingerprint.]
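The three concepts above can be sketched as a small data model. This is an illustrative sketch only: the class and field names mirror the labels on the slide, while the SHA-256 fingerprint and the "reuse an existing version when the text is identical" rule are assumptions, not the registry's documented behavior.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SchemaVersion:
    version: int
    text: str

    @property
    def fingerprint(self) -> str:
        # Assumption: the fingerprint is a hash of the schema text.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()


@dataclass
class SchemaMetadata:
    name: str
    schema_type: str
    schema_group: str
    description: str = ""
    compatibility: str = "BACKWARD"  # hypothetical policy label
    versions: List[SchemaVersion] = field(default_factory=list)

    def add_version(self, text: str) -> SchemaVersion:
        """Append a new version, or return the existing one with the same text."""
        candidate = SchemaVersion(len(self.versions) + 1, text)
        for existing in self.versions:
            if existing.fingerprint == candidate.fingerprint:
                return existing
        self.versions.append(candidate)
        return candidate


book = SchemaMetadata(name="book", schema_type="avro", schema_group="kafka")
first = book.add_version('{"type": "string"}')
again = book.add_version('{"type": "string"}')
second = book.add_version('{"type": "int"}')
```

Here `first` and `again` are the same version object, while `second` becomes version 2.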
Schema Registry Component Architecture
[Diagram: the SR Web Server hosts the Schema Registry Web App and the REST API. The Schema Registry Client (Java client) backs the integrations: NiFi processors, Kafka ser/des, and StreamLine. Storage is pluggable: schema storage (MySQL, Postgres, in-memory) and serializer/deserializer jar storage (local file system, HDFS).]
Sender/Receiver flow
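[Diagram: the Sender hands data to a Serializer, which uses a Schema Registry Client with a local schema/serdes cache and writes version + payload to the Message Store. On the other side, the Receiver's Deserializer reads version + payload and uses its own Schema Registry Client and local schema/serdes cache. Both clients talk to the SchemaRegistry instances, which are backed by Schema Storage and SerDes Storage.]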
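The flow in the diagram can be sketched in a few lines. Everything here is a stand-in written for illustration: `InMemoryRegistry`, `CachingClient`, `send`, and `receive` are hypothetical names, not the registry's client API. The point is only that the message carries a schema version next to the payload, and that the receiver's local cache avoids a registry round trip for every message.

```python
class InMemoryRegistry:
    """Stand-in for the Schema Registry server (hypothetical)."""

    def __init__(self):
        self._by_version = {}
        self.lookups = 0  # counts fetches so the cache effect is visible

    def register(self, schema_text):
        # Return the existing version when the same text was registered before.
        for version, text in self._by_version.items():
            if text == schema_text:
                return version
        version = len(self._by_version) + 1
        self._by_version[version] = schema_text
        return version

    def fetch(self, version):
        self.lookups += 1
        return self._by_version[version]


class CachingClient:
    """Client-side cache of schema versions already seen."""

    def __init__(self, registry):
        self._registry = registry
        self._cache = {}

    def schema_for(self, version):
        if version not in self._cache:
            self._cache[version] = self._registry.fetch(version)
        return self._cache[version]


def send(registry, schema_text, payload):
    """Sender side: resolve the schema version and attach it to the payload."""
    return (registry.register(schema_text), payload)


def receive(client, message):
    """Receiver side: look up the writer schema by the version in the message."""
    version, payload = message
    return client.schema_for(version), payload


registry = InMemoryRegistry()
client = CachingClient(registry)
message = send(registry, '{"type": "string"}', b"hello")
first = receive(client, message)
second = receive(client, message)
```

After the two `receive` calls, `registry.lookups` is 1 because the second call is served from the local cache.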
Writer/Reader schemas
▪ Writer schema
– Senders/producers serialize payloads according to this schema, i.e. the writer's schema
▪ Reader/projection schema
– Receivers use this schema to project a received payload that was written with a writer schema
[Diagram: the Sender holds the Writer Schema; the Receiver holds the Writer Schema and a Projection Schema.]
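Projection can be illustrated with a simplified resolver. This sketch assumes flat records represented as dictionaries and matches fields by name only; real Avro schema resolution also handles type promotion, aliases, and nested types. The `project` function and the sample schemas are made up for the example.

```python
def project(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    projected = {}
    for reader_field in reader_schema["fields"]:
        name = reader_field["name"]
        if name in writer_fields:
            # Field exists in the written data: copy it.
            projected[name] = record[name]
        elif "default" in reader_field:
            # Field is unknown to the writer: fall back to the reader default.
            projected[name] = reader_field["default"]
        else:
            raise ValueError("no value and no default for field " + name)
    return projected


writer_schema = {"fields": [
    {"name": "id", "type": "string"},
    {"name": "color", "type": "string", "default": "blue"},
    {"name": "pages", "type": "int"},
]}
reader_schema = {"fields": [
    {"name": "id", "type": "string"},
    {"name": "title", "type": "string", "default": ""},
]}

record = {"id": "b-1", "color": "red", "pages": 320}
projected = project(record, writer_schema, reader_schema)
```

The receiver ends up with only the fields it asked for: `color` and `pages` are dropped, and `title` is filled from its default.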
Schema evolution
[Diagram: producers and consumers running different schema versions side by side (Producer v2, Consumer v2, Producer v1, Producer v4, Consumer v5, Producer v1, Consumer v7).]
Schema Compatibility Policies
▪ What is a compatibility policy?
– Defines the rules for how a schema can evolve
– Subsequent versions have to honor the schema's original compatibility policy
▪ Policies supported
– Backward
– Forward
– Both
– None
Backward compatibility
▪ A new version of a schema is compatible with earlier versions of that schema.
▪ Data written with an earlier version of the schema can be read with the new version of the schema.
V1
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" }
  ]
}
V2
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" },
    { "name": "pages", "type": "int", "default": -1 }
  ]
}
Forward compatibility
▪ An existing schema is compatible with future versions of that schema.
▪ Data written with a new version of the schema can still be read with an old version of the schema.
V1
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" }
  ]
}
V2
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" },
    { "name": "pages", "type": "int" }
  ]
}
Both/Full compatibility
▪ A new version of the schema is both backward and forward compatible.
V1
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" }
  ]
}
V2
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" },
    { "name": "pages", "type": "int", "default": -1 },
    { "name": "title", "type": "string", "default": "" }
  ]
}
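The Backward, Forward, Both, and None policies can be expressed as one small check over the `book` schemas from these slides. This is a simplified sketch, not the registry's validator: it only looks at field names and defaults (no type promotion, aliases, or nested records), and the policy strings and function names are chosen for the example.

```python
def unresolved(reader, writer):
    """Reader fields that the writer's data lacks and that have no default."""
    written = {f["name"] for f in writer["fields"]}
    return [f["name"] for f in reader["fields"]
            if f["name"] not in written and "default" not in f]


def is_allowed(policy, old, new):
    """Decide whether `new` may follow `old` under the given policy."""
    backward = not unresolved(reader=new, writer=old)  # new reads old data
    forward = not unresolved(reader=old, writer=new)   # old reads new data
    if policy == "BACKWARD":
        return backward
    if policy == "FORWARD":
        return forward
    if policy == "BOTH":
        return backward and forward
    if policy == "NONE":
        return True
    raise ValueError("unknown policy: " + policy)


def book(*extra_fields):
    """Build a book schema with the two V1 fields plus any extras."""
    fields = [
        {"name": "id", "type": "string"},
        {"name": "color", "type": "string", "default": "blue"},
    ]
    return {"type": "record", "name": "book",
            "namespace": "registry.example",
            "fields": fields + list(extra_fields)}


V1 = book()
# V2 from the forward-compatibility slide: "pages" has no default.
V2_FORWARD = book({"name": "pages", "type": "int"})
# V2 from the both/full slide: every added field has a default.
V2_FULL = book({"name": "pages", "type": "int", "default": -1},
               {"name": "title", "type": "string", "default": ""})

forward_only = (is_allowed("FORWARD", V1, V2_FORWARD),
                is_allowed("BACKWARD", V1, V2_FORWARD))
full = is_allowed("BOTH", V1, V2_FULL)
```

Under these simplified rules, the V2 without a default for `pages` passes the forward check but fails the backward one, because a V2 reader has no value to use for `pages` when reading V1 data.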
Schema composition
▪ Schemas can be shared and reused
▪ Built-in support in the default serializer/deserializer for building effective schemas
{
  "name": "account",
  "namespace": "com.hortonworks.example.types",
  "includeSchemas": [
    { "name": "utils" }
  ],
  "type": "record",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "id", "type": "com.hortonworks.datatypes.uuid" }
  ]
}
{
  "name": "uuid",
  "type": "record",
  "namespace": "com.hortonworks.datatypes",
  "doc": "A Universally Unique Identifier, in canonical form in lowercase. This is generated from java.util.UUID. Example: de305d54-75b4-431b-adb2-eb6b9e546014",
  "fields": [
    { "name": "value", "type": "string", "default": "" }
  ]
}
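One way to picture how an effective schema could be built from `includeSchemas` is sketched below. The assumptions are loud ones: the registry is modeled as a plain dictionary from schema name to parsed schema, the schema registered as `utils` is assumed to be the `uuid` record shown above, and the resolver simply inlines the referenced type. The default serializer/deserializer may well do this differently.

```python
import copy


def effective_schema(schema, registry):
    """Inline types that come from included schemas (simplified sketch)."""
    # Index the named types made available by each included schema.
    available = {}
    for include in schema.get("includeSchemas", []):
        included = registry[include["name"]]
        full_name = included["namespace"] + "." + included["name"]
        available[full_name] = included

    # Copy everything except the include directive itself.
    resolved = {key: copy.deepcopy(value)
                for key, value in schema.items() if key != "includeSchemas"}

    # Replace references by name with the full type definition.
    # Limitation: a type referenced twice would be inlined twice here.
    for schema_field in resolved["fields"]:
        field_type = schema_field["type"]
        if isinstance(field_type, str) and field_type in available:
            schema_field["type"] = copy.deepcopy(available[field_type])
    return resolved


uuid_schema = {
    "name": "uuid",
    "type": "record",
    "namespace": "com.hortonworks.datatypes",
    "fields": [{"name": "value", "type": "string", "default": ""}],
}
account_schema = {
    "name": "account",
    "namespace": "com.hortonworks.example.types",
    "includeSchemas": [{"name": "utils"}],
    "type": "record",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "id", "type": "com.hortonworks.datatypes.uuid"},
    ],
}

# Hypothetical registry content: "utils" is assumed to hold the uuid record.
registry = {"utils": uuid_schema}
effective = effective_schema(account_schema, registry)
```

The result has no `includeSchemas` entry, and the `id` field carries the full `uuid` record definition instead of a name reference.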
Serializers/Deserializers
▪ Snapshot-based serializer/deserializer
– Serializes the complete payload
– Deserializes the payload to the respective type
▪ Pull-based serializer/deserializer
– Serializes only the required elements and ignores the rest
– Pulls out only the elements required to build the desired object
▪ Push-based deserializer
– Provides callbacks that receive parsing events for the respective fields in the schema
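The difference between the three styles is mostly in the shape of the API, which the sketch below tries to show. JSON is used as a stand-in encoding and all three function names are invented for the example. A real pull-based or push-based implementation would avoid materializing the whole record first, which this sketch does not attempt.

```python
import json


def snapshot_deserialize(payload):
    """Snapshot style: return the complete payload as one object."""
    return json.loads(payload)


def pull_deserialize(payload, wanted):
    """Pull style: the caller names the elements it needs; the rest is ignored."""
    record = json.loads(payload)
    return {name: record[name] for name in wanted if name in record}


def push_deserialize(payload, on_field):
    """Push style: the deserializer calls back once per parsed field."""
    for name, value in json.loads(payload).items():
        on_field(name, value)


payload = json.dumps({"id": "b-1", "color": "red", "pages": 320}).encode("utf-8")

whole = snapshot_deserialize(payload)
partial = pull_deserialize(payload, ["id", "pages"])

events = []
push_deserialize(payload, lambda name, value: events.append((name, value)))
```

`whole` holds all three fields, `partial` holds only `id` and `pages`, and `events` records one callback per field in payload order.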
Schema registry client
▪ REST-based client
▪ Caching
– Metadata
– Schema versions
– Ser/des libs and class loaders
▪ URL selectors
– Round robin
– Failover
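The two URL selection strategies can be sketched as follows. The class names, the `select`/`report_failure` methods, and the host names are assumptions made for illustration and are not taken from the client library.

```python
import itertools


class RoundRobinSelector:
    """Hand out registry URLs in rotation to spread the load."""

    def __init__(self, urls):
        self._cycle = itertools.cycle(list(urls))

    def select(self):
        return next(self._cycle)

    def report_failure(self, url):
        # Nothing to do: the next select() moves on anyway.
        pass


class FailoverSelector:
    """Stick to one URL until it is reported as failed, then move to the next."""

    def __init__(self, urls):
        self._urls = list(urls)
        self._current = 0

    def select(self):
        return self._urls[self._current]

    def report_failure(self, url):
        # Only advance if the failed URL is the one currently in use.
        if url == self._urls[self._current]:
            self._current = (self._current + 1) % len(self._urls)


urls = ["http://sr1.example:9090", "http://sr2.example:9090"]  # hypothetical hosts

round_robin = RoundRobinSelector(urls)
rotation = [round_robin.select() for _ in range(3)]

failover = FailoverSelector(urls)
before = failover.select()
failover.report_failure(before)
after = failover.select()
```

Round robin alternates between the two URLs, while failover stays on the first URL until a failure is reported and then switches to the second.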
HA
▪ Storage provider
– Depends on the transactional support of the underlying SQL stores
– Spin up as many Schema Registry instances as required
▪ Supports HA at the SchemaRegistry level
– Using ZK/Curator
– Automatic failover of the master
– The master gets all writes
– Slaves receive only reads
[Diagrams: SchemaRegistry instances in front of shared storage.]
Integration of Schema Registry
▪ Kafka
– Uses the producer/consumer API for the serializer/deserializer
▪ NiFi processors for Schema Registry
– Fetch schema
– Serialize/deserialize with schema
▪ StreamLine processors for Schema Registry
– Look up the schema of a Kafka topic
Kafka integration
[Diagram: the sender/receiver flow with Kafka as the message store. The Sender uses KafkaAvroSerializer and the Receiver uses KafkaAvroDeserializer, each with a Schema Registry Client and a local schema/serdes cache. Messages carry version + payload through Kafka, and both clients talk to the SchemaRegistry instances.]
Kafka Avro ser/des protocol
▪ Multiple ser/des can be registered, each with its own protocol version
▪ The default ser/des send the protocol and schema versions as part of the binary payload of Kafka messages
– This can be improved once Kafka messages support headers/metadata
– Custom ser/des can be registered for schemas
Default ser/des message format:
<protocol-id><identification-info-for-schema-version><message-payload>
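The message layout above can be sketched with Python's `struct` module. The slide does not give field widths, so the ones used here are assumptions: a 1-byte protocol id followed by a 4-byte big-endian schema version id. The protocol id value and the function names are likewise made up.

```python
import struct

PROTOCOL_ID = 1  # hypothetical value, not taken from the slides

# Assumed header layout: 1-byte protocol id + 4-byte big-endian version id.
_HEADER = struct.Struct(">BI")


def encode(schema_version_id, payload):
    """Prefix the serialized payload with protocol id and schema version id."""
    return _HEADER.pack(PROTOCOL_ID, schema_version_id) + payload


def decode(message):
    """Split a message back into (schema_version_id, payload)."""
    protocol_id, schema_version_id = _HEADER.unpack_from(message)
    if protocol_id != PROTOCOL_ID:
        raise ValueError("unsupported protocol id: %d" % protocol_id)
    return schema_version_id, message[_HEADER.size:]


message = encode(3, b"avro-bytes")
version_id, payload = decode(message)
```

Because the schema version travels inside the message bytes, the deserializer can find the writer schema without any out-of-band metadata, which is the role message headers could take over later.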
NiFi integration
▪ NiFi controller service
▪ NiFi processors
– Transforms
• Avro – CSV
• Avro – JSON
• JSON – CSV
– Extracting Avro fields
Security
▪ Kerberos support
▪ Enabled with a SPNEGO-based filter
▪ Verified integration with
– Streaming Analytics Manager
– Storm
– NiFi
Schema Registry UI
Roadmap
▪ Collaboration
– Notifications
– Schema life cycle management
– Audit log
– Improved UI
▪ Rich data types
▪ Operations
– Cross-cluster mirroring
▪ Security
– SSL and OAuth 2.0
– Schema and sub-schema level authorization
– Ranger support
▪ Integration
– Multi-language client
– Pluggable listeners
– Converters
Try it out!
▪ Docs
– http://registry-project.readthedocs.io/en/latest/index.html
▪ Repo
– https://github.com/hortonworks/registry
▪ Google group
– https://groups.google.com/forum/#!forum/registry
▪ Open sourced under the Apache License
▪ Apache incubation soon
▪ Contributions are welcome
Q&A
https://github.com/hortonworks/registry

Schema Registry - Set Your Data Free


Editor's Notes

  • #11 (Backward compatibility) Consumers can update their schemas without affecting the producers: add new fields with default values; drop existing fields.
  • #12 (Forward compatibility) Existing producers can update their schemas without affecting the consumers: add new fields with default values; drop only fields that have default values.
  • #13 (Both/Full compatibility) Full compatibility requires that added fields have default values.