SlideShare a Scribd company logo
1 of 16
Download to read offline
Schema Registries
Mike Robins | Co-founder
linkedin.com/in/mikerobins
Snowplow Philosophy
- Open source (ALv2) or managed (paid)
- Batch or real time
- Collect everything (web, mobile, IoT, webhooks)
- Ownership of data matters
- Data modelling should be first class and flexible
- BYO toolset (Spark, Drill, Beam etc)
● Imagine all employees are required to speak only in their native language.
● Either everyone has to be multilingual, or expensive translators must be added for every
pair of languages spoken.
○ Even if you have a sophisticated and efficient way of getting messages from place
to place, you’re still stuck with the overhead of constant translation.
Hazards of many languages
● A shared contract between a consumer and a producer
● Prior art
○ Avro, Thrift, Protobuf etc
A schema
Key attributes of schema technologies
● Code generation – for bindings to your schemas in a
given programming language
● Data encodings
● Validation rules - for calibration and sanity
● Types – a description of the type of data
● Schema evolution
Copyright Frank Drake, NASA (1977) License: CC BY-NC-ND 2.0
Copyright Frank Drake, NASA (1977) License: CC BY-NC-ND 2.0
iglu:com.<myco>/<event>/jsonschema/1-0-0
{
"$schema":
"http://iglucentral.com/schemas/com.snowplowanalytics.s
elf-desc/schema/jsonschema/1-0-0#",
"description": "Schema for <…>",
"self": {
"vendor": "<com.myco>",
"name": "<event>",
"format": "jsonschema",
"version": "1-0-0"
},
"type": "object",
"properties": {
"action_context": {
"type": "string"
}
...
},
"required": ["subject", "event_id"],
"additionalProperties": false
}
The schema URI is IGLU
The name of this schema
The vendor of this schema
Schema format
Schema version
Schema storage
● Option 1: Send the entire definition with the record
Record Record Record Record
Schema Schema Schema Schema
Schema storage
● Option 2: Send a pointer to the definition
*Schema *Schema *Schema *Schema
Record Record Record Record
Schema storage
● A canonical, shared source of truth
● Within and between organisations
Schema registry
● Data governance
○ Safe schema evolution
○ Policy enforcement
● Data pipeline resilience
● Data discovery
● Efficiency
○ Cost
○ Storage
○ Computation
● Shares principles with software engineering CI/CD
Why?
Key takeaways
Schemas are critical and a shared repository of all schemas used by the
organisation is important to make siloed knowledge shared and explicit.
By using schemas, the data definition for a particular kind of data exists in a single
place.
Schemas serve as self-contained and automatically enforceable contracts
between producers and consumers of data.
Demo
mike@snowflake-analytics.com
Snowplow (github.com/snowplow/snowplow)
Schemas (Iglu Central)
Kinesis (Amazon Web Services)
Pusher (pusher.com)

More Related Content

Similar to Schema Registries Explained

Implementing Messaging Patterns in JavaScript using the OpenAjax Hub
Implementing Messaging Patterns in JavaScript using the OpenAjax HubImplementing Messaging Patterns in JavaScript using the OpenAjax Hub
Implementing Messaging Patterns in JavaScript using the OpenAjax HubKevin Hakanson
 
Philip Stehlik at TechTalks.ph - Intro to Groovy and Grails
Philip Stehlik at TechTalks.ph - Intro to Groovy and GrailsPhilip Stehlik at TechTalks.ph - Intro to Groovy and Grails
Philip Stehlik at TechTalks.ph - Intro to Groovy and GrailsPhilip Stehlik
 
Using MongoDB to Build a Fast and Scalable Content Repository
Using MongoDB to Build a Fast and Scalable Content RepositoryUsing MongoDB to Build a Fast and Scalable Content Repository
Using MongoDB to Build a Fast and Scalable Content RepositoryMongoDB
 
A intro to (hosted) Shiny Apps
A intro to (hosted) Shiny AppsA intro to (hosted) Shiny Apps
A intro to (hosted) Shiny AppsDaniel Koller
 
OrientDB the database for the web 1.1
OrientDB the database for the web 1.1OrientDB the database for the web 1.1
OrientDB the database for the web 1.1Luca Garulli
 
iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)Eran Duchan
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital.AI
 
nuclio Overview October 2017
nuclio Overview October 2017nuclio Overview October 2017
nuclio Overview October 2017iguazio
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsGuido Schmutz
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingJaime Martin Losa
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleDmytro Semenov
 
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...HostedbyConfluent
 
Andriy Shalaenko - GO security tips
Andriy Shalaenko - GO security tipsAndriy Shalaenko - GO security tips
Andriy Shalaenko - GO security tipsOWASP Kyiv
 
Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]LivePerson
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
Future-proof Development for Classic SharePoint
Future-proof Development for Classic SharePointFuture-proof Development for Classic SharePoint
Future-proof Development for Classic SharePointBob German
 
Using LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowUsing LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowDataWorks Summit
 

Similar to Schema Registries Explained (20)

Implementing Messaging Patterns in JavaScript using the OpenAjax Hub
Implementing Messaging Patterns in JavaScript using the OpenAjax HubImplementing Messaging Patterns in JavaScript using the OpenAjax Hub
Implementing Messaging Patterns in JavaScript using the OpenAjax Hub
 
Philip Stehlik at TechTalks.ph - Intro to Groovy and Grails
Philip Stehlik at TechTalks.ph - Intro to Groovy and GrailsPhilip Stehlik at TechTalks.ph - Intro to Groovy and Grails
Philip Stehlik at TechTalks.ph - Intro to Groovy and Grails
 
Using MongoDB to Build a Fast and Scalable Content Repository
Using MongoDB to Build a Fast and Scalable Content RepositoryUsing MongoDB to Build a Fast and Scalable Content Repository
Using MongoDB to Build a Fast and Scalable Content Repository
 
A intro to (hosted) Shiny Apps
A intro to (hosted) Shiny AppsA intro to (hosted) Shiny Apps
A intro to (hosted) Shiny Apps
 
OrientDB the database for the web 1.1
OrientDB the database for the web 1.1OrientDB the database for the web 1.1
OrientDB the database for the web 1.1
 
iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
 
nuclio Overview October 2017
nuclio Overview October 2017nuclio Overview October 2017
nuclio Overview October 2017
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
 
Andriy Shalaenko - GO security tips
Andriy Shalaenko - GO security tipsAndriy Shalaenko - GO security tips
Andriy Shalaenko - GO security tips
 
Node.js an Exectutive View
Node.js an Exectutive ViewNode.js an Exectutive View
Node.js an Exectutive View
 
Secure Code Review 101
Secure Code Review 101Secure Code Review 101
Secure Code Review 101
 
Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Future-proof Development for Classic SharePoint
Future-proof Development for Classic SharePointFuture-proof Development for Classic SharePoint
Future-proof Development for Classic SharePoint
 
Using LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache ArrowUsing LLVM to accelerate processing of data in Apache Arrow
Using LLVM to accelerate processing of data in Apache Arrow
 
Evolution of Spark APIs
Evolution of Spark APIsEvolution of Spark APIs
Evolution of Spark APIs
 

Recently uploaded

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Recently uploaded (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Schema Registries Explained

  • 1. Schema Registries Mike Robins | Co-founder linkedin.com/in/mikerobins
  • 2. Snowplow Philosophy - Open source (ALv2) or managed (paid) - Batch or real time - Collect everything (web, mobile, IoT, webhooks) - Ownership of data matters - Data modelling should be first class and flexible - BYO toolset (Spark, Drill, Beam etc)
  • 3. ● Imagine all employees are required to speak only in their native language. ● Either everyone has to be multilingual, or expensive translators must be added for every pair of languages spoken. ○ Even if you have a sophisticated and efficient way of getting messages from place to place, you’re still stuck with the overhead of constant translation. Hazards of many languages
  • 4.
  • 5. ● A shared contract between a consumer and a producer ● Prior art ○ Avro, Thrift, Protobuf etc A schema
  • 6. Key attributes of schema technologies ● Code generation – for bindings to your schemas in a given programming language ● Data encodings ● Validation rules - for calibration and sanity ● Types – a description of the type of data ● Schema evolution
  • 7.
  • 8. Copyright Frank Drake, NASA (1977) License: CC BY-NC-ND 2.0
  • 9. Copyright Frank Drake, NASA (1977) License: CC BY-NC-ND 2.0
  • 10. iglu:com.<myco>/<event>/jsonschema/1-0-0 { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.s elf-desc/schema/jsonschema/1-0-0#", "description": "Schema for <…>", "self": { "vendor": "<com.myco>", "name": "<event>", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "action_context": { "type": "string" } ... }, "required": ["subject", "event_id"], "additionalProperties": false } The schema URI is IGLU The name of this schema The vendor of this schema Schema format Schema version
  • 11. Schema storage ● Option 1: Send the entire definition with the record Record Record Record Record Schema Schema Schema Schema
  • 12. Schema storage ● Option 2: Send a pointer to the definition *Schema *Schema *Schema *Schema Record Record Record Record Schema storage
  • 13. ● A canonical, shared source of truth ● Within and between organisations Schema registry
  • 14. ● Data governance ○ Safe schema evolution ○ Policy enforcement ● Data pipeline resilience ● Data discovery ● Efficiency ○ Cost ○ Storage ○ Computation ● Shares principles with software engineering CI/CD Why?
  • 15. Key takeaways Schemas are critical and a shared repository of all schemas used by the organisation is important to make siloed knowledge shared and explicit. By using schemas, the data definition for a particular kind of data exists in a single place. Schemas serve as self-contained and automatically enforceable contracts between producers and consumers of data.
  • 16. Demo mike@snowflake-analytics.com Snowplow (github.com/snowplow/snowplow) Schemas (Iglu Central) Kinesis (Amazon Web Services) Pusher (pusher.com)