SlideShare a Scribd company logo
1 of 35
Schemas Beyond The Edge
Alexei Zenin
Platform Engineer, Uken Games
August 17th, 2021
Journey
Mobile Analytics at Uken Games
Key Features of Schema Registry
Revamped Mobile Analytics Pipeline: JSON to Protobuf
2
Mobile Analytics at Uken Games
Collection of data to drive:
● Retention & engagement analysis (AB tests, DAU)
● Operational debugging & investigation
(purchases, rewards)
● Lower-level metric analysis (HTTP, CPU, Memory)
Focus will be on the first two
3
Current Ingestion Pipeline Design
4
~200 million events per day
Schema Silos
5
● Each silo has its own processes and
tools
● Several teams managing the same data
asset definitions
● Duplication of work across the
boundaries
○ Structure
○ Data types
○ Naming
6
Schema Drift
Pros & Cons
Pros
● Adding a new event is quick
● JSON is easy to use
● JSON well supported across systems
Cons
● Duplicated effort across teams for data management
● Repeated schema definitions
● Manual processes to keep things in sync
● JSON retransmits schema information each time
7
Schema Registry Refresher
8
Overview of Schema Registry
● Provides a central place to upload your
schemas
● Immutable & Idempotent API
● Acts as “barcode system”
9
https://bit.ly/3ikXUci
Default Wire Format (Confluent)
10
Single Primary Architecture
11
● Primary is elected via
Kafka Group Protocol
● Each Secondary is informed of
the primary’s address
● Primary is responsible for writes
(e.g. registering new schemas)
● Every node can serve reads or
forward write requests
Ingestion Pipeline 2.0: Protobuf
12
Ingestion Pipeline 2.0
Goals:
● Decrease boilerplate work
● Leverage automation
● Increase transparency
● Single source of truth for schemas
Solution:
Gitops paradigm with Protobuf & Schema Registry
13
JSON to Protobuf: Schema Structures
● Legacy JSON schema structure use the concept of an envelope with various subdivisions
● Convert into Protobuf, preserve schema structure
14
Protobuf Equivalent
Protobuf features:
● Can express custom types and use
composition to glue together
schemas in 1 top level schema
● Has support for some native types
like google.protobuf.Timestamp
● Ability to generate code from schema
(e.g. Java, C#)
15
Envelope per payload?
● Uken has between 200-300 event
payloads per game
● One approach is to copy paste the
envelope per payload per game
Envelopes = Payloads X Games
● Downsides are duplication in schemas
and generated code
● Leads to a poor developer experience
16
“Oneof” Branching
● Try using the oneof construct to
enumerate all possible payloads at
the EventPayload level
● Only need an envelope per game or 1
mega envelope for all games
Envelopes = O(Games)
● Similar to Avro Unions
● Still has duplication of envelope and
EventPayload
17
How to get around strict Protobuf schemas
● Each definition needs to be defined upfront and explicitly in Protobuf
Solution:
Defer schema attached until runtime to be able to reuse envelope across games
18
Protobuf’s Any: Dynamic Message Container
● Allows you to use any embedded
Protobuf type
● Uses special “packing” code
● The type_url only accepts Protobuf
package names
● Requires class to be present in
application
19
https://bit.ly/2Uo6RcS
Schema Registry Compatible “Any”
● Enables attaching schemas at runtime
● Removes type_url for schema_id
● The value field is Proto3 encoded bytes
20
Dynamic Envelope
● Can define one envelope for
all games
● Use the AnyUken type for
EventPayload
● Tradeoff explicitness for
flexibility
● Elevate schema IDs to a first
class concept within the
schemas themselves
(schema pointers)
21
22
Protobuf View
How do we get these schemas
past the edge?
23
Integrating Schema Registry with Mobile clients
Problems:
● Generated Protobuf classes are not aware of schema IDs
● Client device needs access to schema IDs
24
“Online” Approach (traditional)
25
● Expose Schema Registry (SR) to
mobile clients directly
● Fetch required schema IDs
during app runtime
Disadvantages:
● Need to setup security for
schema registry
● Need to scale SR to millions of
clients
● Point of failure for client
● Adds network overhead to
client
“Embedded” Approach
26
Embedded Tradeoffs
● No need to expose SR to millions of clients
● Leverage the immutable property of schema
IDs
● Custom build process
● Total snapshot of schema IDs built into app
27
Serverless + Schema Registry
28
EventBatch wire format
29
● Define a generic envelope as the API contract
between API Gateway & Mobile client
● Able to take any resolvable schema, return
error if bad data
● Avoids hardcoding a specific version of
AnalyticsEvent
● Keeps wire format compliant to Proto3
Nesting Schema IDs Visualized
30
Performance Comparison
31
● 3x improvement in latency to ingest a
batch of events
● 2x reduction in size per event
● 2x increase in possible number of
events buffered on device
JSON Latency
Protobuf Latency
Schema Management: GitOps paradigm
● Version control Protobuf schemas in Git monorepo
● Use Merge Requests to collaborate on impending changes
● Run CI/CD on commits
● Place people into the right spots, let automation do the rest
32
33
34
Next Steps
● Adding a Data Dictionary
● Improving visibility with integrations to Slack
● Iterating on RACI matrix for data management
● Migrating Spark Jobs to utilize new data management process
35
Takeaways
● GitOps allows for data management
automation
● Schema Registry can empower devices outside
the data center
● Schema IDs allow flexible envelope designs that
operate better at scale
● Protobuf can help reduce costs by several times

More Related Content

What's hot

Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafkaconfluent
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformVMware Tanzu
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
 
Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman
 
Monitoring Apache Kafka
Monitoring Apache KafkaMonitoring Apache Kafka
Monitoring Apache Kafkaconfluent
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Productionconfluent
 
Intro to Asynchronous Javascript
Intro to Asynchronous JavascriptIntro to Asynchronous Javascript
Intro to Asynchronous JavascriptGarrett Welson
 
Running distributed tests with k6.pdf
Running distributed tests with k6.pdfRunning distributed tests with k6.pdf
Running distributed tests with k6.pdfLibbySchulze
 
Microservices Part 3 Service Mesh and Kafka
Microservices Part 3 Service Mesh and KafkaMicroservices Part 3 Service Mesh and Kafka
Microservices Part 3 Service Mesh and KafkaAraf Karsh Hamid
 
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A GrzesikApache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesikmfrancis
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloudconfluent
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...HostedbyConfluent
 
API Security in a Microservice Architecture
API Security in a Microservice ArchitectureAPI Security in a Microservice Architecture
API Security in a Microservice ArchitectureMatt McLarty
 
Exactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache KafkaExactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache Kafkaconfluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Microservices & API Gateways
Microservices & API Gateways Microservices & API Gateways
Microservices & API Gateways Kong Inc.
 

What's hot (20)

Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafka
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
 
kafka
kafkakafka
kafka
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)
 
Monitoring Apache Kafka
Monitoring Apache KafkaMonitoring Apache Kafka
Monitoring Apache Kafka
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Production
 
Intro to Asynchronous Javascript
Intro to Asynchronous JavascriptIntro to Asynchronous Javascript
Intro to Asynchronous Javascript
 
Running distributed tests with k6.pdf
Running distributed tests with k6.pdfRunning distributed tests with k6.pdf
Running distributed tests with k6.pdf
 
Microservices Part 3 Service Mesh and Kafka
Microservices Part 3 Service Mesh and KafkaMicroservices Part 3 Service Mesh and Kafka
Microservices Part 3 Service Mesh and Kafka
 
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A GrzesikApache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
 
API Security in a Microservice Architecture
API Security in a Microservice ArchitectureAPI Security in a Microservice Architecture
API Security in a Microservice Architecture
 
Exactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache KafkaExactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache Kafka
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Microservices & API Gateways
Microservices & API Gateways Microservices & API Gateways
Microservices & API Gateways
 

Similar to Schemas Beyond The Edge

Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleDmytro Semenov
 
Building Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the CloudBuilding Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the CloudChris Schalk
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesSeungYong Oh
 
Node.js Presentation
Node.js PresentationNode.js Presentation
Node.js PresentationExist
 
RPC in Smalltalk
 RPC in Smalltalk RPC in Smalltalk
RPC in SmalltalkESUG
 
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...eMadrid network
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2Linaro
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentDavid Galeano
 
Mobile game architecture on GCP
Mobile game architecture on GCPMobile game architecture on GCP
Mobile game architecture on GCP명근 최
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesAlexander Penev
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 
Electron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologiesElectron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologiesBethmi Gunasekara
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...David Geurts
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kuberneteskloia
 

Similar to Schemas Beyond The Edge (20)

Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
ARISE
ARISEARISE
ARISE
 
Building Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the CloudBuilding Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the Cloud
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
Node.js Presentation
Node.js PresentationNode.js Presentation
Node.js Presentation
 
RPC in Smalltalk
 RPC in Smalltalk RPC in Smalltalk
RPC in Smalltalk
 
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game development
 
Mobile game architecture on GCP
Mobile game architecture on GCPMobile game architecture on GCP
Mobile game architecture on GCP
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
COM+ & MSMQ
COM+ & MSMQCOM+ & MSMQ
COM+ & MSMQ
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Electron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologiesElectron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologies
 
resume
resumeresume
resume
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
 
Rashmi_Resume
Rashmi_ResumeRashmi_Resume
Rashmi_Resume
 

More from confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 

Schemas Beyond The Edge

  • 1. Schemas Beyond The Edge Alexei Zenin Platform Engineer, Uken Games August 17th, 2021
  • 2. Journey Mobile Analytics at Uken Games Key Features of Schema Registry Revamped Mobile Analytics Pipeline: JSON to Protobuf 2
  • 3. Mobile Analytics at Uken Games Collection of data to drive: ● Retention & engagement analysis (AB tests, DAU) ● Operational debugging & investigation (purchases, rewards) ● Lower-level metric analysis (HTTP, CPU, Memory) Focus will be on the first two 3
  • 4. Current Ingestion Pipeline Design 4 ~200 million events per day
  • 5. Schema Silos 5 ● Each silo has its own processes and tools ● Several teams managing the same data asset definitions ● Duplication of work across the boundaries ○ Structure ○ Data types ○ Naming
  • 7. Pros & Cons Pros ● Adding a new event is quick ● JSON is easy to use ● JSON well supported across systems Cons ● Duplicated effort across teams for data management ● Repeated schema definitions ● Manual processes to keep things in sync ● JSON retransmits schema information each time 7
  • 9. Overview of Schema Registry ● Provides a central place to upload your schemas ● Immutable & Idempotent API ● Acts as “barcode system” 9 https://bit.ly/3ikXUci
  • 10. Default Wire Format (Confluent) 10
  • 11. Single Primary Architecture 11 ● Primary is elected via Kafka Group Protocol ● Each Secondary is informed of the primary’s address ● Primary is responsible for writes (e.g. registering new schemas) ● Every node can serve reads or forward write requests
  • 12. Ingestion Pipeline 2.0: Protobuf 12
  • 13. Ingestion Pipeline 2.0 Goals: ● Decrease boilerplate work ● Leverage automation ● Increase transparency ● Single source of truth for schemas Solution: Gitops paradigm with Protobuf & Schema Registry 13
  • 14. JSON to Protobuf: Schema Structures ● Legacy JSON schema structure use the concept of an envelope with various subdivisions ● Convert into Protobuf, preserve schema structure 14
  • 15. Protobuf Equivalent Protobuf features: ● Can express custom types and use composition to glue together schemas in 1 top level schema ● Has support for some native types like google.protobuf.Timestamp ● Ability to generate code from schema (e.g. Java, C#) 15
  • 16. Envelope per payload? ● Uken has between 200-300 event payloads per game ● One approach is to copy paste the envelope per payload per game Envelopes = Payloads X Games ● Downsides are duplication in schemas and generated code ● Leads to a poor developer experience 16
  • 17. “Oneof” Branching ● Try using the oneof construct to enumerate all possible payloads at the EventPayload level ● Only need an envelope per game or 1 mega envelope for all games Envelopes = O(Games) ● Similar to Avro Unions ● Still has duplication of envelope and EventPayload 17
  • 18. How to get around strict Protobuf schemas ● Each definition needs to be defined upfront and explicitly in Protobuf Solution: Defer schema attached until runtime to be able to reuse envelope across games 18
  • 19. Protobuf’s Any: Dynamic Message Container ● Allows you to use any embedded Protobuf type ● Uses special “packing” code ● The type_url only accepts Protobuf package names ● Requires class to be present in application 19 https://bit.ly/2Uo6RcS
  • 20. Schema Registry Compatible “Any” ● Enables attaching schemas at runtime ● Removes type_url for schema_id ● The value field is Proto3 encoded bytes 20
  • 21. Dynamic Envelope ● Can define one envelope for all games ● Use the AnyUken type for EventPayload ● Tradeoff explicitness for flexibility ● Elevate schema IDs to a first class concept within the schemas themselves (schema pointers) 21
  • 23. How do we get these schemas past the edge? 23
  • 24. Integrating Schema Registry with Mobile clients Problems: ● Generated Protobuf classes are not aware of schema IDs ● Client device needs access to schema IDs 24
  • 25. “Online” Approach (traditional) 25 ● Expose Schema Registry (SR) to mobile clients directly ● Fetch required schema IDs during app runtime Disadvantages: ● Need to setup security for schema registry ● Need to scale SR to millions of clients ● Point of failure for client ● Adds network overhead to client
  • 27. Embedded Tradeoffs ● No need to expose SR to millions of clients ● Leverage the immutable property of schema IDs ● Custom build process ● Total snapshot of schema IDs built into app 27
  • 28. Serverless + Schema Registry 28
  • 29. EventBatch wire format 29 ● Define a generic envelope as the API contract between API Gateway & Mobile client ● Able to take any resolvable schema, return error if bad data ● Avoids hardcoding a specific version of AnalyticsEvent ● Keeps wire format compliant to Proto3
  • 30. Nesting Schema IDs Visualized 30
  • 31. Performance Comparison 31 ● 3x improvement in latency to ingest a batch of events ● 2x reduction in size per event ● 2x increase in possible number of events buffered on device JSON Latency Protobuf Latency
  • 32. Schema Management: GitOps paradigm ● Version control Protobuf schemas in Git monorepo ● Use Merge Requests to collaborate on impending changes ● Run CI/CD on commits ● Place people into the right spots, let automation do the rest 32
  • 33. 33
  • 34. 34 Next Steps ● Adding a Data Dictionary ● Improving visibility with integrations to Slack ● Iterating on RACI matrix for data management ● Migrating Spark Jobs to utilize new data management process
  • 35. 35 Takeaways ● GitOps allows for data management automation ● Schema Registry can empower devices outside the data center ● Schema IDs allow flexible envelope designs that operate better at scale ● Protobuf can help reduce costs by several times