The Polylog
A Log-Based Architecture for
Distributed Systems
Agenda
1. JW Player
2. Motivation
3. Inspiration
4. Implementation
5. Use cases
JW Player
JW Player
1. Established - 2008
2. Headquarters - NYC (2 Park Ave)
3. Employees - 200+
4. Business Model - SaaS
5. JW Player Footprint: 5%+ of all
video on the web
Data @ JW Player
1. 1Bn video hours consumed per month
2. 1Bn unique viewers per month
3. 5MM analytics events per minute
4. 3TB of logs per day
Pipelines - Ingestion, pipelines, infrastructure
Discovery - Recs & Search in production
Insights - Customer dashboards
Media Intelligence - Media metadata extraction
Data Science - R&D, instrumentation, predictive modeling
Recommendations and Search
Motivation
JW Player is breaking up its monolith
1. JW Player is moving to a Service Oriented
Architecture (SOA)
2. SOA promotes loose coupling between services
3. Part of the roadmap is to break up our monolithic
database into separate datastores for faster
iteration
Some services don’t work under SOA
1. Our data services depend on syncing
Elasticsearch with numerous tables from the
monolith
2. Traditional API-style architecture doesn’t work
for indexing data across many sources and data
change monitoring:
a. Hard to know when, how and what changed
b. Hard to maintain consistency
c. Hard to scan the entire dataset
Our Mission
We need the ability to perform both iterative updates and full rebuilds of recommendations simply and efficiently
Inspiration
The Monolog
1. New York Times solved this
problem with log-based
architecture
2. CMSs write to Kafka first, from
which other services read and build
3. “Mono” because everything is written to a single Kafka topic and partition
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
The simplicity of logs
Simplest possible storage abstraction. It is an append-only,
totally-ordered sequence of records ordered by time.
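To make the abstraction concrete, here is a minimal in-memory sketch of an append-only log with offset-based reads. The class and method names (`AppendOnlyLog`, `append`, `read_from`) are made up for illustration and are not part of Kafka or of the Polylog; a real log is also persisted and replicated.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Tuple


@dataclass
class AppendOnlyLog:
    """Toy in-memory log: records are only ever appended, never modified."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int = 0) -> Iterator[Tuple[int, Any]]:
        """Yield (offset, record) pairs in log order, starting at `offset`."""
        for position in range(offset, len(self._records)):
            yield position, self._records[position]


log = AppendOnlyLog()
log.append({"id": 123, "title": "My title"})
log.append({"id": 123, "title": "My new title"})

# Two independent readers: one replays everything, one resumes from offset 1.
full_replay = list(log.read_from(0))
resumed = list(log.read_from(1))
```

Because readers only track an offset, any number of them can read the same log at their own pace, and a new reader can rebuild its state by starting from offset 0.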
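Apache Kafka: distributed logs
1. Distributed and fault tolerant
2. Stores full history
3. Can replay from beginning
4. Supports log compaction
5. Clients in many languages: JVM, Python, Go

    # hello world in Kafka
    import confluent_kafka

    def process_message(message):
        # placeholder handler: print the raw payload
        print(message.value())

    consumer = confluent_kafka.Consumer({
        "bootstrap.servers": "my-kafka:9092",
        "group.id": "my_consumer",
    })
    consumer.subscribe(["my_topic"])
    try:
        while True:
            message = consumer.poll(1.0)  # wait up to 1 second for a message
            if message is None or message.error():
                continue  # nothing received, or an error event: skip it
            process_message(message)
    finally:
        consumer.close()  # leave the consumer group cleanly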
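Log compaction can be illustrated without a broker. The sketch below is a simplified model, not Kafka's implementation: it keeps only the latest record per key and treats a `None` value as a delete marker (a tombstone). The `compact` function and the `media-…` keys are hypothetical; a real broker compacts in the background and retains tombstones for a configurable period.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[dict]]  # (key, value); a None value marks a delete


def compact(records: List[Record]) -> List[Record]:
    """Keep only the most recent record per key, preserving log order.

    A None value acts as a tombstone: the key is dropped from the result.
    """
    latest_offset: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(records):
        latest_offset[key] = offset
    return [
        (key, value)
        for offset, (key, value) in enumerate(records)
        if latest_offset[key] == offset and value is not None
    ]


history = [
    ("media-123", {"title": "My title"}),
    ("media-234", {"title": "Other"}),
    ("media-123", {"title": "My new title"}),
    ("media-234", None),  # tombstone: media-234 was deleted
]
compacted = compact(history)
```

A consumer replaying the compacted log from the beginning still ends up with the current state of every key, which is what makes full rebuilds affordable.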
Implementation
The Polylog
1. Fewer assumptions than Monolog
2. Can be multiple topics, partitions or clusters
3. Easier to scale
4. Ability to create consistent view of denormalized
data
Polylog components
1. Producers - populating The Polylog
a. Debezium
b. Custom
2. Storage - Kafka
3. Intermediate processors
a. Denormalizer
b. Custom
4. Consumers - consuming off of The Polylog
The Polylog
Debezium: read logs from the database
1. Reads op logs from various databases
(MySQL, Postgres, Mongo, etc.) and writes
to Kafka
2. Minimal setup
3. Every table is a topic
4. Handles schema changes
5. Configuration options (e.g. table whitelist,
column blacklist)
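As a rough sketch of what a consumer does with these change events: Debezium events describe each row change with an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) and the row state `before` and `after` the change. The example below applies a simplified version of that envelope to an in-memory copy of a table. Real events carry more fields (source metadata, timestamps, optionally a schema), and the `apply_change` helper and the sample rows are assumptions made for illustration.

```python
from typing import Dict


def apply_change(table: Dict[int, dict], event: dict, key_field: str = "id") -> None:
    """Apply one simplified change event to an in-memory copy of a table."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        table[row[key_field]] = row
    elif op == "d":  # delete: only the "before" state is populated
        row = event["before"]
        table.pop(row[key_field], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


events = [
    {"op": "c", "before": None,
     "after": {"id": 123, "title": "My title", "duration": 600}},
    {"op": "u", "before": {"id": 123, "title": "My title", "duration": 600},
     "after": {"id": 123, "title": "My new title", "duration": 600}},
    {"op": "c", "before": None,
     "after": {"id": 124, "title": "Other", "duration": 30}},
    {"op": "d", "before": {"id": 124, "title": "Other", "duration": 30},
     "after": None},
]

replica: Dict[int, dict] = {}
for event in events:
    apply_change(replica, event)
```

Because every event states what changed and how, the "hard to know when, how and what changed" problem from the motivation largely goes away for any table captured this way.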
Custom Producers
1. Debezium is not appropriate for all use cases
2. We have custom producers writing to Polylog
a. Derived data (e.g. algorithm results)
b. Producers requiring business logic
c. Kafka as source of truth
Denormalizer: left joins on streams
1. Join records across multiple topics
2. Create full denormalized records (e.g. media with tags)
3. Generic schema
4. RocksDB with AWS S3 backup
5. Looking to open source
Denormalizer: what does the data look like?
mysql.mydb.table1
{
  "id": 123,
  "title": "My title",
  "duration": 600
}
mysql.mydb.table2
{
  "id": 234,
  "table1_id": 123,
  "val": "hello world"
}
my_denormalizer_topic
{
  "PrimaryKey": "0360",
  "Record": {
    "id": 123,
    "title": "My title",
    "duration": 600
  },
  "Children": {
    "table2": [{
      "PrimaryKey": "0203",
      "Record": {
        "id": 234,
        "table1_id": 123,
        "val": "hello world"
      },
      "Children": {...}
    }]
  }
}
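The join that produces the record shape shown above can be sketched in a few lines. This is not JW Player's Denormalizer, which is not public in this deck: it is a toy version that holds state in Python dicts instead of RocksDB, handles one parent and one child table, and uses the row id as `PrimaryKey` because the slides do not explain how values like "0360" are derived. All class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional


class Denormalizer:
    """Toy left join of a parent table with one child table, kept in memory."""

    def __init__(self, child_table: str, foreign_key: str) -> None:
        self.child_table = child_table
        self.foreign_key = foreign_key
        self.parents: Dict[int, dict] = {}
        # parent id -> {child id -> child record}
        self.children: Dict[int, Dict[int, dict]] = defaultdict(dict)

    def on_parent(self, record: dict) -> dict:
        """Store a parent row and emit its denormalized form."""
        self.parents[record["id"]] = record
        return self.render(record["id"])

    def on_child(self, record: dict) -> Optional[dict]:
        """Store a child row; emit the parent's record if the parent is known."""
        parent_id = record[self.foreign_key]
        self.children[parent_id][record["id"]] = record
        # Left join: a child without a known parent produces no output yet.
        return self.render(parent_id) if parent_id in self.parents else None

    def render(self, parent_id: int) -> dict:
        kids = self.children.get(parent_id, {})
        return {
            "PrimaryKey": str(parent_id),
            "Record": self.parents[parent_id],
            "Children": {
                self.child_table: [
                    {"PrimaryKey": str(cid), "Record": rec, "Children": {}}
                    for cid, rec in sorted(kids.items())
                ]
            },
        }


join = Denormalizer(child_table="table2", foreign_key="table1_id")
first = join.on_parent({"id": 123, "title": "My title", "duration": 600})
second = join.on_child({"id": 234, "table1_id": 123, "val": "hello world"})
```

Every update on either input re-emits the full parent record, so downstream consumers never have to perform the join themselves.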
Consumers: stream to other datastores
1. Read denormalized records
2. Transform into expected format
3. Write transformed records into another
datastore (e.g. Elasticsearch)
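The three steps above might look like the following sketch for an Elasticsearch sink. It flattens a denormalized record into one document and builds a newline-delimited body in the format Elasticsearch's bulk endpoint expects (an action line followed by the document). The index name `media`, the helper names, and the flattening rule are assumptions; sending the body over HTTP is left out.

```python
import json
from typing import Iterable


def to_document(denormalized: dict) -> dict:
    """Flatten a denormalized record: parent fields plus one list per child table."""
    document = dict(denormalized["Record"])
    for table, children in denormalized.get("Children", {}).items():
        document[table] = [child["Record"] for child in children]
    return document


def to_bulk_body(records: Iterable[dict], index: str) -> str:
    """Build a newline-delimited request body for the bulk endpoint."""
    lines = []
    for record in records:
        document = to_document(record)
        action = {"index": {"_index": index, "_id": str(document["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(document))
    return "\n".join(lines) + "\n"  # the bulk format requires a trailing newline


record = {
    "PrimaryKey": "123",
    "Record": {"id": 123, "title": "My title", "duration": 600},
    "Children": {
        "table2": [{
            "PrimaryKey": "234",
            "Record": {"id": 234, "table1_id": 123, "val": "hello world"},
            "Children": {},
        }]
    },
}
body = to_bulk_body([record], index="media")
```

Using the record id as the document `_id` means a re-delivered record overwrites the same document instead of creating a duplicate.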
Use cases
1. Build data models from disparate data
sources
2. Kafka as primary source of truth
a. Write to Kafka first
b. Can have multiple consumers
c. At-least-once delivery guarantee
d. Guarantee consistency - Avoid dual write issue
3. Database migrations
a. Avoid dual write issues!
b. Stand up new service while old service still active
c. Seamless switch - no hard cutover
4. Data change monitoring
5. Disaster recovery and fault tolerance
a. Kafka retention means we have an audit trail
b. Examples:
➢ Accidentally overwriting data in upstream
database
➢ Debugging how data changed over time
6. New services based on other services' datasets
a. “Don’t be a salmon!” - don’t talk directly to upstream services
b. Polylog is a single data source that multiple consumers can work off of
c. When you need a service that can’t do basic API calls
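The at-least-once guarantee in use case 2 has a practical consequence for every consumer: a record may be delivered more than once, so writes should be idempotent. The simulation below is a sketch under that assumption, with no Kafka involved: offsets are committed only after a successful write, a crash is simulated between write and commit, and an upsert keyed by record id absorbs the duplicate. `Sink`, `consume`, and `crash_before_commit_at` are hypothetical names.

```python
from typing import Dict, List


class Sink:
    """Toy datastore with upsert semantics keyed by record id."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self.writes = 0

    def upsert(self, record: dict) -> None:
        self.rows[record["id"]] = record
        self.writes += 1


def consume(log: List[dict], sink: Sink, committed: int,
            crash_before_commit_at: int = -1) -> int:
    """Process records from `committed` onward; return the new committed offset.

    The offset is committed only after the write succeeds, so a crash between
    the write and the commit means the record is processed again on restart.
    """
    for offset in range(committed, len(log)):
        sink.upsert(log[offset])
        if offset == crash_before_commit_at:
            return committed  # simulated crash: write happened, commit did not
        committed = offset + 1
    return committed


log = [
    {"id": 123, "title": "My title"},
    {"id": 234, "title": "Other"},
    {"id": 123, "title": "My new title"},
]
sink = Sink()
offset = consume(log, sink, committed=0, crash_before_commit_at=1)  # crash at offset 1
offset = consume(log, sink, committed=offset)  # restart and finish
```

The record at offset 1 is written twice, yet the sink ends in the same state as a run without the crash. The same property lets a new service be built by replaying the log from offset 0 while the old service keeps running.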
Conclusion
Use log-based architectures!
1. Build data models from disparate data sources
2. Kafka as primary source of truth
3. Database migrations for SOA
4. Data change monitoring
5. Disaster recovery and fault tolerance
6. Building new services based on other services' full datasets
Thank you... and we’re hiring!
Questions?
