SlideShare a Scribd company logo
1 of 33
Adventures in AdTech
(processing 50B user profiles in real time)
Lubos Kosco, Michal Senkyr
Senior Software Engineers at Sizmek
INTRODUCTION
Presenters Bio
Lubos is working as a software engineer in Sizmek taking care of 7 datacenters
full of bidding infrastructure to deliver real time bids and impressions for
marketing campaigns of Sizmeks clients. Previously he used to work in Oracle
and Sun Microsystems on products that managed datacenters and monitored
them. He is also one of the leaders of {OpenGrok the source code search
engine.
Michal is a Senior Software Engineer at Sizmek with background in both real-
time and batch distributed systems. Previously employed at Seznam.cz, a
local Czech alternative of Google. A strong proponent of Scala and an
occasional Apache Spark contributor.
Sizmek
▪ Largest independent buy-side advertising platform
▪ AI-driven decisioning engine
▪ Platform in more than 70 countries used by more than 20,000
advertisers and 3,600 agencies
▪ Delivering billions of impressions per day
▪ Processing hundreds of billions of bid requests per day
Topics
▪ Use case
▪ Data model
▪ Server and driver configuration
▪ Short-lived caching
▪ Spark interoperability
▪ Optimizations
USE CASE
Real Time Bidding
Record impression
5
4b
Notification
Bid request
Bid + HTML
2
3
Request ad
Serve ad
1
4a
Render ad
Ad HTML
6
7
RTB EXCHANGE
USER
ADSERVER
User Profile
▪ Merged from multiple sources
▪ Tracking all active users/devices
• Multiple devices for one user/household (cluster)
▪ Very short SLA
▪ Queried for every bid request
▪ Drives bidding decision
• Leveraging AI models
Environment
▪ Volume: Around 50 billion profiles
▪ Throughput: Several millions of requests per second
• Single request can initiate multiple profile lookups
▪ Latency: Tens of milliseconds for each request
• Useless if exceeded
▪ Distribution: Multiple DCs serving hundreds of clients each
DATA MODEL
Attributes
Two types of user profile attributes:
▪ Primitive
• Single value, overwrite semantics
▪ Collection
• Multiple values, append semantics
Complex structured values
Lookups
▪ Single profile
▪ Profile cluster
• Each profile can reference other profiles (graph)
• Merge referenced profiles into main profile
Can specify attribute subset
Mutations
▪ Profile merging
• Primitive overwrite & collection append
▪ Attribute deletion
• Remove whole attribute
▪ Value deletion
• Remove single collection element
Housekeeping
▪ TTL enforcement
• Each attribute can have a different TTL
▪ Collection attribute pruning
• Keep only N newest values
• Custom processing (deduplication)
Data Model
Primitive attributes
user_id: blob
attribute1: blob
attribute2: blob
...
Collection attributes
user_id: blob
attribute_id: byte
value: blob
▪ Co-partitioning
• All attributes of a single profile share the
same shard set
▪ Grouping by collection attribute
• Efficient lookup by attribute id
▪ Efficient value serialization
• Google Protocol Buffers
SERVER & DRIVER
CONFIGURATION
Server Configuration
▪ Enterprise 2018.1.5 used, repairs automated by manager
▪ Global cluster - 7 DCs (sister sites in every region)
▪ Keyspace per region (Americas, EMEA, APAC) - replication
▪ HW resources divided between Scylla and Spark
▪ Read latency based scylla.yaml:
• background_writer_scheduling_quota: 0.5
• auto_adjust_flush_quota: true
Java Driver Configuration
▪ Custom 3.5.1 cassandra based shard aware driver
▪ Increase connections in pooling-options to match number of cores
• core/max, local/remote
▪ Local region keyspace for read, all region keyspaces for write
▪ Custom retry policy
▪ LZ4 compression enabled
SHORT-LIVED CACHING
Motivation
User profile cluster lookup
▪ Lookup of main profile and all related profiles + merging
▪ Large read amplification
▪ Related profiles provide supplementary information
• Not as time sensitive, minor staleness acceptable
Concept
Bidder
Bid Request
ProfileStore
Get
Main Profile + Cached Combined Profile
Get All Profiles in the Cluster
Write Combined Profile into the Cache
Executed for each bid request
Executed for bid request without cached profile
Replication Vs. Co-location
▪ Same replication
• Higher load
• Contact only one node on profile lookup
▪ No replication
• Token range replica locations different
• Potentially contact multiple nodes on profile lookup
Minimizing Disk Usage
▪ Time window compaction strategy (TWCS)
• Compaction window = TTL
• Min. threshold = 2 (compacts SSTables as soon as possible)
▪ 2 SSTables exist, 1 is actively read, compaction “cycles” them
Cache Efficiency
85.3% cache hit ratio
on bid requests
SPARK INTEROPERABILITY
Spark Cassandra Connector
RDD implementation from Datastax
▪ Shared database sessions
▪ Token range partitioning
• Limited shard awareness
▪ Location awareness
• No traffic between hosts if not needed
Custom RDD Implementation
▪ Taking advantage of co-partitioned tables
▪ Aggressive batching
• Accumulation of (unlogged) batches per shard set
• Size-based flushing
▪ Element deletion semantics
Custom DataFrame Implementation
▪ Protocol buffer awareness
▪ Taking advantage of Catalyst & Tungsten
▪ Full HiveQL support
• Statistics jobs
OPTIMIZATIONS
Java Driver Optimization
ScyllaDB team is preparing an optimized Cassandra driver which
introduces shard awareness
Using shard aware driver in PoC run:
1800 profiles / second x core => 2300 profiles / second x
core
== 28% increase in throughput
*https://github.com/haaawk/java-driver/tree/3.5.1-scylla/
Conclusion
Comparison
■ High availability
■ Better replication
■ Lowered HW requirements
● Nodes: 77 ⇒ 16 (79%)
● Cores: 924 ⇒ 320 (65%)
● RAM: 10TB ⇒ 6TB (38%)
● Disk: 240TB ⇒ 270TB (13%)
Thank You
Any Questions ?
Please stay in touch
Lubos.Kosco@Sizmek.com, Michal.Senkyr@Sizmek.com
@LubosKosco, @MichalSenkyr

More Related Content

What's hot

Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIIntroducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIScyllaDB
 
Captial One: Why Stream Data as Part of Data Transformation?
Captial One: Why Stream Data as Part of Data Transformation?Captial One: Why Stream Data as Part of Data Transformation?
Captial One: Why Stream Data as Part of Data Transformation?ScyllaDB
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureScyllaDB
 
Empowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorEmpowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorScyllaDB
 
Scylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File Format
Scylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File FormatScylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File Format
Scylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File FormatScyllaDB
 
Real-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
Real-time Fraud Detection for Southeast Asia’s Leading Mobile PlatformReal-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
Real-time Fraud Detection for Southeast Asia’s Leading Mobile PlatformScyllaDB
 
How to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsHow to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsScyllaDB
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownDataStax
 
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...ScyllaDB
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...ScyllaDB
 
Scylla Summit 2018: Scaling your time series data with Newts
Scylla Summit 2018: Scaling your time series data with NewtsScylla Summit 2018: Scaling your time series data with Newts
Scylla Summit 2018: Scaling your time series data with NewtsScyllaDB
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
Scylla Summit 2018: Scalable Stream Processing with KSQL, Kafka and ScyllaDB
Scylla Summit 2018: Scalable Stream Processing with KSQL, Kafka and ScyllaDBScylla Summit 2018: Scalable Stream Processing with KSQL, Kafka and ScyllaDB
Scylla Summit 2018: Scalable Stream Processing with KSQL, Kafka and ScyllaDBScyllaDB
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0ScyllaDB
 
Running Scylla on Kubernetes with Scylla Operator
Running Scylla on Kubernetes with Scylla OperatorRunning Scylla on Kubernetes with Scylla Operator
Running Scylla on Kubernetes with Scylla OperatorScyllaDB
 
Scylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of ScyllaScylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of ScyllaScyllaDB
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsScyllaDB
 
Addressing the High Cost of Apache Cassandra
Addressing the High Cost of Apache CassandraAddressing the High Cost of Apache Cassandra
Addressing the High Cost of Apache CassandraScyllaDB
 
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalGPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalScyllaDB
 
Scylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi KivityScylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi KivityScyllaDB
 

What's hot (20)

Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIIntroducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
 
Captial One: Why Stream Data as Part of Data Transformation?
Captial One: Why Stream Data as Part of Data Transformation?Captial One: Why Stream Data as Part of Data Transformation?
Captial One: Why Stream Data as Part of Data Transformation?
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
Empowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorEmpowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with Alternator
 
Scylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File Format
Scylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File FormatScylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File Format
Scylla Summit 2018: Scylla Feature Talks - SSTables 3.0 File Format
 
Real-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
Real-time Fraud Detection for Southeast Asia’s Leading Mobile PlatformReal-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
Real-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
 
How to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsHow to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your Needs
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
 
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
 
Scylla Summit 2018: Scaling your time series data with Newts
Scylla Summit 2018: Scaling your time series data with NewtsScylla Summit 2018: Scaling your time series data with Newts
Scylla Summit 2018: Scaling your time series data with Newts
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Scylla Summit 2018: Scalable Stream Processing with KSQL, Kafka and ScyllaDB
Scylla Summit 2018: Scalable Stream Processing with KSQL, Kafka and ScyllaDBScylla Summit 2018: Scalable Stream Processing with KSQL, Kafka and ScyllaDB
Scylla Summit 2018: Scalable Stream Processing with KSQL, Kafka and ScyllaDB
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0
 
Running Scylla on Kubernetes with Scylla Operator
Running Scylla on Kubernetes with Scylla OperatorRunning Scylla on Kubernetes with Scylla Operator
Running Scylla on Kubernetes with Scylla Operator
 
Scylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of ScyllaScylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of Scylla
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS Options
 
Addressing the High Cost of Apache Cassandra
Addressing the High Cost of Apache CassandraAddressing the High Cost of Apache Cassandra
Addressing the High Cost of Apache Cassandra
 
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalGPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
 
Scylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi KivityScylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi Kivity
 

Similar to Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles in Real Time with Scylla

Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Emprovise
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 
Clustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmarkClustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmarkClustrix
 
OrigoDB - take the red pill
OrigoDB - take the red pillOrigoDB - take the red pill
OrigoDB - take the red pillRobert Friberg
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsDataStax
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionSplunk
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At CraigslistJeremy Zawodny
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSaaS Is Beautiful
 
MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개Ha-Yang(White) Moon
 
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...RightScale
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMinsk MongoDB User Group
 
Dynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldDynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldStéphane Dorrekens
 

Similar to Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles in Real Time with Scylla (20)

Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
Clustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmarkClustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmark
 
OrigoDB - take the red pill
OrigoDB - take the red pillOrigoDB - take the red pill
OrigoDB - take the red pill
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At Craigslist
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
 
MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개
 
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
 
Dynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldDynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the field
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 

Recently uploaded (20)

Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 

Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles in Real Time with Scylla

  • 1. Adventures in AdTech (processing 50B user profiles in real time) Lubos Kosco, Michal Senkyr Senior Software Engineers at Sizmek
  • 3. Presenters Bio Lubos is working as a software engineer in Sizmek taking care of 7 datacenters full of bidding infrastructure to deliver real time bids and impressions for marketing campaigns of Sizmeks clients. Previously he used to work in Oracle and Sun Microsystems on products that managed datacenters and monitored them. He is also one of the leaders of {OpenGrok the source code search engine. Michal is a Senior Software Engineer at Sizmek with background in both real- time and batch distributed systems. Previously employed at Seznam.cz, a local Czech alternative of Google. A strong proponent of Scala and an occasional Apache Spark contributor.
  • 4. Sizmek ▪ Largest independent buy-side advertising platform ▪ AI-driven decisioning engine ▪ Platform in more than 70 countries used by more than 20,000 advertisers and 3,600 agencies ▪ Delivering billions of impressions per day ▪ Processing hundreds of billions of bid requests per day
  • 5. Topics ▪ Use case ▪ Data model ▪ Server and driver configuration ▪ Short-lived caching ▪ Spark interoperability ▪ Optimizations
  • 7. Real Time Bidding Record impression 5 4b Notification Bid request Bid + HTML 2 3 Request ad Serve ad 1 4a Render ad Ad HTML 6 7 RTB EXCHANGE USER ADSERVER
  • 8. User Profile ▪ Merged from multiple sources ▪ Tracking all active users/devices • Multiple devices for one user/household (cluster) ▪ Very short SLA ▪ Queried for every bid request ▪ Drives bidding decision • Leveraging AI models
  • 9. Environment ▪ Volume: Around 50 billion profiles ▪ Throughput: Several millions of requests per second • Single request can initiate multiple profile lookups ▪ Latency: Tens of milliseconds for each request • Useless if exceeded ▪ Distribution: Multiple DCs serving hundreds of clients each
  • 11. Attributes Two types of user profile attributes: ▪ Primitive • Single value, overwrite semantics ▪ Collection • Multiple values, append semantics Complex structured values
  • 12. Lookups ▪ Single profile ▪ Profile cluster • Each profile can reference other profiles (graph) • Merge referenced profiles into main profile Can specify attribute subset
  • 13. Mutations ▪ Profile merging • Primitive overwrite & collection append ▪ Attribute deletion • Remove whole attribute ▪ Value deletion • Remove single collection element
  • 14. Housekeeping ▪ TTL enforcement • Each attribute can have a different TTL ▪ Collection attribute pruning • Keep only N newest values • Custom processing (deduplication)
  • 15. Data Model Primitive attributes user_id: blob attribute1: blob attribute2: blob ... Collection attributes user_id: blob attribute_id: byte value: blob ▪ Co-partitioning • All attributes of a single profile share the same shard set ▪ Grouping by collection attribute • Efficient lookup by attribute id ▪ Efficient value serialization • Google Protocol Buffers
  • 17. Server Configuration ▪ Enterprise 2018.1.5 used, repairs automated by manager ▪ Global cluster - 7 DCs (sister sites in every region) ▪ Keyspace per region (Americas, EMEA, APAC) - replication ▪ HW resources divided between Scylla and Spark ▪ Read latency based scylla.yaml: • background_writer_scheduling_quota: 0.5 • auto_adjust_flush_quota: true
  • 18. Java Driver Configuration ▪ Custom 3.5.1 cassandra based shard aware driver ▪ Increase connections in pooling-options to match number of cores • core/max, local/remote ▪ Local region keyspace for read, all region keyspaces for write ▪ Custom retry policy ▪ LZ4 compression enabled
  • 20. Motivation User profile cluster lookup ▪ Lookup of main profile and all related profiles + merging ▪ Large read amplification ▪ Related profiles provide supplementary information • Not as time sensitive, minor staleness acceptable
  • 21. Concept Bidder Bid Request ProfileStore Get Main Profile + Cached Combined Profile Get All Profiles in the Cluster Write Combined Profile into the Cache Executed for each bid request Executed for bid request without cached profile
  • 22. Replication Vs. Co-location ▪ Same replication • Higher load • Contact only one node on profile lookup ▪ No replication • Token range replica locations different • Potentially contact multiple nodes on profile lookup
  • 23. Minimizing Disk Usage ▪ Time window compaction strategy (TWCS) • Compaction window = TTL • Min. threshold = 2 (compacts SSTables as soon as possible) ▪ 2 SSTables exist, 1 is actively read, compaction “cycles” them
  • 24. Cache Efficiency 85.3% cache hit ratio on bid requests
  • 26. Spark Cassandra Connector RDD implementation from Datastax ▪ Shared database sessions ▪ Token range partitioning • Limited shard awareness ▪ Location awareness • No traffic between hosts if not needed
  • 27. Custom RDD Implementation ▪ Taking advantage of co-partitioned tables ▪ Aggressive batching • Accumulation of (unlogged) batches per shard set • Size-based flushing ▪ Element deletion semantics
  • 28. Custom DataFrame Implementation ▪ Protocol buffer awareness ▪ Taking advantage of Catalyst & Tungsten ▪ Full HiveQL support • Statistics jobs
  • 30. Java Driver Optimization ScyllaDB team is preparing an optimized Cassandra driver which introduces shard awareness Using shard aware driver in PoC run: 1800 profiles / second x core => 2300 profiles / second x core == 28% increase in throughput *https://github.com/haaawk/java-driver/tree/3.5.1-scylla/
  • 32. Comparison ■ High availability ■ Better replication ■ Lowered HW requirements ● Nodes: 77 ⇒ 16 (79%) ● Cores: 924 ⇒ 320 (65%) ● RAM: 10TB ⇒ 6TB (38%) ● Disk: 240TB ⇒ 270TB (13%)
  • 33. Thank You Any Questions ? Please stay in touch Lubos.Kosco@Sizmek.com, Michal.Senkyr@Sizmek.com @LubosKosco, @MichalSenkyr