REAL-TIME DATA PROCESSING AT RTB HOUSE
BIG DATA TECHNOLOGY MOSCOW 2018
OCTOBER 10-11, 2018
ARCHITECTURE & LESSONS LEARNED
BARTOSZ ŁOŚ
TABLE OF CONTENTS
Agenda:
- our RTB platform
- the first iteration: mutable structures
- the second iteration: data-flow
- the third iteration: immutable streams of events
- the fourth iteration: multi-DC architecture
- the current iteration: Kafka Workers
- summary
OUR RTB PLATFORM
OUR RTB PLATFORM: THE CONTEXT
Bid requests:
- 2M/s (peak)
- ~30 SSP networks
- response time <50-100ms
User events:
- 1.5B tags/day
- 350M impressions/day
- 3.5M clicks/day
- 1.5M conversions/day
Other events:
- bid logs, access logs, domain events, etc.
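The daily volumes above can be turned into average per-second rates. A minimal back-of-the-envelope sketch, assuming traffic is spread evenly over an 86,400-second day (real peaks are higher), with the clicks-per-impression ratio shown as an indicative figure only:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes quoted on the slide.
daily_events = {
    "tags": 1_500_000_000,
    "impressions": 350_000_000,
    "clicks": 3_500_000,
    "conversions": 1_500_000,
}


def average_rate_per_second(per_day: int) -> float:
    """Average events per second, assuming traffic is spread evenly over the day."""
    return per_day / SECONDS_PER_DAY


for name, per_day in daily_events.items():
    print(f"{name:>12}: {average_rate_per_second(per_day):10.1f} events/s")

# Implied clicks per impression (orientation only; not a reported metric).
ctr = daily_events["clicks"] / daily_events["impressions"]
print(f"clicks per impression: {ctr:.2%}")
```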
OUR RTB PLATFORM: DATA PROCESSING NUMBERS
Kafka:
- peak throughput of 250K+ messages per second
- 50TB+ processed data every day
- 6 clusters in 4 datacenters
- 26 Kafka brokers
- 85 topics, 5000+ partitions
Docker (processing components only):
- 44 engines
- 1408 CPU cores, 5.5TB RAM
- 800+ containers
HDFS:
- 2PB+ data, up to 10GB/s
BigQuery:
- 1PB+ data, up to 10GB/min
Elasticsearch:
- 40TB data, up to 50K events/s
Aerospike (processing only):
- 80TB data, up to 8K events/s
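The Kafka numbers imply an average data rate and a rough partition density. A back-of-the-envelope sketch, assuming decimal terabytes, treating "50TB+" and "5000+" as lower bounds, averaging over all 26 brokers even though they belong to six separate clusters, and ignoring replicas:

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # decimal terabyte

daily_bytes = 50 * TB  # "50TB+ processed data every day" (lower bound)
brokers = 26
partitions = 5000      # "5000+ partitions" (lower bound)

avg_bytes_per_second = daily_bytes / SECONDS_PER_DAY
# Averaged over all brokers in all six clusters; replicas are not counted.
partitions_per_broker = partitions / brokers

print(f"average rate: {avg_bytes_per_second / 10**6:.0f} MB/s")
print(f"partitions per broker: {partitions_per_broker:.0f}")
```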
THE FIRST ITERATION
THE 1ST ITERATION: MUTABLE IMPRESSIONS
THE 1ST ITERATION: DRAWBACKS
Issues:
- long data migrations (30 days back) that overloaded the system
- complex servlet logic, no way to reprocess data
- inflexible, heterogeneous schemas
- single-DC only
- data inconsistencies
THE SECOND ITERATION: DATA-FLOW
THE 2ND ITERATION: THE 1ST DATA-FLOW ARCHITECTURE
THE 2ND ITERATION: DISTRIBUTED LOG
Why Apache Kafka:
- distributed log
- topic partitioning
- partition replication
- log retention
- stateless brokers (consumers track their own offsets)
- efficient data consumption
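The Kafka properties listed above (a partitioned, append-only log; consumers that track their own offsets) can be illustrated with a toy in-memory model. This is a sketch, not Kafka's implementation: `PartitionedLog` and `Consumer` are hypothetical names, `zlib.crc32` stands in for the murmur2 hash used by Kafka's default partitioner, and replication and retention are not modelled.

```python
import zlib
from collections import defaultdict


class PartitionedLog:
    """Toy topic: N append-only partitions; an offset is an index into a partition."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stable hash, so all events with the same key land in the same partition
        # and keep their relative order. (crc32 is a stand-in for Kafka's murmur2.)
        return zlib.crc32(key.encode()) % len(self.partitions)

    def append(self, key: str, value: str) -> tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition: int, offset: int, max_records: int = 100):
        return self.partitions[partition][offset:offset + max_records]


class Consumer:
    """The consumer, not the log, remembers how far it has read."""

    def __init__(self, log: PartitionedLog):
        self.log = log
        self.positions = defaultdict(int)  # partition -> next offset to read

    def poll(self, partition: int):
        records = self.log.read(partition, self.positions[partition])
        self.positions[partition] += len(records)
        return records
```

Because the position lives in the consumer, a second consumer (or a reprocessing job) can read the same partition from offset 0 without affecting the first one.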
THE 2ND ITERATION: BATCH LOADING
Why Camus (by LinkedIn):
- "Kafka to HDFS" pipeline
- batch tool
- runs as MapReduce jobs
- stores offsets in log files
- data partitioning
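Camus-style batch loading boils down to: resume from the offsets saved by the previous run, write new records into time-based output directories, then save the new offsets. A simplified single-process sketch; the function name, the dictionary stand-ins for Kafka and HDFS, and the hourly `YYYY/MM/DD/HH` layout are assumptions for illustration, whereas the real tool runs this as MapReduce jobs:

```python
from collections import defaultdict
from datetime import datetime, timezone


def run_batch(partitions, committed_offsets, hdfs_dirs):
    """One run of a simplified Camus-style Kafka-to-HDFS load.

    partitions:        {partition_id: [(unix_ts, payload), ...]}  stand-in for the Kafka log
    committed_offsets: {partition_id: next_offset}                saved by the previous run
    hdfs_dirs:         {"YYYY/MM/DD/HH": [payload, ...]}           stand-in for hourly HDFS dirs
    Returns the offsets that the caller should save for the next run.
    """
    new_offsets = dict(committed_offsets)
    staged = defaultdict(list)
    for pid, records in partitions.items():
        start = committed_offsets.get(pid, 0)
        for ts, payload in records[start:]:
            hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y/%m/%d/%H")
            staged[hour].append(payload)
        new_offsets[pid] = len(records)
    # Write output before handing back offsets: if the job dies before the
    # caller saves them, the next run re-reads the same records instead of
    # losing them.
    for hour, payloads in staged.items():
        hdfs_dirs.setdefault(hour, []).extend(payloads)
    return new_offsets
```

If the job fails after writing but before the offsets are saved, the next run writes those records again, so this scheme is at-least-once.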
THE 2ND ITERATION: AVRO & SCHEMA VERSIONING
Why Apache Avro:
- compact, efficient format
- schema: JSON format, payload: binary format
- self-describing container files
- rich data structures
- schema evolution support (reader & writer schemas)
Our approach:
- Avro for both Kafka messages and HDFS files
- schema registry
- avro-fastserde (github.com/RTBHOUSE/avro-fastserde)
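Avro's reader & writer schemas let consumers decode data written with an older or newer schema. A deliberately simplified sketch of record-level resolution (matching by field name, reader defaults for new fields, dropping fields the reader does not know); it ignores type promotion, aliases and nested types, and the `Click` schemas and the `dc` field are hypothetical. In practice the writer schema would be looked up in the schema registry rather than passed in by hand.

```python
import json
from typing import Any


def resolve_record(datum: dict[str, Any], writer_schema: dict, reader_schema: dict) -> dict[str, Any]:
    """Simplified Avro-style record resolution: field names only, no type promotion."""
    writer_names = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_names:
            result[name] = datum[name]       # known to both: keep the written value
        elif "default" in field:
            result[name] = field["default"]  # new in the reader: use its default
        else:
            raise ValueError(f"field {name!r} missing in writer schema and has no default")
    # Fields that only the writer knows about are silently dropped.
    return result


# Hypothetical schema versions, written as Avro JSON schemas.
WRITER_V1 = json.loads("""
{"type": "record", "name": "Click", "fields": [
  {"name": "time", "type": "long"},
  {"name": "impression_id", "type": "string"},
  {"name": "legacy_flag", "type": "boolean"}
]}""")

READER_V2 = json.loads("""
{"type": "record", "name": "Click", "fields": [
  {"name": "time", "type": "long"},
  {"name": "impression_id", "type": "string"},
  {"name": "dc", "type": "string", "default": "unknown"}
]}""")
```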
THE 2ND ITERATION: ACCURATE STATISTICS
Why Apache Storm:
- real-time processing
- streams of tuples, topologies
- fault tolerance
Why Trident:
- transactions, exactly-once processing
- micro-batches (trading latency for throughput)
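Trident's exactly-once guarantee for aggregates rests on storing the transaction ID of the last applied batch next to each value, so a replayed batch is recognized and skipped. A minimal sketch of that idea for a counter (class and key names are hypothetical); it assumes, as Trident's transactional spouts do, that a replayed txid always carries exactly the same batch:

```python
from dataclasses import dataclass, field


@dataclass
class TransactionalCounterState:
    """Per-key (count, last_txid) pairs, in the spirit of Trident's transactional state."""

    store: dict = field(default_factory=dict)  # key -> (count, last_txid)

    def apply_batch(self, txid: int, increments: dict[str, int]) -> None:
        # Assumes a replayed txid carries exactly the same increments.
        for key, delta in increments.items():
            count, last_txid = self.store.get(key, (0, None))
            if last_txid == txid:
                continue  # already applied before a failure: the replay is a no-op
            self.store[key] = (count + delta, txid)

    def get(self, key: str) -> int:
        return self.store.get(key, (0, None))[0]
```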
THE 2ND ITERATION: STATS-COUNTER TOPOLOGY
THE 2ND ITERATION: DRAWBACKS
Hybrid architecture:
- aggregates (real-time)
- raw events (2-hour batches)
- joined events (end-of-day batch jobs)
Other issues:
- Hive joins
- mutable events
- complex servlet logic
THE THIRD ITERATION: NEW APPROACH
THE 3RD ITERATION: NEW APPROACH
{ "IMPRESSION":
    "URL",
    "TIME",
    "CREATIVE",
    ...
    "CLICKS",
    "CONVERSIONS"
}
{ "CLICK":
    "TIME",
    "IMPRESSION_ID",
    ...
    "IMPRESSION"
}
{ "CONVERSION":
    "TIME",
    "CLICK_ID",
    ...
    "IMPRESSION",
    "CLICK"
}
New approach:
- real-time processing
- publishing light events
- immutable streams of events
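The schematic events above suggest the core idea of the third iteration: instead of mutating an impression when a click or conversion arrives, each new event carries an immutable copy of its parent events. A minimal sketch with frozen dataclasses; the class names, fields and the in-memory `Enricher` lookups are hypothetical, and a production joiner would need a bounded, persistent store rather than unbounded dicts:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Impression:
    id: str
    time: int
    url: str
    creative: str


@dataclass(frozen=True)
class Click:
    id: str
    time: int
    impression_id: str
    impression: Optional[Impression] = None  # parent copy, attached during processing


@dataclass(frozen=True)
class Conversion:
    id: str
    time: int
    click_id: str
    impression: Optional[Impression] = None
    click: Optional[Click] = None


class Enricher:
    """Publishes enriched copies of child events; parent events are never modified."""

    def __init__(self):
        self.impressions = {}  # impression id -> Impression
        self.clicks = {}       # click id -> enriched Click

    def on_impression(self, imp: Impression) -> Impression:
        self.impressions[imp.id] = imp
        return imp

    def on_click(self, click: Click) -> Click:
        enriched = replace(click, impression=self.impressions.get(click.impression_id))
        self.clicks[enriched.id] = enriched
        return enriched

    def on_conversion(self, conv: Conversion) -> Conversion:
        click = self.clicks.get(conv.click_id)
        impression = click.impression if click else None
        return replace(conv, click=click, impression=impression)
```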
THE 3RD ITERATION: HIGH-LEVEL ARCHITECTURE
THE 3RD ITERATION: DATA-FLOW TOPOLOGY
THE FOURTH ITERATION: MULTI-DC
THE 4TH ITERATION: NEW REQUIREMENTS
Main changes:
- 5-6x larger scale:
> from 350K to 2M bid requests/s within 1.5 years
- full multi-DC architecture:
> merging streams of events
> synchronization of user profiles
- end-to-end exactly-once processing:
> at-least-once output semantics + deduplication
- several improved components:
> merger
> new stats-counter, new data-flow
> dispatcher & loader
> logstash
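Combining at-least-once output with deduplication gives exactly-once results as long as duplicates can be recognized, for example by a unique event ID. A minimal sketch, assuming every event has an `id` field; the `Deduplicator` class is hypothetical and only remembers the last `max_ids` IDs, so a duplicate arriving after its ID was evicted would slip through:

```python
from collections import OrderedDict
from typing import Iterable, Iterator


class Deduplicator:
    """Drops events whose ID was seen recently; memory is bounded by max_ids."""

    def __init__(self, max_ids: int):
        self.max_ids = max_ids
        self.seen = OrderedDict()  # event_id -> None, least recently seen first

    def is_new(self, event_id: str) -> bool:
        if event_id in self.seen:
            self.seen.move_to_end(event_id)  # keep frequently repeated IDs around
            return False
        self.seen[event_id] = None
        if len(self.seen) > self.max_ids:
            self.seen.popitem(last=False)  # evict the oldest remembered ID
        return True

    def filter(self, events: Iterable[dict]) -> Iterator[dict]:
        for event in events:
            if self.is_new(event["id"]):
                yield event
```

The window size is the key trade-off: it must cover the longest plausible redelivery delay, and anything beyond it is no longer deduplicated.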
THE 4TH ITERATION: MULTI-DC ARCHITECTURE
THE 4TH ITERATION: NEW DATA-FLOW ON KAFKA STREAMS
(picture from kafka.apache.org)
Why Kafka Streams:
- fully embedded library, no stream-processing cluster
- no external dependencies
- Kafka's parallelism model and group membership mechanism
- event-at-a-time processing (not micro-batch)
- exactly-once processing semantics (but at-least-once was good enough)
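Kafka Streams reuses Kafka's consumer-group membership to spread partitions across application instances, so scaling out means starting another instance. The sketch below shows only the underlying idea with a simplified range-style split of one topic's partitions; it is not the assignor Kafka Streams actually uses (it has its own task-based assignor), and `range_assign` is a hypothetical helper:

```python
def range_assign(num_partitions: int, members: list[str]) -> dict[str, list[int]]:
    """Split one topic's partitions into contiguous ranges, one range per group member.

    The first (num_partitions % len(members)) members, in sorted order, get one extra partition.
    """
    members = sorted(members)
    per_member, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment
```

Adding a fourth instance to a three-instance group over 8 partitions changes the split from 3/3/2 to 2/2/2/2 without any external scheduler.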
THE 4TH ITERATION: MERGER ON KAFKA CONSUMER API
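The slide names only the component, but merging per-datacenter streams (one of the new requirements listed earlier) has a classic shape: buffer events and release them in time order once every source has moved past them. A hypothetical sketch, not RTB House's merger; it assumes each source is already ordered by timestamp, and an idle source blocks progress, so a real implementation would also need timeouts:

```python
import heapq


class StreamMerger:
    """Merges per-datacenter streams (each ordered by time) into one time-ordered stream.

    An event is released only when every source has reached its timestamp,
    so a lagging datacenter cannot later deliver an older event.
    """

    def __init__(self, sources):
        self.latest = {s: None for s in sources}  # last timestamp seen per source
        self.buffer = []  # min-heap of (timestamp, source, seq, payload)
        self._seq = 0     # tie-breaker so payloads are never compared

    def add(self, source, timestamp, payload):
        self.latest[source] = timestamp
        heapq.heappush(self.buffer, (timestamp, source, self._seq, payload))
        self._seq += 1
        return self._release()

    def _release(self):
        if any(ts is None for ts in self.latest.values()):
            return []  # some source has not reported yet: nothing is safe to emit
        watermark = min(self.latest.values())
        out = []
        while self.buffer and self.buffer[0][0] <= watermark:
            ts, source, _, payload = heapq.heappop(self.buffer)
            out.append((ts, source, payload))
        return out
```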
THE CURRENT ITERATION: KAFKA WORKERS
THE 5TH ITERATION: KAFKA WORKERS
Main features:
- higher level of distribution
- pausing and resuming processing per partition
- asynchronous processing
- tighter control over offset commits
- backpressure
- at-least-once semantics
- processing timeouts
- failure handling
- multiple consumers (in progress)
- Kafka-to-Kafka, HDFS, BigQuery and Elasticsearch connectors (in progress)
(github.com/RTBHOUSE/kafka-workers)
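Asynchronous processing makes offset handling subtle: records of one partition may finish out of order, yet only offsets below the first unfinished record are safe to commit. A minimal sketch of that bookkeeping plus a high/low-threshold backpressure rule that decides when a partition should be paused or resumed; the class names and thresholds are hypothetical and this is not the kafka-workers API:

```python
class PartitionOffsets:
    """Tracks asynchronously processed offsets of one partition.

    Only the longest prefix of finished offsets is committable; committing past
    an unfinished record could lose it after a restart.
    """

    def __init__(self, first_offset: int):
        self.next_to_commit = first_offset  # Kafka commits the *next* offset to read
        self.in_flight = set()
        self.done = set()

    def started(self, offset: int) -> None:
        self.in_flight.add(offset)

    def finished(self, offset: int) -> None:
        self.in_flight.discard(offset)
        self.done.add(offset)
        # Advance over every contiguous finished offset.
        while self.next_to_commit in self.done:
            self.done.remove(self.next_to_commit)
            self.next_to_commit += 1

    def committable(self) -> int:
        return self.next_to_commit


class Backpressure:
    """Pause a partition when too many records are in flight; resume once it drains."""

    def __init__(self, high: int, low: int):
        self.high, self.low = high, low
        self.paused = False

    def update(self, in_flight: int) -> bool:
        if not self.paused and in_flight >= self.high:
            self.paused = True   # here a worker would pause this partition on its consumer
        elif self.paused and in_flight <= self.low:
            self.paused = False  # ...and resume it here
        return self.paused
```

The two thresholds (hysteresis) avoid pausing and resuming a partition on every record when the in-flight count hovers around a single limit.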
THE 5TH ITERATION: KAFKA WORKERS ARCHITECTURE
SUMMARY
What we have achieved:
- platform monitoring
- much more stable platform
- higher quality of data processing
- streaming into HDFS, BigQuery & Elasticsearch
- multi-DC architecture and data synchronization
- high scalability
- better data-flow monitoring, deployment & maintenance
THANK YOU FOR YOUR ATTENTION

More Related Content

What's hot

Go-to-market Strategy and Customer Acquisition - Mind your Business 2014
Go-to-market Strategy and Customer Acquisition - Mind your Business 2014 Go-to-market Strategy and Customer Acquisition - Mind your Business 2014
Go-to-market Strategy and Customer Acquisition - Mind your Business 2014 Marie Laenen
 
Demand quest seo training 1 16x9 10.2018
Demand quest seo training 1 16x9 10.2018Demand quest seo training 1 16x9 10.2018
Demand quest seo training 1 16x9 10.2018Nate Plaunt
 
Account-Based Marketing 101
Account-Based Marketing 101Account-Based Marketing 101
Account-Based Marketing 101Kwanzoo Inc
 
Buying Process Playbook
Buying Process PlaybookBuying Process Playbook
Buying Process PlaybookDemand Metric
 
Bridging The Gap Between Sales And Marketing
Bridging The Gap Between Sales And MarketingBridging The Gap Between Sales And Marketing
Bridging The Gap Between Sales And Marketingguest3d2e50c
 
ABM Master Class: Targeting
ABM Master Class: TargetingABM Master Class: Targeting
ABM Master Class: TargetingDemandbase
 
Go-To-Market Strategy & Sales Enablement Framework
Go-To-Market Strategy & Sales Enablement FrameworkGo-To-Market Strategy & Sales Enablement Framework
Go-To-Market Strategy & Sales Enablement FrameworkLink Cheng
 
Optimising user acquisition through LTV prediction
Optimising user acquisition through LTV predictionOptimising user acquisition through LTV prediction
Optimising user acquisition through LTV predictionGameCamp
 
BettorUP Sports Investor Deck up
BettorUP Sports Investor Deck upBettorUP Sports Investor Deck up
BettorUP Sports Investor Deck upBettorUp_Sports
 
Sales & Marketing Development Plan - a template for the CRO
Sales & Marketing Development Plan - a template for the CROSales & Marketing Development Plan - a template for the CRO
Sales & Marketing Development Plan - a template for the CROFan Foundry
 
Recency/Frequency and Predictive Analytics in the gaming industry
Recency/Frequency and Predictive Analytics in the gaming industryRecency/Frequency and Predictive Analytics in the gaming industry
Recency/Frequency and Predictive Analytics in the gaming industryQualex Asia
 
Guide: Conjoint Analysis
Guide: Conjoint AnalysisGuide: Conjoint Analysis
Guide: Conjoint AnalysisQuestionPro
 
The Framework of Account-Based Marketing
The Framework of Account-Based MarketingThe Framework of Account-Based Marketing
The Framework of Account-Based MarketingInsideSalesTeam.Com
 
Indie Game Market Research Presentation
Indie Game Market Research PresentationIndie Game Market Research Presentation
Indie Game Market Research PresentationLogan Williams
 
Winning Market Expansion Strategies for CPG brands, Using Spatial Data and An...
Winning Market Expansion Strategies for CPG brands, Using Spatial Data and An...Winning Market Expansion Strategies for CPG brands, Using Spatial Data and An...
Winning Market Expansion Strategies for CPG brands, Using Spatial Data and An...CARTO
 
Marketo Account-Based Marketing
Marketo Account-Based MarketingMarketo Account-Based Marketing
Marketo Account-Based MarketingMarketo
 
Digital Marketing Service Provider
Digital Marketing Service ProviderDigital Marketing Service Provider
Digital Marketing Service ProviderFomaxtechnology
 

What's hot (20)

Go-to-market Strategy and Customer Acquisition - Mind your Business 2014
Go-to-market Strategy and Customer Acquisition - Mind your Business 2014 Go-to-market Strategy and Customer Acquisition - Mind your Business 2014
Go-to-market Strategy and Customer Acquisition - Mind your Business 2014
 
Demand quest seo training 1 16x9 10.2018
Demand quest seo training 1 16x9 10.2018Demand quest seo training 1 16x9 10.2018
Demand quest seo training 1 16x9 10.2018
 
Account-Based Marketing 101
Account-Based Marketing 101Account-Based Marketing 101
Account-Based Marketing 101
 
Buying Process Playbook
Buying Process PlaybookBuying Process Playbook
Buying Process Playbook
 
Bridging The Gap Between Sales And Marketing
Bridging The Gap Between Sales And MarketingBridging The Gap Between Sales And Marketing
Bridging The Gap Between Sales And Marketing
 
ABM Master Class: Targeting
ABM Master Class: TargetingABM Master Class: Targeting
ABM Master Class: Targeting
 
Go-To-Market Strategy & Sales Enablement Framework
Go-To-Market Strategy & Sales Enablement FrameworkGo-To-Market Strategy & Sales Enablement Framework
Go-To-Market Strategy & Sales Enablement Framework
 
Optimising user acquisition through LTV prediction
Optimising user acquisition through LTV predictionOptimising user acquisition through LTV prediction
Optimising user acquisition through LTV prediction
 
CRM Framework
CRM FrameworkCRM Framework
CRM Framework
 
BettorUP Sports Investor Deck up
BettorUP Sports Investor Deck upBettorUP Sports Investor Deck up
BettorUP Sports Investor Deck up
 
Sales & Marketing Development Plan - a template for the CRO
Sales & Marketing Development Plan - a template for the CROSales & Marketing Development Plan - a template for the CRO
Sales & Marketing Development Plan - a template for the CRO
 
Recency/Frequency and Predictive Analytics in the gaming industry
Recency/Frequency and Predictive Analytics in the gaming industryRecency/Frequency and Predictive Analytics in the gaming industry
Recency/Frequency and Predictive Analytics in the gaming industry
 
Guide: Conjoint Analysis
Guide: Conjoint AnalysisGuide: Conjoint Analysis
Guide: Conjoint Analysis
 
The Framework of Account-Based Marketing
The Framework of Account-Based MarketingThe Framework of Account-Based Marketing
The Framework of Account-Based Marketing
 
PPC 101
PPC 101PPC 101
PPC 101
 
Indie Game Market Research Presentation
Indie Game Market Research PresentationIndie Game Market Research Presentation
Indie Game Market Research Presentation
 
Winning Market Expansion Strategies for CPG brands, Using Spatial Data and An...
Winning Market Expansion Strategies for CPG brands, Using Spatial Data and An...Winning Market Expansion Strategies for CPG brands, Using Spatial Data and An...
Winning Market Expansion Strategies for CPG brands, Using Spatial Data and An...
 
Marketo Account-Based Marketing
Marketo Account-Based MarketingMarketo Account-Based Marketing
Marketo Account-Based Marketing
 
Go-To Market Plan
Go-To Market PlanGo-To Market Plan
Go-To Market Plan
 
Digital Marketing Service Provider
Digital Marketing Service ProviderDigital Marketing Service Provider
Digital Marketing Service Provider
 

Similar to Real-Time Data Processing Evolution at RTB House

Real Time Data Processing at RTB House - Bartosz Łoś
Real Time Data Processing at RTB House - Bartosz ŁośReal Time Data Processing at RTB House - Bartosz Łoś
Real Time Data Processing at RTB House - Bartosz ŁośEvention
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedEdureka!
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven MicroservicesFabrizio Fortino
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsVoltDB
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...DataStax Academy
 
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingReal-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingAbdelhamide EL ARIB
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Apache Kafka - A modern Stream Processing Platform
Apache Kafka - A modern Stream Processing PlatformApache Kafka - A modern Stream Processing Platform
Apache Kafka - A modern Stream Processing PlatformGuido Schmutz
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...DataStax Academy
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaGuido Schmutz
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kai Wähner
 
Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Micron Technology
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 
Microservices with Spring 5 Webflux - jProfessionals
Microservices  with Spring 5 Webflux - jProfessionalsMicroservices  with Spring 5 Webflux - jProfessionals
Microservices with Spring 5 Webflux - jProfessionalsTrayan Iliev
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 

Similar to Real-Time Data Processing Evolution at RTB House (20)

Real Time Data Processing at RTB House - Bartosz Łoś
Real Time Data Processing at RTB House - Bartosz ŁośReal Time Data Processing at RTB House - Bartosz Łoś
Real Time Data Processing at RTB House - Bartosz Łoś
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven Microservices
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingReal-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Apache Kafka - A modern Stream Processing Platform
Apache Kafka - A modern Stream Processing PlatformApache Kafka - A modern Stream Processing Platform
Apache Kafka - A modern Stream Processing Platform
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)
 
Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Microservices with Spring 5 Webflux - jProfessionals
Microservices  with Spring 5 Webflux - jProfessionalsMicroservices  with Spring 5 Webflux - jProfessionals
Microservices with Spring 5 Webflux - jProfessionals
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 

Recently uploaded

Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...
Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...
Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...sonatiwari757
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Call Girls in Nagpur High Profile
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGAPNIC
 
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607dollysharma2066
 
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya Shirtrahman018755
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...tanu pandey
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsThierry TROUIN ☁
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024APNIC
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445ruhi
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceDelhi Call girls
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...Neha Pandey
 

Recently uploaded (20)

Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...
Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...
Call Girls in Mayur Vihar ✔️ 9711199171 ✔️ Delhi ✔️ Enjoy Call Girls With Our...
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOG
 
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
 
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...

Real-Time Data Processing Evolution at RTB House

  • 1. REAL-TIME DATA PROCESSING AT RTB HOUSE: ARCHITECTURE & LESSONS LEARNED
    BARTOSZ ŁOŚ
    BIG DATA TECHNOLOGY MOSCOW 2018, OCTOBER 10-11, 2018
  • 2. TABLE OF CONTENTS
    Agenda:
    - our rtb platform
    - the first iteration: mutable structures
    - the second iteration: data-flow
    - the third iteration: immutable streams of events
    - the fourth iteration: multi-dc architecture
    - the current iteration: kafka workers
    - summary
  • 4. OUR RTB PLATFORM: THE CONTEXT
    Bid requests:
    - 2M/s (peak)
    - ~30 SSP networks
    - <50-100ms
    User events:
    - 1.5B tags/day
    - 350M impressions/day
    - 3.5M clicks/day
    - 1.5M conversions/day
    Other events:
    - bidlogs, accesslogs, domain events, etc.
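The daily totals on this slide can be turned into average per-second rates with a quick back-of-the-envelope calculation. This is a sketch that assumes events are spread evenly over 24 hours; real traffic has daily peaks, so these averages understate peak load. The clicks-to-impressions ratio is simply the quotient of the two slide figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily totals from the slide.
daily_events = {
    "tags": 1_500_000_000,
    "impressions": 350_000_000,
    "clicks": 3_500_000,
    "conversions": 1_500_000,
}


def average_rate_per_second(events_per_day: int) -> float:
    """Average rate, assuming a perfectly uniform spread over the day."""
    return events_per_day / SECONDS_PER_DAY


for name, count in daily_events.items():
    print(f"{name:<12} ~{average_rate_per_second(count):,.0f}/s")

# Ratio of clicks to impressions implied by the daily totals.
click_through_rate = daily_events["clicks"] / daily_events["impressions"]
print(f"clicks/impressions ~{click_through_rate:.1%}")
```

It prints averages of roughly 17K tags/s, 4K impressions/s, 41 clicks/s and 17 conversions/s, with clicks at about 1.0% of impressions. The 2M/s bid-request figure is a peak, so it is not directly comparable with these averages.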
  • 5. OUR RTB PLATFORM: DATA PROCESSING NUMBERS
    Kafka:
    - up to 250K+ messages per second
    - 50TB+ of data processed every day
    - 6 clusters in 4 datacenters
    - 26 Kafka brokers
    - 85 topics, 5000+ partitions
    Docker (processing components only):
    - 44 engines
    - 1408 CPU cores, 5.5TB RAM
    - 800+ containers
    HDFS:
    - 2PB+ data, up to 10GB/s
    BigQuery:
    - 1PB+ data, up to 10GB/min
    Elasticsearch:
    - 40TB data, up to 50K events/s
    Aerospike (processing only):
    - 80TB data, up to 8K events/s
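A similar sanity check on the infrastructure numbers, assuming decimal units (1 TB = 10^12 bytes) and an even split of load across engines and brokers, neither of which the slide states. Values marked "+" on the slide are lower bounds, so the derived figures are lower bounds too.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal units assumed
GB = 10**9
MB = 10**6

# Figures from the slide ("+" values used as-is, i.e. as lower bounds).
kafka_bytes_per_day = 50 * TB
engines, cpu_cores, ram_bytes = 44, 1408, 5.5 * TB
brokers, partitions = 26, 5000

avg_kafka_mb_per_s = kafka_bytes_per_day / SECONDS_PER_DAY / MB
cores_per_engine = cpu_cores / engines
ram_gb_per_engine = ram_bytes / engines / GB
# Averaged over all 6 clusters; replicas are not counted.
partitions_per_broker = partitions / brokers

print(f"Kafka, daily average:  ~{avg_kafka_mb_per_s:.0f} MB/s")
print(f"cores per engine:      {cores_per_engine:.0f}")
print(f"RAM per engine:        {ram_gb_per_engine:.0f} GB")
print(f"partitions per broker: ~{partitions_per_broker:.0f}")
```

This gives an average Kafka throughput of about 579 MB/s, 32 cores and 125 GB of RAM per Docker engine, and roughly 192 partitions per broker before replication.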
  • 7. THE 1ST ITERATION: MUTABLE IMPRESSIONS
  • 8. THE 1ST ITERATION: DRAWBACKS
    Issues:
    - long data migrations (30 days back) that overloaded the system
    - complex servlet logic, inability to reprocess
    - inflexible, varied schemas
    - single-DC
    - data inconsistencies
  • 10. THE 2ND ITERATION: THE 1ST DATA-FLOW ARCHITECTURE
  • 11. THE 2ND ITERATION: DISTRIBUTED LOG
    Why Apache Kafka:
    - distributed log
    - topic partitioning
    - partition replication
    - log retention
    - stateless
    - efficient data consumption
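The log properties listed on this slide (key-based partitioning, offsets, retention, consumers keeping their own read position) can be illustrated with a toy, single-process model. This is not Kafka's implementation: the count-based retention limit is a hypothetical simplification (Kafka retains whole log segments by time or size), crc32 stands in for the murmur2 hash used by Kafka's default partitioner, and replication is not modelled at all.

```python
import zlib


class PartitionedLog:
    """Toy, single-process model of a Kafka-style topic.

    Only partitioning, offsets and retention are modelled; replication and
    networking are not.
    """

    def __init__(self, num_partitions: int, retention: int):
        self.num_partitions = num_partitions
        self.retention = retention  # hypothetical count-based limit per partition
        self.partitions = [[] for _ in range(num_partitions)]
        self.start_offsets = [0] * num_partitions  # offset of the oldest retained record

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        # crc32 is a stand-in for Kafka's murmur2 key hashing.
        return zlib.crc32(key.encode()) % self.num_partitions

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Append-only write; returns (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))
        offset = self.start_offsets[p] + len(log) - 1
        # Retention: drop the oldest records once the partition exceeds the limit.
        overflow = len(log) - self.retention
        if overflow > 0:
            del log[:overflow]
            self.start_offsets[p] += overflow
        return p, offset

    def read(self, partition: int, from_offset: int) -> list:
        """The log keeps no per-reader state: each consumer passes its own offset."""
        base = self.start_offsets[partition]
        log = self.partitions[partition]
        start = max(from_offset, base)  # records below `base` were removed by retention
        return [(o, log[o - base]) for o in range(start, base + len(log))]


log = PartitionedLog(num_partitions=4, retention=3)
for i in range(5):
    log.append("user-42", f"event-{i}")
print(log.read(log.partition_for("user-42"), 0))
# [(2, ('user-42', 'event-2')), (3, ('user-42', 'event-3')), (4, ('user-42', 'event-4'))]
```

Every record for "user-42" lands in the same partition, so its events stay in order; with a retention of three records, offsets 0 and 1 are gone and a reader asking for offset 0 starts at offset 2.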
  • 12. THE 2ND ITERATION: BATCH LOADING
    Why LinkedIn Camus:
    - "Kafka to HDFS" pipeline
    - batch tool
    - MapReduce jobs
    - storing offsets in log files
    - data partitioning
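The batch-loading pattern behind Camus (each run reads every partition from the offsets saved by the previous run, writes time-partitioned output, then saves the new offsets) can be sketched as follows. This is not Camus code: the in-memory `partitions` and `output` dicts stand in for Kafka and HDFS, list indexes serve as offsets, and the hourly `topic/YYYY/MM/DD/HH` layout and `offsets.json` file name are hypothetical choices.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def hour_path(topic: str, ts: int) -> str:
    """Time-based output directory, e.g. bids/2018/10/10/14 (UTC hours)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"{topic}/{dt:%Y/%m/%d/%H}"


def run_batch(topic: str, partitions: dict, offsets_file: Path, output: dict) -> dict:
    """One batch run: copy each partition from its last saved offset to its current end.

    `partitions` maps partition id -> list of (unix_ts, payload) records and
    `output` stands in for HDFS: it maps a directory path to a list of payloads.
    """
    saved = json.loads(offsets_file.read_text()) if offsets_file.exists() else {}
    new_offsets = {}
    for p, records in partitions.items():
        start = saved.get(str(p), 0)
        for ts, payload in records[start:]:
            output.setdefault(hour_path(topic, ts), []).append(payload)
        new_offsets[str(p)] = len(records)  # next offset to read on the following run
    # Offsets are persisted only after the data is written: a crash in between
    # makes the next run re-read the same records (duplicates), never skip them.
    offsets_file.write_text(json.dumps(new_offsets))
    return new_offsets


if __name__ == "__main__":
    offsets = Path(tempfile.mkdtemp()) / "offsets.json"
    parts = {0: [(1539180000, "a"), (1539180060, "b")], 1: [(1539183600, "c")]}
    hdfs = {}
    run_batch("bids", parts, offsets, hdfs)
    parts[0].append((1539183700, "d"))  # new data arrives between runs
    run_batch("bids", parts, offsets, hdfs)
    print(hdfs)
```

Ordering the writes this way is what makes such a loader safe to rerun: a job that dies after writing data but before saving offsets produces duplicates on the next run rather than gaps.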
  • 13. THE 2ND ITERATION: AVRO & SCHEMA VERSIONING
    Why Apache Avro:
    - compact, efficient format
    - schema: JSON format, payload: binary format
    - self-describing container files
    - rich data structures
    - support for schema changes (reader & writer schemas)
    Our approach:
    - Avro for Kafka messages and HDFS files
    - schema registry
    - avro-fastserde (github.com/RTBHOUSE/avro-fastserde)
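The reader/writer schema idea can be illustrated with a simplified version of Avro's record resolution rules: fields are matched by name, a field missing from the writer's schema takes the reader's default, and writer-only fields are ignored. In this sketch the `REGISTRY` dict is a hypothetical in-memory schema registry, and each message is framed as a 4-byte schema id followed by a JSON payload standing in for Avro's binary encoding. Aliases, type promotion, unions and nested records are left out, and nothing here reflects how avro-fastserde or RTB House's registry actually work.

```python
import json
import struct

# Hypothetical in-memory schema registry: schema id -> writer schema (Avro JSON).
REGISTRY = {
    1: {"type": "record", "name": "Click", "fields": [
        {"name": "time", "type": "long"},
        {"name": "impression_id", "type": "string"},
    ]},
    2: {"type": "record", "name": "Click", "fields": [
        {"name": "time", "type": "long"},
        {"name": "impression_id", "type": "string"},
        {"name": "user_agent", "type": "string", "default": ""},  # added later
    ]},
}


def encode(schema_id: int, datum: dict) -> bytes:
    # Hypothetical framing: 4-byte big-endian schema id, then the payload.
    # JSON stands in for Avro's binary encoding.
    return struct.pack(">I", schema_id) + json.dumps(datum).encode()


def resolve(writer: dict, reader: dict, datum: dict) -> dict:
    """Simplified Avro record resolution: match by name, fill defaults, drop extras."""
    writer_fields = {f["name"] for f in writer["fields"]}
    result = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for reader field {name!r}")
    return result  # writer-only fields are simply not copied


def decode(message: bytes, reader: dict) -> dict:
    (schema_id,) = struct.unpack(">I", message[:4])
    writer = REGISTRY[schema_id]  # the writer schema travels by id, not inline
    return resolve(writer, reader, json.loads(message[4:]))


old_msg = encode(1, {"time": 1, "impression_id": "imp-1"})
print(decode(old_msg, reader=REGISTRY[2]))  # {'time': 1, 'impression_id': 'imp-1', 'user_agent': ''}
new_msg = encode(2, {"time": 2, "impression_id": "imp-2", "user_agent": "Mozilla"})
print(decode(new_msg, reader=REGISTRY[1]))  # {'time': 2, 'impression_id': 'imp-2'}
```

An old consumer can read new messages and a new consumer can read old ones, as long as every added field carries a default.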
  • 14. THE 2ND ITERATION: ACCURATE STATISTICS
    Why Apache Storm:
    - real-time processing
    - streams of tuples, topologies
    - fault tolerance
    Why Trident:
    - transactions, exactly-once processing
    - microbatches (latency & throughput)
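Trident's exactly-once guarantee for state rests on a simple idea that can be sketched in a few lines: each microbatch has a transaction id (txid), a replayed batch keeps the same txid and the same tuples, and state updates are applied in txid order, so storing the last applied txid next to each value lets a replay be detected and skipped. The class below is a hypothetical in-memory illustration of that idea, not Trident's `State` API, and the campaign/event keys are made up.

```python
from collections import Counter


class TransactionalCountState:
    """Sketch of the idea behind Trident's transactional state.

    Every microbatch has a txid; a replayed batch keeps its txid and contents,
    and updates arrive in txid order. Storing the last txid next to each value
    makes replays idempotent: if the stored txid equals the incoming one, the
    batch was already applied to that key.
    """

    def __init__(self):
        self.store = {}  # key -> (count, txid of the last batch applied)

    def apply_batch(self, txid: int, batch: list) -> None:
        for key, delta in Counter(batch).items():  # pre-aggregate the microbatch
            count, last_txid = self.store.get(key, (0, None))
            if last_txid == txid:
                continue  # replay after a failure: already counted
            self.store[key] = (count + delta, txid)

    def get(self, key) -> int:
        return self.store.get(key, (0, None))[0]


state = TransactionalCountState()
batch_1 = [("campaign-1", "click"), ("campaign-1", "click"), ("campaign-2", "impression")]
state.apply_batch(1, batch_1)
state.apply_batch(1, batch_1)  # the same batch replayed
state.apply_batch(2, [("campaign-1", "click")])
print(state.get(("campaign-1", "click")))  # 3
```

The batch is also pre-aggregated with `Counter` before state is touched, so there is one state update per key per batch instead of one per tuple, which is one reason microbatching helps throughput at some cost in latency.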
  • 15. THE 2ND ITERATION: STATS-COUNTER TOPOLOGY
  • 16. THE 2ND ITERATION: DRAWBACKS
    Hybrid architecture:
    - aggregates (real-time)
    - raw events (2-hour batches)
    - joined events (end-of-day batch jobs)
    Other issues:
    - Hive joins
    - mutable events
    - complex servlet logic
  • 18. THE 3RD ITERATION: NEW APPROACH
    { "IMPRESSION": "URL", "TIME", "CREATIVE", ... "CLICKS", "CONVERSIONS" }
    { "CLICK": "TIME", "IMPRESSION_ID", ... "IMPRESSION" }
    { "CONVERSION": "TIME", "CLICK_ID", ... "IMPRESSION", "CLICK" }
    New approach:
    - real-time processing
    - publishing light events
    - immutable streams of events
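Publishing light events and deriving immutable, enriched ones can be sketched as a stream join: a light click carries only its impression id, and the data-flow emits a new click event that embeds the full impression; a conversion in turn embeds the click and its impression. The classes, field names and the in-memory `Joiner` below are hypothetical. A production join would need a persistent state store and a time window for late or missing events; here a missing parent is simply left as `None`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Impression:
    id: str
    time: int
    url: str
    creative: str


@dataclass(frozen=True)
class Click:
    id: str
    time: int
    impression_id: str
    impression: Optional[Impression] = None  # filled in by the join


@dataclass(frozen=True)
class Conversion:
    id: str
    time: int
    click_id: str
    impression: Optional[Impression] = None  # filled in by the join
    click: Optional[Click] = None            # filled in by the join


class Joiner:
    """Hypothetical in-memory join of light events into enriched ones."""

    def __init__(self):
        self.impressions: dict[str, Impression] = {}
        self.clicks: dict[str, Click] = {}

    def on_impression(self, imp: Impression) -> Impression:
        self.impressions[imp.id] = imp
        return imp

    def on_click(self, click: Click) -> Click:
        # replace() returns a new object; the light input event is not modified.
        enriched = replace(click, impression=self.impressions.get(click.impression_id))
        self.clicks[enriched.id] = enriched
        return enriched

    def on_conversion(self, conv: Conversion) -> Conversion:
        click = self.clicks.get(conv.click_id)
        impression = click.impression if click is not None else None
        return replace(conv, click=click, impression=impression)


j = Joiner()
j.on_impression(Impression("imp-1", 100, "https://example.com", "creative-7"))
click = j.on_click(Click("clk-1", 105, "imp-1"))
conversion = j.on_conversion(Conversion("conv-1", 200, "clk-1"))
print(conversion.impression.creative, conversion.click.id)  # creative-7 clk-1
```

Because the dataclasses are frozen and `replace` returns a copy, enrichment always produces a new event instead of mutating an existing one, which is what keeps the streams immutable and replayable.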
  • 19. THE 3RD ITERATION: HIGH-LEVEL ARCHITECTURE
  • 20. THE 3RD ITERATION: DATA-FLOW TOPOLOGY
  • 22. THE 4TH ITERATION: NEW REQUIREMENTS
    Main changes:
    - 5-6x larger scale:
      > from 350K to 2M bid requests/s within 1.5 years
    - full multi-DC architecture:
      > merging streams of events
      > synchronization of user profiles
    - end-to-end exactly-once processing:
      > at-least-once output semantics + deduplication
    - several improved components:
      > merger
      > new stats-counter, new data-flow
      > dispatcher & loader
      > Logstash
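"At-least-once output semantics + deduplication" means a downstream consumer may receive the same event more than once and drops repeats by event id. A minimal sketch, assuming every event has a unique `id` field and that remembering a bounded set of recently seen ids (here an LRU window of hypothetical size `max_ids`) is enough. Duplicates that arrive after their id has been evicted will get through, so in practice the window has to cover the expected redelivery delay.

```python
from collections import OrderedDict


class Deduplicator:
    """Drops events whose id was seen recently (bounded LRU window)."""

    def __init__(self, max_ids: int):
        self.max_ids = max_ids
        self.seen = OrderedDict()  # id -> None, least recently seen first

    def is_duplicate(self, event_id: str) -> bool:
        if event_id in self.seen:
            self.seen.move_to_end(event_id)  # refresh its position in the window
            return True
        self.seen[event_id] = None
        if len(self.seen) > self.max_ids:
            self.seen.popitem(last=False)  # evict the least recently seen id
        return False


def deliver(events, dedup: Deduplicator) -> list:
    """Pass through at-least-once input, keeping only the first copy of each id."""
    return [e for e in events if not dedup.is_duplicate(e["id"])]


redelivered = [{"id": "a"}, {"id": "b"}, {"id": "a"}, {"id": "c"}, {"id": "b"}]
print([e["id"] for e in deliver(redelivered, Deduplicator(max_ids=1000))])  # ['a', 'b', 'c']
```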
  • 23. THE 4TH ITERATION: MULTI-DC ARCHITECTURE
  • 24. THE 4TH ITERATION: NEW DATA-FLOW ON KAFKA STREAMS (picture from kafka.apache.org)
    Why Kafka Streams:
    - fully embedded library with no stream processing cluster
    - no external dependencies
    - Kafka's parallelism model and group membership mechanism
    - event-at-a-time processing (not microbatch)
    - exactly-once processing semantics (but at-least-once was good enough)
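Kafka's parallelism model, which Kafka Streams builds on, ties the number of useful workers to the number of partitions: each input partition is processed by exactly one member of the group, and when members join or leave, partitions are reassigned. The round-robin function below is a simplified stand-in for that assignment; Kafka Streams' actual assignor is more elaborate (among other things, it tries to keep assignments stable across rebalances).

```python
def assign_partitions(num_partitions: int, instances: list[str]) -> dict[str, list[int]]:
    """Round-robin stand-in for the group's partition assignment.

    Each input partition becomes one task owned by exactly one instance, so
    the number of partitions caps the useful parallelism.
    """
    members = sorted(instances)
    assignment = {m: [] for m in members}
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment


print(assign_partitions(6, ["a", "b", "c"]))  # {'a': [0, 3], 'b': [1, 4], 'c': [2, 5]}
# Instance "c" leaves the group: its partitions are redistributed on rebalance.
print(assign_partitions(6, ["a", "b"]))       # {'a': [0, 2, 4], 'b': [1, 3, 5]}
# More instances than partitions: the extra instance stays idle.
print(assign_partitions(2, ["a", "b", "c"]))  # {'a': [0], 'b': [1], 'c': []}
```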
  • 25. THE 4TH ITERATION: MERGER ON KAFKA CONSUMER API
  • 27. THE 5TH ITERATION: KAFKA WORKERS (github.com/RTBHOUSE/kafka-workers)
    Main features:
    - higher level of distribution
    - possibility to pause and resume processing for a given partition
    - asynchronous processing
    - tighter control of offset commits
    - backpressure
    - at-least-once semantics
    - processing timeouts
    - handling failures
    - multiple consumers (in progress)
    - Kafka-to-Kafka, HDFS, BigQuery and Elasticsearch connectors (in progress)
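Two of the listed features, asynchronous processing with tighter control of offset commits and backpressure via pausing a partition, fit together naturally. A minimal sketch, assuming one hypothetical tracker per partition: records may finish out of order, so the safe offset to commit is the smallest one still in flight (or the last consumed offset + 1 when nothing is pending), which yields at-least-once semantics. The partition is marked paused once pending records reach a hypothetical high watermark and resumed at a low watermark; in a real Java consumer those points would call `KafkaConsumer.pause`/`resume`. This is not the kafka-workers implementation.

```python
from typing import Optional


class PartitionTracker:
    """Offsets and backpressure state for one partition (hypothetical helper)."""

    def __init__(self, high_watermark: int, low_watermark: int):
        self.pending = set()  # consumed but not yet processed offsets
        self.last_consumed: Optional[int] = None
        self.high = high_watermark
        self.low = low_watermark
        self.paused = False

    def on_consumed(self, offset: int) -> None:
        self.pending.add(offset)
        self.last_consumed = offset
        if len(self.pending) >= self.high:
            self.paused = True  # real consumer (Java): consumer.pause(partitions)

    def on_processed(self, offset: int) -> None:
        # Called when an async worker finishes, possibly out of order.
        self.pending.discard(offset)
        if self.paused and len(self.pending) <= self.low:
            self.paused = False  # real consumer (Java): consumer.resume(partitions)

    def committable(self) -> Optional[int]:
        """Offset to commit, i.e. the next record to read after a restart."""
        if self.pending:
            return min(self.pending)  # an earlier record is still in flight
        if self.last_consumed is None:
            return None
        return self.last_consumed + 1


t = PartitionTracker(high_watermark=3, low_watermark=1)
for offset in (10, 11, 12):
    t.on_consumed(offset)
t.on_processed(12)                # finishes first, out of order
print(t.paused, t.committable())  # True 10
t.on_processed(10)
print(t.paused, t.committable())  # False 11
t.on_processed(11)
print(t.committable())            # 13
```

Committing the minimum pending offset means a crash replays some already-processed records, never skipping an unfinished one; at high throughput a heap would avoid the linear `min` scan.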
  • 28. THE 5TH ITERATION: KAFKA WORKERS ARCHITECTURE
  • 29. SUMMARY
    What we have achieved:
    - platform monitoring
    - a much more stable platform
    - higher quality of data processing
    - HDFS & BigQuery & Elasticsearch streaming
    - multi-DC architecture and data synchronization
    - high scalability
    - better data-flow monitoring, deployment & maintenance
  • 30. REAL-TIME DATA PROCESSING AT RTB HOUSE
    THANK YOU FOR YOUR ATTENTION