5. Scale of Event Log Aggregation
How many and how big?
● ~10PB of incoming data a day, uncompressed
● ~3-4.1 trillion events a day, across millions of clients
6. Events and Event Logs @Twitter
● Clients log events, each tagged with a category name.
○ E.g. ads_click, like_tweet_event …
○ Multiple formats: Thrift, JSON, etc. (see the sketch below)
● Events are grouped across all clients by category.
○ Clients can send events via REST endpoints, e.g. from the web.
○ Clients can generate events via logging libraries, e.g. a Thrift client library.
What is an event?
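A minimal sketch of a category-tagged event as described above; the field names and types are assumptions for illustration, since the deck only says that an event carries a category name and a serialized body in one of several formats:

```scala
// Hypothetical event shape: the deck specifies a category name plus a
// serialized body (Thrift, JSON, ...); field names here are illustrative.
final case class LogEvent(
  category: String,    // e.g. "ads_click", "like_tweet_event"
  timestampMs: Long,   // client-side event time
  payload: Array[Byte] // serialized body in whatever format the client uses
)
```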
7. Events and Event Logs @Twitter
● Events are stored on HDFS, bucketed every hour into separate directories (see the path sketch below).
○ E.g. /logs/ads_click/2020/09/01/23
● Events are delivered in various formats.
○ Parquet format.
○ Row-based thrift-lzo format.
● Event logs are replicated to other clusters.
○ Production clusters, ad hoc clusters, test clusters.
● Multiple consumers.
○ Presto, Spark, Scalding, etc.
○ Streaming systems.
How do we deliver events, and how are they consumed?
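A small sketch of the hourly bucketing, using only the /logs/&lt;category&gt;/YYYY/MM/DD/HH layout shown in the example path above (the helper name is an invention):

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Derive the hourly HDFS bucket for a category, following the
// /logs/<category>/YYYY/MM/DD/HH layout from the example above.
object HourlyBucket {
  private val fmt =
    DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC)

  def path(category: String, arrival: Instant): String =
    s"/logs/$category/${fmt.format(arrival)}"
}

// HourlyBucket.path("ads_click", Instant.parse("2020-09-01T23:15:00Z"))
//   == "/logs/ads_click/2020/09/01/23"
```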
9. Architecture
● Modularized and independently scalable
○ Client daemon
○ Aggregator daemon
● Users choose destinations based on need (see the routing sketch below)
○ Message-queue-based systems for stream processing
○ HDFS for batch processing
Overview
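A sketch of the destination choice described above; the routing-table shape and entries are assumptions, since the deck only says users pick message queues for streaming, HDFS for batch, or both:

```scala
// Hypothetical per-category routing: the deck says users choose destinations
// per need; this table shape is an illustrative assumption.
sealed trait Destination
case object MessageQueue extends Destination // stream processing
case object Hdfs extends Destination         // batch processing

val routing: Map[String, Set[Destination]] = Map(
  "ads_click"        -> Set(MessageQueue, Hdfs),
  "like_tweet_event" -> Set(Hdfs)
)
```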
10. Architecture
● The client daemon is a long-running process, one per host.
● It provides a simple interface to services on the same host and hides backend details.
● It leverages service discovery to forward events to aggregator daemons.
● Clients log events to the local client daemon, which can buffer to disk (see the sketch below).
Client daemon
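A minimal sketch of the client-side contract described above: services hand events to a daemon on the same host, and events can spill to disk when the daemon is unreachable. The port, wire framing, and spill path are all assumptions for illustration:

```scala
import java.net.Socket
import java.nio.file.{Files, Paths, StandardOpenOption}

// Sketch: services talk only to the local daemon; the daemon, not the
// service, knows about aggregators. Port, framing, and spill path are
// hypothetical.
object LocalClientDaemon {
  private val DaemonPort = 9999                         // assumed local port
  private val SpillFile  = Paths.get("/tmp/events.buf") // assumed disk buffer

  def log(category: String, body: Array[Byte]): Unit = {
    val line = (category + "\t").getBytes("UTF-8") ++ body ++ "\n".getBytes("UTF-8")
    try {
      val s = new Socket("127.0.0.1", DaemonPort)
      try s.getOutputStream.write(line) finally s.close()
    } catch {
      case _: java.io.IOException =>
        // daemon unreachable: buffer to disk so events survive restarts
        Files.write(SpillFile, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND)
    }
  }
}
```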
12. Aggregators
● Aggregate events from a massive number of clients into per-category files (or per-category-group files, in the new framework) on HDFS.
● Write data in a YYYY/MM/DD/HH directory structure based on event arrival time.
● Deliver data to message-queue-based systems, depending on configuration.
● Send a back-off signal on high load or downstream failure (see the retry sketch below).
Overview
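A sketch of the back-pressure loop that the back-off signal implies: when an aggregator reports high load or downstream failure, the sender retries with exponential back-off. The retry bounds and function shape are assumptions:

```scala
// Hypothetical sender-side reaction to the aggregator's back-off signal.
def sendWithBackoff(send: () => Boolean, maxRetries: Int = 5): Boolean = {
  var attempt = 0
  var delayMs = 100L
  while (attempt < maxRetries) {
    if (send()) return true // batch accepted by the aggregator
    Thread.sleep(delayMs)   // back off before retrying
    delayMs = math.min(delayMs * 2, 5000L) // exponential, capped
    attempt += 1
  }
  false // give up; the caller may buffer to disk instead
}
```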
13. Aggregator
● ZooKeeper-based service discovery
○ DC failover support
○ Aggregators register as ephemeral nodes (see the sketch below).
● Tier-based approach
○ Categories are divided into tiers based on priority
○ Controls blast radius
○ Scales independently
○ Different semantics and parameters per tier
● Category-level parameter tuning
○ Batch size
○ Retry parameters such as timeouts
Service discovery
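A sketch of ephemeral-node registration using the plain ZooKeeper client API; the znode path is an assumption, and the deck does not say which client library is actually used. The point of EPHEMERAL is that the node vanishes when the aggregator's session dies, so dead aggregators drop out of discovery automatically:

```scala
import org.apache.zookeeper.{CreateMode, ZooDefs, ZooKeeper}

// Register an aggregator for discovery. The /aggregators/<tier> path is a
// hypothetical layout; EPHEMERAL ties the node's lifetime to the session.
def register(zk: ZooKeeper, tier: String, host: String, port: Int): String =
  zk.create(
    s"/aggregators/$tier/$host:$port",
    Array.emptyByteArray,
    ZooDefs.Ids.OPEN_ACL_UNSAFE,
    CreateMode.EPHEMERAL
  )
```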
15. Category group
● Scalability of HDFS
○ Small files caused by traffic distribution
○ Small files generated by low-traffic log categories
○ The file count is too large for the NameNode to handle (see the back-of-envelope sketch below)
● What is a category group?
○ Multiple categories with similar properties, grouped together by Flume and written to the same HDFS files.
Why and what?
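A back-of-envelope sketch of why per-category files strain the NameNode; all the numbers below are illustrative, not from the deck:

```scala
// Illustrative only: file count grows with categories x writers x hours.
val categories         = 1000    // distinct log categories (assumed)
val writersPerCategory = 10      // aggregators flushing in parallel (assumed)
val hourlyBuckets      = 24 * 30 // a month of hourly directories
val filesWithoutGrouping =
  categories.toLong * writersPerCategory * hourlyBuckets // 7,200,000 files
// Grouping k small categories into one file set divides the category
// factor by k, which is how category groups cut the file count >3x.
```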
16. Event Logs on HDFS
Category Group
How?
[Diagram: clients 1…N log to per-host client daemons, which forward to aggregators (including new aggregators) that write category groups in the new format.]
● Category groups are transparent to users.
● Category groups are configured in the implementation and invisible externally.
● Category groups are created based on traffic, output format, etc.
● Aggregators write the data of a category group into the same files (see the record sketch below).
● Category groups eliminated small files and reduced the file count by more than 3x under the current config.
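A sketch of what a grouped file implies for readers: since several categories share one file, each record must carry its category so consumers can filter. The record shape below is an assumption; the deck does not specify the on-disk encoding of the new format:

```scala
// Hypothetical record in a category-group file: the category travels with
// each record so readers can demultiplex a shared file.
final case class GroupedRecord(category: String, payload: Array[Byte])

def splitByCategory(records: Seq[GroupedRecord]): Map[String, Seq[Array[Byte]]] =
  records.groupBy(_.category).map { case (c, rs) => c -> rs.map(_.payload) }
```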
17. Aggregator Group
● Aggregator groups are configured in the implementation and invisible externally.
● Aggregator groups are configured based on traffic, priority, etc. (see the sketch below)
● They scale independently and provide resource isolation.
● Friendly for debugging, experiments, testing, and migration.
Why and how?
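A sketch of an aggregator-group assignment under assumed names: each group owns a disjoint set of category groups, so hot traffic is isolated and each group can be scaled or migrated on its own. The group names and host counts are invented:

```scala
// Hypothetical aggregator-group layout; names and sizes are illustrative.
final case class AggregatorGroup(hosts: Int, categoryGroups: Set[String])

val aggregatorGroups: Map[String, AggregatorGroup] = Map(
  "high-priority" -> AggregatorGroup(hosts = 200, categoryGroups = Set("ads")),
  "bulk"          -> AggregatorGroup(hosts = 50,  categoryGroups = Set("small-logs"))
)
```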
18. Single Aggregator Improvement
● Memory model improvement
○ Introduce memory channel groups; memory is shared within a group (see the sketch below)
○ Configure channel groups based on destination, data priority, etc.
○ Set a max memory usage per group to contain misbehaving clients
● Bounded APIs to tackle slowness
● Introduce micro-batching
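A minimal sketch of the shared per-group memory budget with a bounded put; using a semaphore as a byte budget is an illustrative mechanism, not necessarily how the actual channel group is implemented:

```scala
import java.util.concurrent.Semaphore

// Channels in a group draw from one shared byte budget; a hard cap keeps a
// misbehaving client from starving the rest. Semaphore-as-budget is a sketch.
final class MemoryChannelGroup(maxBytes: Int) {
  private val budget = new Semaphore(maxBytes)

  // Bounded put: fails fast instead of blocking when the group is full,
  // so the caller can apply back-pressure (cf. the back-off sketch earlier).
  def tryPut(event: Array[Byte]): Boolean = budget.tryAcquire(event.length)

  // Return the budget once the event has been drained downstream.
  def release(event: Array[Byte]): Unit = budget.release(event.length)
}
```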
19. Single Aggregator Improvement
● Batch isolation
● Improvement
○ Max lim
● Resource isolation
● Friendly for debugging, experiments, testing, and migration
Micro-batch benchmark