SlideShare a Scribd company logo
Event Log
Pipeline @Twitter
Challenges of handling Billions
of events per minute
Systems @Scale Fall 2021
Lohit VijayaRenu
@lohitvijayarenu
#
TAGHASHT
AG#HASH
#HASHTAG
#
TAG#HASH
DataPlatform @Twitter
#Startups #Yahoo!
#Hadoop
Lohit VijayaRenu
@lohitvijayarenu
9:35AM · Oct 13, 2021
Example of
slide
● User interactions generate events
● Events are grouped as Datasets
● Datasets for Data processing & Analytics
Event Log Pipeline
Event Log Pipeline
1B 5B
~500 datasets
● Scale
● Resource requirement
~800 datasets
Events per
minute
Events per
minute
Challenges
Example of
slide
Architecture
Microservices
>100k instances
Clients
Events
Example of
slide
Architecture
Microservices
>100k instances
Clients
Event
Aggregation
Scribe
~3K instances
Events
Example of
slide
Architecture
Microservices
>100k instances
Clients
Event
Aggregation
Scribe
~3K instances
MapReduce
>10K jobs
Event
Processing
Events
Example of
slide
Architecture
Microservices
>100k instances
Clients
Event
Aggregation
Scribe
~3K instances
MapReduce
>10K jobs
HDFS, Presto, Scalding
>100PBs of data
Event
Processing
Event Log
Management
Events Datasets
Example of
slide
Example of Statement
Slide. Lorem ipsum dolor
sit amet, consectetur
adipiscing elit. Text size
should be between 80 pt to
140 pt depending on
paragraph length.
● Per aggregator scaling challenges
● No multi tenancy for datasets
● Eco system integration
● Difficult to maintain and improve
Event Aggregation
Problem
Example of
slide
Example of Statement
Slide. Lorem ipsum dolor
sit amet, consectetur
adipiscing elit. Text size
should be between 80 pt to
140 pt depending on
paragraph length.
● Moved to open source Flume
● Per aggregator improvements
● Microbatch, memory management,...
● Optimize resource usage
● Streaming at network speed
Event Aggregation
Example of
slide
Example of Statement
Slide. Lorem ipsum dolor
sit amet, consectetur
adipiscing elit. Text size
should be between 80 pt to
140 pt depending on
paragraph length.
● Memory spikes before and after
Event Aggregation
Example of
slide
Example of Statement
Slide. Lorem ipsum dolor
sit amet, consectetur
adipiscing elit. Text size
should be between 80 pt to
140 pt depending on
paragraph length.
● Data tiers with priorities
● Dataset groups
● Aggregator groups
● Dynamic scaling
Event Aggregation
Example of
slide
Example of Statement
Slide. Lorem ipsum dolor
sit amet, consectetur
adipiscing elit. Text size
should be between 80 pt to
140 pt depending on
paragraph length.
● Scaling Aggregation
Event Aggregation
Dataset 1
Dataset 2
Dataset 3
Aggregator Group 1
Aggregator Group 2
Service
Discovery
Dataset Storage
Example of
slide
Example of Statement
Slide. Lorem ipsum dolor
sit amet, consectetur
adipiscing elit. Text size
should be between 80 pt to
140 pt depending on
paragraph length.
Event Processing
Problem
● Thousands of process jobs
● Multitenancy
● Long tail reducers
● Unpredictable run time (traffic surge)
Example of
slide
Event Processing
● Apache Tez based processor
● Data tier processors
● Dynamic Hash based partition
Example of
slide
Event Processing
● Apache Tez based processor
● Data tier processors
● Dynamic Hash based partition
Raw Events
Partition
Partition
Partition
Example of
slide
Example of Statement
Slide. Lorem ipsum dolor
sit amet, consectetur
adipiscing elit. Text size
should be between 80 pt to
140 pt depending on
paragraph length.
Migration
● Transparent migration
● More than 5 Billion events per
minute
● Per dataset scaling
● Reliable and fault tolerant framework
Example of
slide
Example of Statement
Slide. Lorem ipsum dolor
sit amet, consectetur
adipiscing elit. Text size
should be between 80 pt to
140 pt depending on
paragraph length.
Migration
● Transparent migration
● More than 5 Billion events per
minute
● Per dataset scaling
● Reliable and fault tolerant framework
But…….
Latency!!!
Example of
slide
Near real time
Near real time
event streaming
● Analytics on real time user interactions
● End to end latency of minutes
● Data storage with real time insertion
Example of
slide
Architecture
Microservices
>1k instances
Clients
Events
Example of
slide
Architecture
Microservices
>10k instances
Clients
Streaming
Aggregation
Flume, Google
PubSub
Events
Example of
slide
Architecture
Microservices
>10k instances
Clients
Streaming
Aggregation
Flume, Google
PubSub
Apache Beam,
Google Dataflow
Streaming
Processor
Events
Example of
slide
Architecture
Microservices
>10k instances
Clients
Streaming
Aggregation
Flume, Google
PubSub
Apache Beam,
Google Dataflow
BigQuery, Google Cloud
Storage, Amplitude
Streaming
Processor
Real time
insertion
Events Datasets
Example of
slide
Latency
Dataset Type Event Latency
p90 p95 p99
Dataset 1 (very small datasets) 0.42 sec 0.48 sec 0.59 sec
Dataset 2 (Smaller datasets) 2.6 sec 2.8 sec 8.4 sec
Dataset 3 (Large datasets) 22.1 sec 29.9 sec 67.7 sec
Dataset 4 (Large datasets 5 min micro batch) 6.3 min 6.5 min 7.2 min
● Promising results
● Independent pipelines
Example of
slide
Example of Statement
Slide. Lorem ipsum dolor
sit amet, consectetur
adipiscing elit. Text size
should be between 80 pt to
140 pt depending on
paragraph length.
New challenges and future
● Scale for volume and spikes
● Stream processing and ingestion at
scale
● Faster catch up on failures
● Change Event Log Pipeline to Event
Streaming Pipelines
Follow @TwitterEng
@TwitterCareers
Thank You!
26
Log Ingestion and Replication
Data Platform Infrastructure
Data Platform SRE

More Related Content

What's hot

Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium
confluent
 
Symantec: Cassandra Data Modelling techniques in action
Symantec: Cassandra Data Modelling techniques in actionSymantec: Cassandra Data Modelling techniques in action
Symantec: Cassandra Data Modelling techniques in action
DataStax Academy
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
Tugdual Grall
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
Gleb Kanterov
 
The Rise of Streaming SQL
The Rise of Streaming SQLThe Rise of Streaming SQL
The Rise of Streaming SQL
Sriskandarajah Suhothayan
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
KafkaZone
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
HBaseCon
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
confluent
 
Presto Summit 2018 - 10 - Qubole
Presto Summit 2018  - 10 - QubolePresto Summit 2018  - 10 - Qubole
Presto Summit 2018 - 10 - Qubole
kbajda
 
Presto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix ContainersPresto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix Containers
kbajda
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
Flink Forward
 
Druid
DruidDruid
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid
arupmalakar
 
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesWebinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
MongoDB
 
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
Sriskandarajah Suhothayan
 
Siddhi - cloud-native stream processor
Siddhi - cloud-native stream processorSiddhi - cloud-native stream processor
Siddhi - cloud-native stream processor
Sriskandarajah Suhothayan
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
Anna Ossowski
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
Alex Van Boxel
 
Big Data Applications
Big Data ApplicationsBig Data Applications
Big Data Applications
Joseph Zwicker
 
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward
 

What's hot (20)

Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium
 
Symantec: Cassandra Data Modelling techniques in action
Symantec: Cassandra Data Modelling techniques in actionSymantec: Cassandra Data Modelling techniques in action
Symantec: Cassandra Data Modelling techniques in action
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
The Rise of Streaming SQL
The Rise of Streaming SQLThe Rise of Streaming SQL
The Rise of Streaming SQL
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
Presto Summit 2018 - 10 - Qubole
Presto Summit 2018  - 10 - QubolePresto Summit 2018  - 10 - Qubole
Presto Summit 2018 - 10 - Qubole
 
Presto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix ContainersPresto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix Containers
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
 
Druid
DruidDruid
Druid
 
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid
 
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesWebinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
 
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
 
Siddhi - cloud-native stream processor
Siddhi - cloud-native stream processorSiddhi - cloud-native stream processor
Siddhi - cloud-native stream processor
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
Big Data Applications
Big Data ApplicationsBig Data Applications
Big Data Applications
 
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
 

Similar to Log Events @Twitter

Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
Guido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
Xiang Fu
 
Stream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdaysStream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdays
SmartNews, Inc.
 
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
Amazon Web Services
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
Amazon Web Services
 
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
StampedeCon
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsight
Eduardo Castro
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
Prasad Wagle
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Keystone event processing pipeline on a dockerized microservices architecture
Keystone event processing pipeline on a dockerized microservices architectureKeystone event processing pipeline on a dockerized microservices architecture
Keystone event processing pipeline on a dockerized microservices architecture
Zhenzhong Xu
 
#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter
Twitter Developers
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft Private Cloud
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka
Guido Schmutz
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Peter Bakas
 
Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Sweet Streams (Are made of this)
Sweet Streams (Are made of this)
Corneil du Plessis
 

Similar to Log Events @Twitter (20)

Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Stream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdaysStream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdays
 
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsight
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Keystone event processing pipeline on a dockerized microservices architecture
Keystone event processing pipeline on a dockerized microservices architectureKeystone event processing pipeline on a dockerized microservices architecture
Keystone event processing pipeline on a dockerized microservices architecture
 
#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview Presentation
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Sweet Streams (Are made of this)
Sweet Streams (Are made of this)
 

More from lohitvijayarenu

OpenSource and the Cloud ApacheCon.pptx
OpenSource and the Cloud  ApacheCon.pptxOpenSource and the Cloud  ApacheCon.pptx
OpenSource and the Cloud ApacheCon.pptx
lohitvijayarenu
 
The Adoption of Apache Beam at Twitter
The Adoption of Apache Beam at TwitterThe Adoption of Apache Beam at Twitter
The Adoption of Apache Beam at Twitter
lohitvijayarenu
 
Twitter's Data Replicator for Google Cloud Storage
Twitter's Data Replicator for Google Cloud StorageTwitter's Data Replicator for Google Cloud Storage
Twitter's Data Replicator for Google Cloud Storage
lohitvijayarenu
 
Large Scale EventLog Management @Twitter
Large Scale EventLog Management @TwitterLarge Scale EventLog Management @Twitter
Large Scale EventLog Management @Twitter
lohitvijayarenu
 
Routing trillion events per day @twitter
Routing trillion events per day @twitterRouting trillion events per day @twitter
Routing trillion events per day @twitter
lohitvijayarenu
 
Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at
lohitvijayarenu
 
HBase backups and performance on MapR
HBase backups and performance on MapRHBase backups and performance on MapR
HBase backups and performance on MapR
lohitvijayarenu
 

More from lohitvijayarenu (7)

OpenSource and the Cloud ApacheCon.pptx
OpenSource and the Cloud  ApacheCon.pptxOpenSource and the Cloud  ApacheCon.pptx
OpenSource and the Cloud ApacheCon.pptx
 
The Adoption of Apache Beam at Twitter
The Adoption of Apache Beam at TwitterThe Adoption of Apache Beam at Twitter
The Adoption of Apache Beam at Twitter
 
Twitter's Data Replicator for Google Cloud Storage
Twitter's Data Replicator for Google Cloud StorageTwitter's Data Replicator for Google Cloud Storage
Twitter's Data Replicator for Google Cloud Storage
 
Large Scale EventLog Management @Twitter
Large Scale EventLog Management @TwitterLarge Scale EventLog Management @Twitter
Large Scale EventLog Management @Twitter
 
Routing trillion events per day @twitter
Routing trillion events per day @twitterRouting trillion events per day @twitter
Routing trillion events per day @twitter
 
Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at
 
HBase backups and performance on MapR
HBase backups and performance on MapRHBase backups and performance on MapR
HBase backups and performance on MapR
 

Recently uploaded

ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
Mukeshwaran Balu
 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
ssuser36d3051
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
jpsjournal1
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
MIGUELANGEL966976
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
Ratnakar Mikkili
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
drwaing
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
University of Maribor
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
Rahul
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
JamalHussainArman
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
PuktoonEngr
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 
Wearable antenna for antenna applications
Wearable antenna for antenna applicationsWearable antenna for antenna applications
Wearable antenna for antenna applications
Madhumitha Jayaram
 
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
awadeshbabu
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 

Recently uploaded (20)

ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
Wearable antenna for antenna applications
Wearable antenna for antenna applicationsWearable antenna for antenna applications
Wearable antenna for antenna applications
 
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 

Log Events @Twitter

Editor's Notes

  1. What are events and what does Event Log Pipeline look like? User interactions generate events. These flow throw our Event Log Pipeline which produce datasets These are available for data processing, data analytics
  2. What was our challenge Used to see YoY growth. We had projected to grow from 1B events per minute to 5B events per minute
  3. At the high level architecture Client emit event, More than 100K instance Event Aggregation framework at about 3K instances built on Scribe Once they generate raw dataset, these are further curated and optimized in processor Generating hourly batch
  4. At the high level architecture Client emit event, More than 100K instance Event Aggregation framework at about 3K instances built on Scribe Once they generate raw dataset, these are further curated and optimized in processor Generating hourly batch
  5. At the high level architecture Client emit event, More than 100K instance Event Aggregation framework at about 3K instances built on Scribe Once they generate raw dataset, these are further curated and optimized in processor Generating hourly batch
  6. At the high level architecture Client emit event, More than 100K instance Event Aggregation framework at about 3K instances built on Scribe Once they generate raw dataset, these are further curated and optimized in processor Generating hourly batch
  7. Per aggregator scaling challenge. Stress testing showed that there is a cap on how many events can be handled because of CPU bottleneck There was no multi tenancy. Bad dataset could cause problems to
  8. Planned vs unplanned
  9. Planned vs unplanned