Scalable Complex Event Processing On
Samza @Uber
Shuyi Chen
Uber Technologies Inc.
Uber
● 6 continents, 70 countries, 400+ cities
● Transportation as reliable as running water, everywhere,
for everyone
Outline
● Motivation
● Architecture
● Limitations
● Challenges
Motivation
Uber is a data-driven company
Thousands of Kafka topics from different services
We can extract a lot of useful information from this
rich set of logs in real time!
● Window aggregation: multiple logins from the same IP within a short interval
● Pattern detection: partner accepts a trip
→ partner calls rider through the Uber app
→ rider cancels the trip
● Filter: partners reject the second pickup of an UberPOOL trip
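To make the window-aggregation case concrete, here is a minimal Python sketch of counting logins per IP over a sliding time window. It is not how Siddhi or the pipeline implements windows; the class name, the 60-second window, and the threshold of 3 are assumptions chosen for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class LoginWindow:
    """Counts logins per IP over a sliding time window (illustrative only)."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.logins = defaultdict(deque)  # ip -> timestamps still inside the window

    def on_login(self, ip, ts):
        """Record a login; return True if this IP has reached the threshold."""
        timestamps = self.logins[ip]
        timestamps.append(ts)
        # Evict timestamps that have fallen out of the window.
        while timestamps and timestamps[0] <= ts - self.window_seconds:
            timestamps.popleft()
        return len(timestamps) >= self.threshold


window = LoginWindow(window_seconds=60, threshold=3)
alerts = [window.on_login("10.0.0.1", t) for t in (0, 10, 20, 100)]
print(alerts)  # [False, False, True, False]
```

The third login inside one minute raises the alert; by the fourth login the earlier ones have expired, so the count drops back to one.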
Can we use declarative semantics to specify this
stream processing logic?
Complex event processing
● Combines data from multiple sources to infer events or patterns that suggest
more complicated circumstances
● CEP is used across many industries for various use cases, including:
○ Finance: Trade analysis, fraud detection
○ Airlines: Operations monitoring
○ Healthcare: Claims processing, patient monitoring
○ Energy and Telecommunications: Outage detection
● CEP uses a declarative rule/query language to specify event processing logic
Siddhi: Complex event processing engine
● Lightweight, extensible, open source, released as a Java library
● Features supported
○ Filter
○ Join
○ Aggregation
○ Group by
○ Window
○ Pattern processing
○ Sequence processing
○ Event tables
○ Event-time processing
○ Declarative query language: SiddhiQL
How Siddhi works
● Specify processing logic declaratively with SiddhiQL
● The query is parsed at runtime into an execution plan runtime
● As events flow in, the execution plan runtime processes events inside the CEP
engine according to the query logic
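As a rough picture of what such a runtime does for a pattern query, the Python sketch below tracks, per trip, how far a "followed-by" pattern has progressed as events flow in. This is an assumption-laden illustration, not Siddhi's implementation: the event type names, the `SequenceMatcher` class, and keying by trip ID are all hypothetical.

```python
# Hypothetical event type names for: accept -> call rider -> rider cancels.
PATTERN = ("trip_accepted", "partner_called_rider", "rider_cancelled")


class SequenceMatcher:
    """Tracks per-trip progress through a followed-by pattern (sketch only)."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.progress = {}  # trip_id -> index of the next expected step

    def on_event(self, trip_id, event_type):
        """Feed one event; return True when the full pattern has matched."""
        step = self.progress.get(trip_id, 0)
        if event_type != self.pattern[step]:
            return False  # unrelated events are ignored, progress is kept
        step += 1
        if step == len(self.pattern):
            self.progress.pop(trip_id, None)  # reset after a full match
            return True
        self.progress[trip_id] = step
        return False


matcher = SequenceMatcher(PATTERN)
events = [
    ("t1", "trip_accepted"),
    ("t2", "trip_accepted"),
    ("t1", "partner_called_rider"),
    ("t2", "rider_cancelled"),  # t2 never called the rider, so no match
    ("t1", "rider_cancelled"),
]
matches = [trip for trip, kind in events if matcher.on_event(trip, kind)]
print(matches)  # ['t1']
```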
How can we make it scalable at Uber scale?
Samza
● A distributed stream processing framework
○ Scalable
○ Built-in state management
○ Built-in fault tolerance
○ At-least-once message processing
● Good support from our data infra team
How can we make the stream processing output
useful?
Actions
● Generalize a set of common action templates to make it easy for services and
humans to harness the power of real-time stream processing
● Currently we support
○ Make an RPC call
○ Invoke a Webhook endpoint
○ Index to Elasticsearch
○ Write to Cassandra
○ Kafka
○ Statsd
○ Chat service
○ Email
○ Push notification
Real-time Scalable Complex Event Processing
Architecture
Preprocessor
● Enrich raw Kafka events with business information
Shuffler
● Re-shuffle events
● Prefiltering for predicate pushdown
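The two shuffler responsibilities can be sketched together in a few lines of Python: events are dropped early by a prefilter, and the survivors are re-partitioned by a key so that all events for the same key reach the same downstream task. The function name, the `rider_id` field, and the use of CRC32 as the hash are assumptions for illustration; the real shuffler is a Samza job writing to Kafka.

```python
import zlib


def shuffle(events, key_field, num_partitions, prefilter=None):
    """Re-partition events by `key_field`, dropping those the prefilter rejects."""
    partitions = [[] for _ in range(num_partitions)]
    for event in events:
        # Predicate pushdown: discard irrelevant events before they are shuffled.
        if prefilter is not None and not prefilter(event):
            continue
        key = str(event[key_field]).encode("utf-8")
        # A stable hash keeps every event of one key in one partition.
        partitions[zlib.crc32(key) % num_partitions].append(event)
    return partitions


events = [
    {"rider_id": "r1", "type": "login"},
    {"rider_id": "r2", "type": "heartbeat"},
    {"rider_id": "r1", "type": "cancel"},
]
shuffled = shuffle(events, "rider_id", 4, prefilter=lambda e: e["type"] != "heartbeat")
print([len(p) for p in shuffled])
```

Filtering before the shuffle means the heartbeat event never costs downstream bandwidth or processing.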
Complex event processor
● Parse Siddhi queries into an execution plan runtime
● Process events in the Siddhi execution plan runtime
● Checkpoint state regularly to RocksDB to ensure recovery after a
crash or restart
Action processor
● Execute actions upon the complex event output
● Support various kinds of actions for easy integration
● Implement a configurable, finite action retry mechanism using RocksDB
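One way to picture a finite retry mechanism backed by a persistent store is the Python sketch below. A plain dict stands in for RocksDB, and the function name, the per-event attempt counter, and the default of three attempts are assumptions, not details from the talk.

```python
def run_with_retry(action, event_id, store, max_attempts=3):
    """Run `action` until it succeeds or `max_attempts` is used up.

    `store` stands in for a persistent key-value store (a dict here), so the
    attempt count would survive a restart in a real deployment.
    """
    attempts = store.get(event_id, 0)
    while attempts < max_attempts:
        attempts += 1
        store[event_id] = attempts  # persist the attempt before executing
        try:
            action()
        except Exception:
            continue  # failed attempt; loop again if budget remains
        del store[event_id]  # success: nothing left to retry
        return True
    return False  # retry budget exhausted; the count stays in the store


calls = []


def flaky_action():
    """Hypothetical action that fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("temporary failure")


def broken_action():
    """Hypothetical action that always fails."""
    raise RuntimeError("permanent failure")


store = {}
flaky_ok = run_with_retry(flaky_action, "event-1", store)
broken_ok = run_with_retry(broken_action, "event-2", store)
print(flaky_ok, broken_ok, store)  # True False {'event-2': 3}
```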
No stream processing logic is hard-coded in the data
pipeline
REST API backend
● All queries, actions, shuffling logic, and pre-filtering logic are stored
externally in Cassandra
● RESTful API for CRUD operations
● The data pipeline automatically reloads the data upon update, without a job restart
○ Fast data exploration
○ Real-time feedback loop
○ Incremental DAG construction
● Decouples processing logic from the data pipeline
Unified management and monitoring
● Every use case
○ Shares the same data pipeline architecture
○ Uses queries and actions to describe its processing logic
● A single monitoring template can be reused across different use cases
Applications
● Real-time fraud detection
● Real-time anomaly detection
● Real-time marketing campaign
● Real-time promotion
● Real-time monitoring
● Real-time feedback system
● Real-time analytics
● Real-time visualizations
● And more
Limitations
Not a general purpose stream processing system
No dynamic topology
● The DAG is not dynamic
● Cannot shuffle an arbitrary number of times
● In principle, we can chain multiple copies of the data pipeline to build an
arbitrary DAG
○ A large DAG can be difficult to manage and monitor
○ Samza uses Kafka as the intermediate message queue between jobs, so wide DAGs put a heavy
load on Kafka
○ Out of the 40+ use cases we run in production, none requires it
Out-of-order event handling
● Not a big concern
○ Events of the same rider/partner are usually seconds apart
● K-slack extension in Siddhi for out-of-order event processing
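The general K-slack idea is to hold events in a buffer and release them in timestamp order once they are at least K time units older than the newest timestamp seen. The Python sketch below shows that idea only; it is not the code of Siddhi's extension, and the class name and the choice of K = 5 are assumptions.

```python
import heapq
import itertools


class KSlackBuffer:
    """Reorders events that arrive at most `k` time units late (sketch only)."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap ordered by event timestamp
        self.max_ts = float("-inf")
        self.seq = itertools.count()  # tie-breaker for equal timestamps

    def add(self, ts, event):
        """Buffer one event; return the events that are now safe to emit."""
        self.max_ts = max(self.max_ts, ts)
        heapq.heappush(self.heap, (ts, next(self.seq), event))
        released = []
        # Anything at least k older than the newest timestamp can be released.
        while self.heap and self.heap[0][0] <= self.max_ts - self.k:
            released.append(heapq.heappop(self.heap)[2])
        return released


buffer = KSlackBuffer(k=5)
ordered = []
for ts, name in [(1, "a"), (4, "c"), (3, "b"), (9, "d"), (12, "e")]:
    ordered.extend(buffer.add(ts, name))
print(ordered)  # ['a', 'b', 'c']; "d" and "e" are still buffered
```

The trade-off is latency: a larger K tolerates more disorder but delays every event by up to K.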
Job deployment
● Samza job creation is semi-automated
○ Auto-generate standard job properties
○ JVM memory tuning
○ Samza parameter tuning, e.g. container count
● Integrate with an in-house cluster job management system to simplify
start/restart/stop/upgrade of Samza jobs
Predicate pushdown
● Allows prefiltering of streams in the shuffle stage
● Needs manual configuration through the web UI
● In the future, we can automate this by query analysis
Challenges
Broadcast stream
● We need a broadcast stream to propagate updates in the storage backend to the
data pipeline
● No broadcast stream in Samza 0.9.1
● Override SystemStreamPartitionGrouper
● Samza 0.10.0 added broadcast support (SAMZA-676)
Unbalanced task workload
● Shufflers ingest multiple topics with different partition counts
● Default task partition assignment does not scale
● Override SystemStreamPartitionGrouper to balance the partitions across all
tasks
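The imbalance and the fix can be illustrated with the assignment logic alone. The Python sketch below compares a group-by-partition-ID scheme (our understanding of the default behavior, stated here as an assumption) with a round-robin spread over a fixed number of tasks. A real grouper would be a Java class implementing Samza's SystemStreamPartitionGrouper; the topic names, partition counts, and task count here are made up.

```python
def group_by_partition(topics):
    """One task per partition ID: task i gets partition i of every topic."""
    tasks = {}
    for topic, count in topics.items():
        for partition in range(count):
            tasks.setdefault(partition, []).append((topic, partition))
    return tasks


def group_balanced(topics, num_tasks):
    """Spread every (topic, partition) pair round-robin across the tasks."""
    tasks = {task: [] for task in range(num_tasks)}
    all_partitions = [
        (topic, partition)
        for topic, count in sorted(topics.items())
        for partition in range(count)
    ]
    for i, stream_partition in enumerate(all_partitions):
        tasks[i % num_tasks].append(stream_partition)
    return tasks


topics = {"logins": 8, "trips": 2}  # hypothetical topics and partition counts
default_sizes = sorted(len(v) for v in group_by_partition(topics).values())
balanced_sizes = sorted(len(v) for v in group_balanced(topics, 5).values())
print(default_sizes)   # [1, 1, 1, 1, 1, 1, 2, 2]
print(balanced_sizes)  # [2, 2, 2, 2, 2]
```

With partition-ID grouping, the tasks for low partition IDs carry every topic while the rest carry only the widest one; the balanced variant evens this out at the cost of no longer co-locating same-numbered partitions.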
Large checkpointing state
● Samza uses Kafka to log state changes
● Kafka limits message size to 1 MB by default
● Solution: we built logic to slice state into smaller pieces and checkpoint
them into RocksDB
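A minimal Python sketch of the slicing idea: serialize the state, cut it into chunks below a size limit, store each chunk under its own key, and record the chunk count so the state can be reassembled. The key layout, the use of `pickle`, and the dict standing in for the store are assumptions for illustration, not the production format.

```python
import pickle


def checkpoint(store, key, state, max_bytes):
    """Serialize `state` and write it as chunks of at most `max_bytes` each."""
    blob = pickle.dumps(state)
    chunks = [blob[i:i + max_bytes] for i in range(0, len(blob), max_bytes)]
    for index, chunk in enumerate(chunks):
        store[f"{key}/{index}"] = chunk
    # The count is written last and tells restore() how many chunks to read.
    store[f"{key}/count"] = len(chunks)


def restore(store, key):
    """Reassemble and deserialize the state written by checkpoint()."""
    count = store[f"{key}/count"]
    blob = b"".join(store[f"{key}/{index}"] for index in range(count))
    return pickle.loads(blob)


store = {}  # stands in for the key-value store
state = {"window": list(range(1000)), "matches": {"t1": 2}}
checkpoint(store, "query-42", state, max_bytes=256)
restored = restore(store, "query-42")
print(restored == state, store["query-42/count"])
```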
Synchronous checkpointing
● If state is large, time to checkpoint can be long
● Samza uses a single-threaded model, so it is unsafe to checkpoint asynchronously
● Ongoing work on multi-thread support in Samza (SAMZA-863)
Exactly-once state processing?
● Cannot commit state and offset atomically
● No exactly-once state processing
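Why non-atomic commits rule out exactly-once can be shown with a toy simulation in Python. It assumes state changes become durable immediately and the offset is committed afterwards; a crash between the two steps makes the restarted job replay a message whose effect is already in the state. The function and variable names are hypothetical.

```python
def run(messages, state, committed_offset, crash_at=None):
    """Process messages from `committed_offset`; return the new committed offset.

    The state is updated first and the offset committed second, so a crash
    between the two steps (simulated by `crash_at`) keeps the state change
    but loses the offset commit.
    """
    for offset in range(committed_offset, len(messages)):
        state["total"] += messages[offset]  # step 1: state change is durable
        if offset == crash_at:
            return committed_offset  # crash before the offset is committed
        committed_offset = offset + 1  # step 2: offset commit
    return committed_offset


messages = [1, 1, 1]
state = {"total": 0}
offset = run(messages, state, 0, crash_at=1)  # crashes while handling message 1
offset = run(messages, state, offset)  # restart replays message 1
print(state["total"], sum(messages))  # 4 3
```

The replayed message is counted twice, which is the at-least-once behavior: no message is lost, but state can reflect duplicates.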
Debugging
● Need to inspect multiple logs to diagnose Samza job problems
○ Application master log
○ Multiple container logs
○ Log size is huge
○ Container logs are difficult to locate after job failure
● Sometimes a Samza job gets stuck at launch, and no log can be found
○ YARN problem
○ Binary downloading problem
Upgrading Samza jobs
● Upgrading a Samza job requires a full restart, which can take minutes due to
○ Offset checkpointing topic too large → set retention to hours
○ Changelog topic too large → set retention, enable compaction in Kafka, or use host affinity
(SAMZA-617)
● To minimize the interruption during upgrade, it would be nice to have
○ Rolling restart
○ Per container restart
Our solution: non-interrupted handoff
● For critical jobs, we use replication during upgrade
○ Start a shadow job
○ Upgrade shadow
○ Switch primary and shadow
○ Upgrade primary
○ Switch back
● Downside: requires 2x capacity during upgrade
Managing a complicated DAG
● Samza uses Kafka as the message queue for intermediate processing output
○ This enables sharing of shuffler or preprocessor output among multiple downstream Samza
jobs
○ Increases resource efficiency
● This gradually results in a large and complicated DAG
○ Complicated dependencies between jobs
○ Jobs closer to the sources of the DAG become more and more critical
● In practice, we isolate DAGs by logical groups
Thank you

Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...Lovely Professional University
 
Quiz application system project report..pdf
Quiz application system project report..pdfQuiz application system project report..pdf
Quiz application system project report..pdfKamal Acharya
 

Recently uploaded (20)

RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfRESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
 
Lect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptxLect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptx
 
Low rpm Generator for efficient energy harnessing from a two stage wind turbine
Low rpm Generator for efficient energy harnessing from a two stage wind turbineLow rpm Generator for efficient energy harnessing from a two stage wind turbine
Low rpm Generator for efficient energy harnessing from a two stage wind turbine
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
 
How to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdfHow to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdf
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
 
Artificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian ReasoningArtificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian Reasoning
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 
Electrostatic field in a coaxial transmission line
Electrostatic field in a coaxial transmission lineElectrostatic field in a coaxial transmission line
Electrostatic field in a coaxial transmission line
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.
 
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
 
Online book store management system project.pdf
Online book store management system project.pdfOnline book store management system project.pdf
Online book store management system project.pdf
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdf
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2
 
E-Commerce Shopping using MERN Stack where different modules are present
E-Commerce Shopping using MERN Stack where different modules are presentE-Commerce Shopping using MERN Stack where different modules are present
E-Commerce Shopping using MERN Stack where different modules are present
 
Furniture showroom management system project.pdf
Furniture showroom management system project.pdfFurniture showroom management system project.pdf
Furniture showroom management system project.pdf
 
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
 
Quiz application system project report..pdf
Quiz application system project report..pdfQuiz application system project report..pdf
Quiz application system project report..pdf
 

Scalable Complex Event Processing on Samza @Uber

  • 1. Scalable Complex Event Processing On Samza @Uber Shuyi Chen Uber Technologies Inc.
  • 2. ● 6 continents, 70 countries, 400+ cities ● Transportation as reliable as running water, everywhere, for everyone Uber
  • 3. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 4. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 5. Uber is a data-driven company
  • 6. Thousands of Kafka topics from different services
  • 7. We can extract a lot of useful information from this rich set of logs in real-time!
  • 8. Multiple logins from the same IP within a short interval
  • 9. Partner accepted a trip → partner calls rider through the Uber app → rider cancels the trip
  • 10. Partners reject the second pickup of an UberPOOL trip
  • 11. Multiple logins from the same IP within a short interval Window Aggregation
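The window aggregation named on this slide can be illustrated with a minimal Python sketch. It is not Siddhi or Samza code: the class name `LoginWindow`, the 60-second window, the threshold of 3, and the sample IPs are all assumptions made for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class LoginWindow:
    """Counts logins per IP inside a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        # ip -> timestamps of logins that are still inside the window
        self.logins = defaultdict(deque)

    def on_login(self, ip, timestamp):
        """Record one login; return True when the IP reaches the threshold."""
        window = self.logins[ip]
        window.append(timestamp)
        # Evict timestamps that have fallen out of the window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


# Illustrative data only: (ip, timestamp in seconds).
detector = LoginWindow(window_seconds=60, threshold=3)
events = [
    ("10.0.0.1", 0),
    ("10.0.0.2", 5),
    ("10.0.0.1", 20),
    ("10.0.0.1", 45),
    ("10.0.0.1", 200),
]
alerts = [(ip, ts) for ip, ts in events if detector.on_login(ip, ts)]
```

Keeping one deque per IP mirrors a group-by on the IP field: each key has its own window, so a burst from one address does not affect the count for another.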
  • 12. Partner accepted a trip → partner calls rider through the Uber app → rider cancels the trip Pattern detection
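The accept → call → cancel pattern can be sketched as a small per-trip state machine. This is a hand-written illustration of the idea, not how Siddhi evaluates patterns; the event type names and the `SequenceMatcher` class are hypothetical. Events that do not match the next expected step are ignored rather than resetting progress.

```python
# Hypothetical event type names for the three steps of the pattern.
EXPECTED_SEQUENCE = ("trip_accepted", "partner_called_rider", "rider_cancelled")


class SequenceMatcher:
    """Tracks, per trip, how far along the expected event sequence we are."""

    def __init__(self, sequence=EXPECTED_SEQUENCE):
        self.sequence = sequence
        self.progress = {}  # trip_id -> index of the next expected event

    def on_event(self, trip_id, event_type):
        """Feed one event; return True when the full sequence has been seen."""
        step = self.progress.get(trip_id, 0)
        if event_type != self.sequence[step]:
            return False  # unrelated event: keep the current progress
        step += 1
        if step == len(self.sequence):
            self.progress.pop(trip_id, None)  # pattern complete, reset the trip
            return True
        self.progress[trip_id] = step
        return False


matcher = SequenceMatcher()
stream = [
    ("t1", "trip_accepted"),
    ("t2", "trip_accepted"),
    ("t1", "partner_called_rider"),
    ("t2", "rider_cancelled"),  # t2 never saw the call, so no match
    ("t1", "rider_cancelled"),
]
matches = [trip for trip, event in stream if matcher.on_event(trip, event)]
```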
  • 13. Partners reject the second pickup of an UberPOOL trip Filter
  • 14. Can we use declarative semantics to specify this stream processing logic?
  • 15. Complex event processing ● Combines data from multiple sources to infer events or patterns that suggest more complicated circumstances ● CEP is used across many industries for various use cases, including: ○ Finance: Trade analysis, fraud detection ○ Airlines: Operations monitoring ○ Healthcare: Claims processing, patient monitoring ○ Energy and Telecommunications: Outage detection ● CEP uses a declarative rule/query language to specify event processing logic
  • 16. Siddhi: Complex event processing engine ● Lightweight, extensible, open source, released as a Java library ● Features supported ○ Filter ○ Join ○ Aggregation ○ Group by ○ Window ○ Pattern processing ○ Sequence processing ○ Event tables ○ Event-time processing ○ Declarative query language: SiddhiQL
  • 17. How Siddhi works ● Specify processing logic declaratively with SiddhiQL
  • 18. How Siddhi works ● Query is parsed at runtime into an execution plan runtime ● As events flow in, the execution plan runtime processes events inside the CEP engine according to the query logic
  • 19. How can we make it scalable at Uber scale?
  • 20. Samza ● A distributed stream processing framework ○ Scalable ○ Built-in state management ○ Built-in fault tolerance ○ At-least-once message processing ● Good support from our data infra team
  • 21. How can we make the stream processing output useful?
  • 22. Actions ● Generalize a set of common action templates to make it easy for services and humans to harness the power of real-time stream processing ● Currently we support ○ Make an RPC call ○ Invoke a Webhook endpoint ○ Index to ElasticSearch ○ Write to Cassandra ○ Kafka ○ Statsd ○ Chat service ○ Email ○ Push notification
  • 24. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 25.
  • 26.
  • 27. Preprocessor ● Enrich raw Kafka events with business information
  • 28. Shuffler ● Re-shuffle events ● Prefiltering for predicate pushdown
  • 29. Complex event processor ● Parse Siddhi queries into execution plan runtime ● Process events in Siddhi execution plan runtime ● Checkpoint state regularly to ensure recovery upon crash/restart using RocksDB
  • 30. Action processor ● Execute actions upon the complex event output ● Support various kinds of actions for easy integration ● Implement a configurable, finite action retry mechanism using RocksDB
  • 31. No stream processing logic is hard-coded in the data pipeline
  • 32. REST API backend ● All queries, actions, shuffling logic and pre-filtering logic are stored externally in Cassandra ● RESTful API for CRUD operations ● Data pipeline automatically reloads the data upon update without a job restart ○ Fast data exploration ○ Real-time feedback loop ○ Incremental DAG construction ● Decouple processing logic from the data pipeline
  • 33. Unified management and monitoring ● Every use case ○ share the same data pipeline architecture ○ Use queries and actions to describe its processing logic ● A single monitoring template can be reused across different use cases
  • 34. Applications ● Real-time fraud detection ● Real-time anomaly detection ● Real-time marketing campaign ● Real-time promotion ● Real-time monitoring ● Real-time feedback system ● Real-time analytics ● Real-time visualizations ● etc.
  • 35. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 36. Not a general purpose stream processing system
  • 37. No dynamic topology ● The DAG is not dynamic ● Cannot shuffle an arbitrary number of times ● Ideally, we could chain multiple copies of the data pipeline to build an arbitrary DAG ○ A large DAG can be difficult to manage and monitor ○ Samza uses Kafka as the intermediate message queue between jobs, so wide DAGs put a large load on Kafka ○ Out of the 40+ use cases we run in production, none requires it
  • 38. Out-of-order event handling ● Not a big concern ○ Events of the same rider/partner are usually seconds apart ● K-slack extension in Siddhi for out-of-order event processing
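The general K-slack idea is to hold events in a buffer and release one only after the largest timestamp seen so far exceeds it by at least K, so that late arrivals within K can still be emitted in order. The sketch below uses a fixed K and is a simplification for illustration; it is not the Siddhi extension's implementation, and `KSlackBuffer` and its methods are hypothetical names.

```python
import heapq


class KSlackBuffer:
    """Reorders events that arrive at most k time units late (fixed-k sketch)."""

    def __init__(self, k):
        self.k = k
        self.max_seen = float("-inf")
        self.heap = []    # min-heap ordered by event timestamp
        self.counter = 0  # tie-breaker so payloads are never compared

    def push(self, timestamp, payload):
        """Buffer one event and return the events that are now safe to emit."""
        self.max_seen = max(self.max_seen, timestamp)
        heapq.heappush(self.heap, (timestamp, self.counter, payload))
        self.counter += 1
        released = []
        while self.heap and self.heap[0][0] <= self.max_seen - self.k:
            ts, _, item = heapq.heappop(self.heap)
            released.append((ts, item))
        return released

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        released = []
        while self.heap:
            ts, _, item = heapq.heappop(self.heap)
            released.append((ts, item))
        return released


buffer = KSlackBuffer(k=2)
arrivals = [1, 3, 2, 5, 4, 8]  # timestamps in arrival order, slightly shuffled
ordered = []
for ts in arrivals:
    ordered.extend(t for t, _ in buffer.push(ts, payload=None))
ordered.extend(t for t, _ in buffer.flush())
```

The trade-off is latency: every event waits until the stream has advanced K time units past it, which is acceptable when, as the slide notes, events for the same rider or partner are usually seconds apart.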
  • 39. Job deployment ● Samza job creation is semi-automated ○ Auto-generate standard job properties ○ JVM memory tuning ○ Samza parameter tuning, e.g. container count ● Integrate with in-house cluster job management system to simplify start/restart/stop/upgrade of Samza jobs
  • 40. Predicate pushdown ● Allow prefiltering of streams in shuffle stage ● Need manual configuration through Web UI ● In the future, we can automate this by query analysis
  • 41. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 42. Broadcast stream ● We need a broadcast stream to push updates from the storage backend to the data pipeline ● No broadcast stream in Samza 0.9.1 ● Override SystemStreamPartitionGrouper ● Samza 0.10.0 added broadcast support (SAMZA-676)
  • 43. Unbalanced task workload ● Shufflers ingest multiple topics with different partition counts ● Default task partition assignment does not scale ● Override SystemStreamPartitionGrouper to balance the partitions across all tasks
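One way to balance partitions from topics with different partition counts is to pool all topic-partitions and deal them out round-robin. The sketch below shows only that assignment logic in plain Python; it does not use the Samza `SystemStreamPartitionGrouper` interface, and the function name and topic names are assumptions for illustration.

```python
def balance_partitions(topic_partition_counts, task_count):
    """Spread every (topic, partition) pair round-robin across the tasks."""
    assignments = [[] for _ in range(task_count)]
    # Pool partitions from all topics, in a deterministic order.
    all_partitions = [
        (topic, partition)
        for topic in sorted(topic_partition_counts)
        for partition in range(topic_partition_counts[topic])
    ]
    for index, topic_partition in enumerate(all_partitions):
        assignments[index % task_count].append(topic_partition)
    return assignments


# Hypothetical topics with uneven partition counts.
assignments = balance_partitions({"logins": 4, "trips": 2, "payments": 1}, task_count=3)
sizes = [len(task) for task in assignments]
```

With this scheme the task sizes differ by at most one partition, regardless of how unevenly the partitions are spread across topics.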
  • 44. Large checkpointing state ● Samza uses Kafka to log state changes ● Kafka limits message size to 1 MB by default ● Solution: we built logic to slice state into smaller pieces and checkpoint them into RocksDB
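Slicing a large serialized state into bounded pieces can be sketched as follows. A plain dict stands in for the RocksDB store, and the key layout (`<key>/<index>` plus a `<key>/count` entry) is an assumption made for this example, not the layout used in the talk.

```python
def slice_state(state_bytes, max_chunk_bytes):
    """Split serialized state into chunks no larger than max_chunk_bytes."""
    return [
        state_bytes[start:start + max_chunk_bytes]
        for start in range(0, len(state_bytes), max_chunk_bytes)
    ]


def checkpoint(store, key, state_bytes, max_chunk_bytes):
    """Write each chunk under its own key, plus a count for reassembly."""
    chunks = slice_state(state_bytes, max_chunk_bytes)
    for index, chunk in enumerate(chunks):
        store[f"{key}/{index}"] = chunk
    store[f"{key}/count"] = len(chunks)


def restore(store, key):
    """Reassemble the original state from its chunks."""
    count = store[f"{key}/count"]
    return b"".join(store[f"{key}/{index}"] for index in range(count))


store = {}                         # stand-in for a key-value store
state = bytes(range(256)) * 10     # 2560 bytes of example state
checkpoint(store, "siddhi-state", state, max_chunk_bytes=1000)
restored = restore(store, "siddhi-state")
chunk_sizes = [len(store[f"siddhi-state/{i}"]) for i in range(store["siddhi-state/count"])]
```

Because each write stays under the chunk limit, the corresponding changelog messages also stay under the per-message size limit.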
  • 45. Synchronous checkpointing ● If state is large, time to checkpoint can be long ● Samza uses a single-threaded model, so it is unsafe to checkpoint asynchronously ● Ongoing work on multi-thread support in Samza (SAMZA-863)
  • 46. Exactly once state processing? ● Cannot commit state and offset atomically ● No exactly once state processing
  • 47. Debugging ● Need to inspect multiple logs to diagnose Samza job problems ○ Application master log ○ Multiple container logs ○ Log size is huge ○ Container logs are difficult to locate after job failure ● Sometimes, Samza jobs get stuck at launch, and no log can be found ○ YARN problem ○ Binary downloading problem
  • 48. Upgrading Samza jobs ● Upgrading Samza jobs requires a full restart, which can take minutes due to ○ Offset checkpointing topic too large → set retention to hours ○ Changelog topic too large → set retention or enable compaction in Kafka, or use host affinity (SAMZA-617) ● To minimize the interruption during upgrade, it would be nice to have ○ Rolling restart ○ Per container restart
  • 49. Our solution: non-interrupted handoff ● For critical jobs, we use replication during upgrade ○ Start a shadow job ○ Upgrade shadow ○ Switch primary and shadow ○ Upgrade primary ○ Switch back ● Downside: requires 2x capacity during upgrade
  • 50. Manage complicated DAG ● Samza uses Kafka as message queue for intermediate processing output ○ This enables sharing of shuffler or preprocessor output among multiple downstream Samza jobs ○ Increases resource efficiency ● This gradually results in a large and complicated DAG ○ Complicated dependencies between jobs ○ Jobs closer to the sources of the DAG become more and more critical ● In practice, we isolate DAGs by logical groups