SlideShare a Scribd company logo
Scalable Real-Time Complex Event Processing @Uber
Shuyi Chen
Uber Technology Inc.
6 continents, 70+ countries and 400+ cities
Transportation as reliable as running water, everywhere,
for everyone
Uber
Outline
● Motivation
● Architecture
● Limitations
● Challenges
Outline
● Motivation
● Architecture
● Limitations
● Challenges
Uber is a data-driven company
Thousands of Kafka topics from micro-services
We can extract a lot of useful information from this
rich set of logs in real-time!
Multiple logins from the same IP in the last 10
minutes
Partner accepted a trip
→ partner calls rider through the Uber APP
→ rider cancels the trip
Partners reject the second pickup of a UberPOOL
trip
Multiple logins from the same IP in the last 10
minutes
Window Aggregation
Partner accepted a trip
→ partner calls rider through the Uber APP
→ rider cancels the trip
Pattern detection
Partners reject the second pickup of a UberPOOL
trip
Filter
Can we use declarative languages to specify these
stream processing logics?
Complex event processing
● Combines data from multiple sources to infer events or patterns that suggest
more complicated circumstances
● CEP is used across many industries for various use cases, including:
○ Finance: Trade analysis, fraud detection
○ Airlines: Operations monitoring
○ Healthcare: Claims processing, patient monitoring
○ Energy and Telecommunications: Outage detection
● CEP uses declarative rule/query language to specify event processing logic
WSO2/Siddhi: Complex event processing engine
● Lightweight, extensible, open source, released as a Java library
● Features supported
○ Filter
○ Join
○ Aggregation
○ Group by
○ Window
○ Pattern processing
○ Sequence processing
○ Event tables
○ Event-time processing
○ UDF
○ Extensions
○ Declarative query language: SiddhiQL
How Siddhi works
● Specify processing logic declaratively with SiddhiQL
How Siddhi works
● Query is parsed at runtime into an execution plan runtime
● As events flow in, the execution plan runtime process events inside the CEP
engine according the query logic
How can we make it scalable at Uber scale?
Apache Samza
● A distributed stream processing framework
○ Distributed and Scalable
○ Built-in State management
○ Built-in fault tolerant
○ At-least-once message processing
○ Infrastructure support at Uber
How can we make the stream processing output
useful?
Actions
● Generalize a set of common action templates to make it easy for
micro-services and human to harness the power of realtime stream
processing
● Currently we support
○ Make an RPC call
○ Invoke a Webhook endpoint
○ Index to ElasticSearch
○ Index to Cassandra
○ Kafka
○ Statsd
○ Chat service
○ Email
○ Push notification
Actions
Real-time Scalable Complex Event Processing
Outline
● Motivation
● Architecture
● Limitations
● Challenges
Partitioner
● Re-shuffle events based on key
● Support predicate pushdown through query analysis
● Support column pruning through query analysis (WIP)
Query processor
● Parse Siddhi queries into execution plan runtime
● Process events in Siddhi execution plan runtime
● Checkpoint state regularly to ensure recovery upon crash/restart using
RocksDB
Action processor
● Execute actions upon the query processing output
● Support various kinds of actions for easy integration
● Implement action retry mechanism using RocksDB to provide at-least-once
delivery
How do we translate a query into psychical plan that
runs?
DAG (Directed Acyclic Graph) generation
● Analyze Siddhi query to automatically generate the stream processing DAG in
Samza using the processors
Filter, transformation
Join, window, pattern
More complicated
No stream processing logic is hard-coded in any of
the processors
REST API backend
● All queries, actions are stored externally in database.
● RESTFUL API for CRUD operations
● If query/action logic changed
○ Redeploy the Samza DAG if needed
○ Otherwise, the updated queries/actions will be loaded at runtime w/o interruption
Unified management and monitoring
● Every use case
○ share the same set of processors
○ Use queries and actions to describe its processing logic
● A single monitoring template can be reused across different use cases
Production status
● In production for >1.5 years
● 120+ production use cases
● 30+ billion messages processed per day
Applications
● Real-time fraud detection
● Real-time anomaly detection
● Real-time marketing campaign
● Real-time promotion
● Real-time monitoring
● Real-time feedback system
● Real-time analytics
● Real-time visualizations
● And etc.
Outline
● Motivation
● Architecture
● Limitations
● Challenges
Out-of-order event handling
● Not a big concern
○ Events of the same rider/partner are usually seconds aparts
● K-slack extension in Siddhi for out-of-order event processing
Auto-scaling
● Manually re-partition kafka topics to increase parallelism
● Manually tune container memory if needed
● Future
○ Use CPU/memory/IO stats to automate the process
Outline
● Motivation
● Architecture
● Limitations
● Challenges
Large checkpointing state
● Samza use Kafka to log state changes
● Siddhi engine snapshot can be large
● Kafka message size limit to 1MB by default
● Solution: we build logics to slice state into smaller pieces and checkpoint
them.
Synchronous checkpointing
● Samza checkpointing is synchronous with message processing
● If state is large, time to checkpoint can be long, might cause processing lag
● Incremental state checkpointing
Exactly once state processing?
● Can not commit state and offset atomically
● No exactly once state processing
Custom business logic
● Common logic implemented as Siddhi extensions
● Ad-hoc logic implemented as UDF in javascript or scalascript inline with the
query
Intermediate Kafka messages
● Samza uses Kafka as message queue for intermediate processing output
○ Each stage is independent of each other
○ This can create large load on Kafka if a heave topic is re-shuffled multiple times
■ Encode the intermediate messages to reduce footprint
Upgrading Samza jobs
● Upgrade Samza jobs require a full restart, and can take minutes due to
○ Offset checkpointing topic too large → set retention to hours or enable compaction
○ Changelog topic too large → set retention or enable compaction in Kafka or host affinity
● To minimize the interruption during upgrade, it would be nice to have
○ Rolling restart
○ Per container restart
Our solution: non-interrupted handoff
● For critical jobs, we use replication during upgrade
○ Start a shadow job
○ Upgrade shadow
○ Switch primary and shadow
○ Upgrade primary
○ Switch back
● Downside: require 2x capacity during upgrade
Thank You!

More Related Content

What's hot

Kafka Summit SF 2017 - Riot's Journey to Global Kafka Aggregation
Kafka Summit SF 2017 - Riot's Journey to Global Kafka AggregationKafka Summit SF 2017 - Riot's Journey to Global Kafka Aggregation
Kafka Summit SF 2017 - Riot's Journey to Global Kafka Aggregation
confluent
 
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
confluent
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 
Via Varejo taking data from legacy to a new world at Brazil Black Friday (Mar...
Via Varejo taking data from legacy to a new world at Brazil Black Friday (Mar...Via Varejo taking data from legacy to a new world at Brazil Black Friday (Mar...
Via Varejo taking data from legacy to a new world at Brazil Black Friday (Mar...
confluent
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
HostedbyConfluent
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
Eric Sammer
 
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka StreamsKafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
confluent
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
confluent
 
Ingesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah WhitacreIngesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah Whitacre
confluent
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
confluent
 
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, ShopifyIt's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
HostedbyConfluent
 
DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series search
Hakka Labs
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
Vinoth Chandar
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
confluent
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
Shuyi Chen
 
Kafka Summit SF 2017 - Fast Data in Supply Chain Planning
Kafka Summit SF 2017 - Fast Data in Supply Chain PlanningKafka Summit SF 2017 - Fast Data in Supply Chain Planning
Kafka Summit SF 2017 - Fast Data in Supply Chain Planning
confluent
 
Putting Kafka Together with the Best of Google Cloud Platform
Putting Kafka Together with the Best of Google Cloud Platform Putting Kafka Together with the Best of Google Cloud Platform
Putting Kafka Together with the Best of Google Cloud Platform
confluent
 
How Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
How Much Can You Connect? | Bhavesh Raheja, Disney + HotstarHow Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
How Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
HostedbyConfluent
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Ankur Bansal
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
Monal Daxini
 

What's hot (20)

Kafka Summit SF 2017 - Riot's Journey to Global Kafka Aggregation
Kafka Summit SF 2017 - Riot's Journey to Global Kafka AggregationKafka Summit SF 2017 - Riot's Journey to Global Kafka Aggregation
Kafka Summit SF 2017 - Riot's Journey to Global Kafka Aggregation
 
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Via Varejo taking data from legacy to a new world at Brazil Black Friday (Mar...
Via Varejo taking data from legacy to a new world at Brazil Black Friday (Mar...Via Varejo taking data from legacy to a new world at Brazil Black Friday (Mar...
Via Varejo taking data from legacy to a new world at Brazil Black Friday (Mar...
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
 
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka StreamsKafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
Ingesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah WhitacreIngesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah Whitacre
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
 
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, ShopifyIt's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify
 
DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series search
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
 
Kafka Summit SF 2017 - Fast Data in Supply Chain Planning
Kafka Summit SF 2017 - Fast Data in Supply Chain PlanningKafka Summit SF 2017 - Fast Data in Supply Chain Planning
Kafka Summit SF 2017 - Fast Data in Supply Chain Planning
 
Putting Kafka Together with the Best of Google Cloud Platform
Putting Kafka Together with the Best of Google Cloud Platform Putting Kafka Together with the Best of Google Cloud Platform
Putting Kafka Together with the Best of Google Cloud Platform
 
How Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
How Much Can You Connect? | Bhavesh Raheja, Disney + HotstarHow Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
How Much Can You Connect? | Bhavesh Raheja, Disney + Hotstar
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 

Similar to Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber

WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at UberWSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2
 
Netty training
Netty trainingNetty training
Netty training
Marcelo Serpa
 
Cassandra Lunch #88: Cadence
Cassandra Lunch #88: CadenceCassandra Lunch #88: Cadence
Cassandra Lunch #88: Cadence
Anant Corporation
 
Netty training
Netty trainingNetty training
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Mingmin Chen
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Bowen Li
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
DataWorks Summit
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
Amazon Web Services
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
Xiang Fu
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
DataWorks Summit
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
Nicolas Brousse
 
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Mariano Gonzalez
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
confluent
 
'How to build efficient backend based on microservice architecture' by Anton ...
'How to build efficient backend based on microservice architecture' by Anton ...'How to build efficient backend based on microservice architecture' by Anton ...
'How to build efficient backend based on microservice architecture' by Anton ...
OdessaJS Conf
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
Ruslan Meshenberg
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in Spark
Digital Vidya
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Steven Wu
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Allen (Xiaozhong) Wang
 
Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams
confluent
 

Similar to Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber (20)

WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at UberWSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
 
Netty training
Netty trainingNetty training
Netty training
 
Cassandra Lunch #88: Cadence
Cassandra Lunch #88: CadenceCassandra Lunch #88: Cadence
Cassandra Lunch #88: Cadence
 
Netty training
Netty trainingNetty training
Netty training
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
 
'How to build efficient backend based on microservice architecture' by Anton ...
'How to build efficient backend based on microservice architecture' by Anton ...'How to build efficient backend based on microservice architecture' by Anton ...
'How to build efficient backend based on microservice architecture' by Anton ...
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in Spark
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams
 

More from confluent

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
confluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
confluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
confluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
confluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
confluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
confluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
confluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
confluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
confluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
confluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
confluent
 

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 

Recently uploaded

Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 

Recently uploaded (20)

Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 

Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber

  • 1. Scalable Real-Time Complex Event Processing @Uber Shuyi Chen Uber Technology Inc.
  • 2. 6 continents, 70+ countries and 400+ cities Transportation as reliable as running water, everywhere, for everyone Uber
  • 3. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 4. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 5. Uber is a data-driven company
  • 6. Thousands of Kafka topics from micro-services
  • 7. We can extract a lot of useful information from this rich set of logs in real-time!
  • 8. Multiple logins from the same IP in the last 10 minutes
  • 9. Partner accepted a trip → partner calls rider through the Uber APP → rider cancels the trip
  • 10. Partners reject the second pickup of a UberPOOL trip
  • 11. Multiple logins from the same IP in the last 10 minutes Window Aggregation
  • 12. Partner accepted a trip → partner calls rider through the Uber APP → rider cancels the trip Pattern detection
  • 13. Partners reject the second pickup of a UberPOOL trip Filter
  • 14. Can we use declarative languages to specify these stream processing logics?
  • 15. Complex event processing ● Combines data from multiple sources to infer events or patterns that suggest more complicated circumstances ● CEP is used across many industries for various use cases, including: ○ Finance: Trade analysis, fraud detection ○ Airlines: Operations monitoring ○ Healthcare: Claims processing, patient monitoring ○ Energy and Telecommunications: Outage detection ● CEP uses declarative rule/query language to specify event processing logic
  • 16. WSO2/Siddhi: Complex event processing engine ● Lightweight, extensible, open source, released as a Java library ● Features supported ○ Filter ○ Join ○ Aggregation ○ Group by ○ Window ○ Pattern processing ○ Sequence processing ○ Event tables ○ Event-time processing ○ UDF ○ Extensions ○ Declarative query language: SiddhiQL
  • 17. How Siddhi works ● Specify processing logic declaratively with SiddhiQL
  • 18. How Siddhi works ● Query is parsed at runtime into an execution plan runtime ● As events flow in, the execution plan runtime process events inside the CEP engine according the query logic
  • 19. How can we make it scalable at Uber scale?
  • 20. Apache Samza ● A distributed stream processing framework ○ Distributed and Scalable ○ Built-in State management ○ Built-in fault tolerant ○ At-least-once message processing ○ Infrastructure support at Uber
  • 21. How can we make the stream processing output useful?
  • 22. Actions ● Generalize a set of common action templates to make it easy for micro-services and human to harness the power of realtime stream processing ● Currently we support ○ Make an RPC call ○ Invoke a Webhook endpoint ○ Index to ElasticSearch ○ Index to Cassandra ○ Kafka ○ Statsd ○ Chat service ○ Email ○ Push notification
  • 24. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 25.
  • 26.
  • 27. Partitioner ● Re-shuffle events based on key ● Support predicate pushdown through query analysis ● Support column pruning through query analysis (WIP)
  • 28. Query processor ● Parse Siddhi queries into execution plan runtime ● Process events in Siddhi execution plan runtime ● Checkpoint state regularly to ensure recovery upon crash/restart using RocksDB
  • 29. Action processor ● Execute actions upon the query processing output ● Support various kinds of actions for easy integration ● Implement action retry mechanism using RocksDB to provide at-least-once delivery
  • 30. How do we translate a query into psychical plan that runs?
  • 31. DAG (Directed Acyclic Graph) generation ● Analyze Siddhi query to automatically generate the stream processing DAG in Samza using the processors Filter, transformation
  • 34. No stream processing logic is hard-coded in any of the processors
  • 35. REST API backend ● All queries, actions are stored externally in database. ● RESTFUL API for CRUD operations ● If query/action logic changed ○ Redeploy the Samza DAG if needed ○ Otherwise, the updated queries/actions will be loaded at runtime w/o interruption
  • 36. Unified management and monitoring ● Every use case ○ share the same set of processors ○ Use queries and actions to describe its processing logic ● A single monitoring template can be reused across different use cases
  • 37. Production status ● In production for >1.5 years ● 120+ production use cases ● 30+ billion messages processed per day
  • 38. Applications ● Real-time fraud detection ● Real-time anomaly detection ● Real-time marketing campaign ● Real-time promotion ● Real-time monitoring ● Real-time feedback system ● Real-time analytics ● Real-time visualizations ● And etc.
  • 39. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 40. Out-of-order event handling ● Not a big concern ○ Events of the same rider/partner are usually seconds aparts ● K-slack extension in Siddhi for out-of-order event processing
  • 41. Auto-scaling ● Manually re-partition kafka topics to increase parallelism ● Manually tune container memory if needed ● Future ○ Use CPU/memory/IO stats to automate the process
  • 42. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  • 43. Large checkpointing state ● Samza use Kafka to log state changes ● Siddhi engine snapshot can be large ● Kafka message size limit to 1MB by default ● Solution: we build logics to slice state into smaller pieces and checkpoint them.
  • 44. Synchronous checkpointing ● Samza checkpointing is synchronous with message processing ● If state is large, time to checkpoint can be long, might cause processing lag ● Incremental state checkpointing
  • 45. Exactly once state processing? ● Can not commit state and offset atomically ● No exactly once state processing
  • 46. Custom business logic ● Common logic implemented as Siddhi extensions ● Ad-hoc logic implemented as UDF in javascript or scalascript inline with the query
  • 47. Intermediate Kafka messages ● Samza uses Kafka as message queue for intermediate processing output ○ Each stage is independent of each other ○ This can create large load on Kafka if a heave topic is re-shuffled multiple times ■ Encode the intermediate messages to reduce footprint
  • 48. Upgrading Samza jobs ● Upgrade Samza jobs require a full restart, and can take minutes due to ○ Offset checkpointing topic too large → set retention to hours or enable compaction ○ Changelog topic too large → set retention or enable compaction in Kafka or host affinity ● To minimize the interruption during upgrade, it would be nice to have ○ Rolling restart ○ Per container restart
  • 49. Our solution: non-interrupted handoff ● For critical jobs, we use replication during upgrade ○ Start a shadow job ○ Upgrade shadow ○ Switch primary and shadow ○ Upgrade primary ○ Switch back ● Downside: require 2x capacity during upgrade