SlideShare a Scribd company logo
Scalable Real-Time Complex Event
Processing @Uber
Shuyi Chen
Uber Technology Inc.
● 6 continents, 70 countries and 400+ cities
● Transportation as reliable as running water, everywhere, for
everyone
Uber
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Uber is a data-driven company
Thousands of Kafka topics from different services
We can extract a lot of useful information from this rich
set of logs in real-time!
Multiple logins from the same IP in the last 10 minutes
Partner accepted a trip
→ partner calls rider through the Uber APP
→ rider cancels the trip
Partners reject the second pickup of a UberPOOL trip
Multiple logins from the same IP in the last 10 minutes
Window Aggregation
Partner accepted a trip
→ partner calls rider through the Uber APP
→ rider cancels the trip
Pattern detection
Partners reject the second pickup of a UberPOOL trip
Filter
Can we use declarative semantics to specify these stream
processing logics?
Complex event processing
• Combines data from multiple sources to infer events or patterns that
suggest more complicated circumstances
• CEP is used across many industries for various use cases, including:
– Finance: Trade analysis, fraud detection
– Airlines: Operations monitoring
– Healthcare: Claims processing, patient monitoring
– Energy and Telecommunications: Outage detection
• CEP uses declarative rule/query language to specify event processing
logic
WSO2/Siddhi: Complex event processing engine
• Lightweight, extensible, open source, released as a Java library
• Features supported
– Filter
– Join
– Aggregation
– Group by
– Window
– Pattern processing
– Sequence processing
– Event tables
– Event-time processing
– UDF
– Extensions
– Declarative query language: SiddhiQL
How Siddhi works
• Specify processing logic declaratively with SiddhiQL
How Siddhi works
• Query is parsed at runtime into an execution plan runtime
• As events flow in, the execution plan runtime process events inside
the CEP engine according the query logic
How can we make it scalable at Uber scale?
Apache Samza
• A distributed stream processing framework
– Distributed and Scalable
– Built-in State management
– Built-in fault tolerant
– At-least-once message processing
How can we make the stream processing output useful?
Actions
• Generalize a set of common action templates to make it easy for
services and human to harness the power of realtime stream
processing
• Currently we support
– Make an RPC call
– Invoke a Webhook endpoint
– Index to ElasticSearch
– Index to Cassandra
– Kafka
– Statsd
– Chat service
– Email
– Push notification
Actions
Real-time Scalable Complex Event Processing
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Partitioner
• Re-partition events based on key
• Support predicate pushdown through query analysis
• Support column pruning through query analysis (WIP)
Query processor
• Parse Siddhi queries into execution plan runtime
• Process events in Siddhi execution plan runtime
• Checkpoint state regularly to ensure recovery upon crash/restart
using RocksDB
Action processor
• Execute actions upon the complex event processing output
• Support various kinds of actions for easy integration
• Implement action retry mechanism using RocksDB to provide
at-least-once delivery
How do we translate a query into psychical plan that
runs?
DAG (Directed Acyclic Graph) generation
• Analyze Siddhi query to automatically generate the stream
processing DAG in Samza using the processors
Filter, transformation
Join, window, pattern
More complicated
No stream processing logic is hard-coded in any of the
processors
REST API backend
• All queries, actions are stored externally in database.
• RESTFUL API for CRUD operations
• If query/action logic changed
– Redeploy the Samza DAG if needed
– Otherwise, the updated queries/actions will be loaded at runtime w/o
interruption
Unified management and monitoring
• Every use case
– share the same set of processors
– Use queries and actions to describe its processing logic
• A single monitoring template can be reused across different use
cases
Production status
• 100+ production use cases
• 30+ billion messages processed per day
Applications
• Real-time fraud detection
• Real-time anomaly detection
• Real-time marketing campaign
• Real-time promotion
• Real-time monitoring
• Real-time feedback system
• Real-time analytics
• Real-time visualizations
• And etc.
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Out-of-order event handling
• Not a big concern
– Events of the same rider/partner are usually seconds aparts
• K-slack extension in Siddhi for out-of-order event processing
Auto-scaling
• Manually re-partition kafka topics to increase parallelism
• Manually tune container memory if needed
• Future
– Use CPU/memory/IO stats to auto-scale the data pipelines
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Large checkpointing state
• Samza use Kafka to log state changes
• Siddhi engine snapshot can be large
• Kafka message size limit to 1MB by default
• Solution: we build logics to slice state into smaller pieces and
checkpoint them.
Synchronous checkpointing
• If state is large, time to checkpoint can be long
• Samza uses single-threaded model, unsafe to do it asynchronously
(SAMZA-863)
Exactly once state processing?
• Can not commit state and offset atomically
• No exactly once state processing
Custom business logic
• Common logic implemented as Siddhi extensions
• Ad-hoc logic implemented as UDF in javascript or scala
Intermediate Kafka messages
• Samza uses Kafka as message queue for intermediate processing
output
– This can create large load on Kafka if a heave topic is partitioned multiple
times
– Encode the intermediate messages to reduce footprint
Multi-tenancy
• Older Siddhi version process events using a thread pool
– Bad for multi-tenancy in YARN
– Consume more CPU resource than claimed
• Newer version still use thread pool for scheduled task, but main
processing in single thread
– Good: CPU consumption per YARN container is bounded
Upgrading Samza jobs
• Upgrade Samza jobs require a full restart, and can take minutes due
to
– Offset checkpointing topic too large → set retention to hours
– Changelog topic too large → set retention or enable compaction in
Kafka or host affinity (SAMZA-617)
• To minimize the interruption during upgrade, it would be nice to
have
– Rolling restart
– Per container restart
Our solution: non-interrupted handoff
• For critical jobs, we use replication during upgrade
– Start a shadow job
– Upgrade shadow
– Switch primary and shadow
– Upgrade primary
– Switch back
• Downside: require 2x capacity during upgrade
Thank You!

More Related Content

What's hot

WSO2 Product Release Webinar Introducing the WSO2 Message Broker
WSO2 Product Release Webinar   Introducing the WSO2 Message BrokerWSO2 Product Release Webinar   Introducing the WSO2 Message Broker
WSO2 Product Release Webinar Introducing the WSO2 Message BrokerWSO2
 
Keynote-Service Orientation – Why is it good for your business
Keynote-Service Orientation – Why is it good for your businessKeynote-Service Orientation – Why is it good for your business
Keynote-Service Orientation – Why is it good for your businessWSO2
 
Event Bus as Backbone for Decoupled Microservice Choreography (Oracle Code, A...
Event Bus as Backbone for Decoupled Microservice Choreography (Oracle Code, A...Event Bus as Backbone for Decoupled Microservice Choreography (Oracle Code, A...
Event Bus as Backbone for Decoupled Microservice Choreography (Oracle Code, A...
Lucas Jellema
 
JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?
JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?
JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?
PROIDEA
 
Flying to clouds - can it be easy? Cloud Native Applications
Flying to clouds - can it be easy? Cloud Native ApplicationsFlying to clouds - can it be easy? Cloud Native Applications
Flying to clouds - can it be easy? Cloud Native Applications
Jacek Bukowski
 
AWS RDS Oracle - What is missing for a fully managed service?
AWS RDS Oracle - What is missing for a fully managed service?AWS RDS Oracle - What is missing for a fully managed service?
AWS RDS Oracle - What is missing for a fully managed service?
DanielHillinger
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaS
Gleicon Moraes
 
Cf summit2014 roadmap
Cf summit2014 roadmapCf summit2014 roadmap
Cf summit2014 roadmapJames Bayer
 
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
HostedbyConfluent
 
Cloud Computing101 Azure, updated june 2017
Cloud Computing101 Azure, updated june 2017Cloud Computing101 Azure, updated june 2017
Cloud Computing101 Azure, updated june 2017
Fernando Mejía
 
Grails in the Cloud (2013)
Grails in the Cloud (2013)Grails in the Cloud (2013)
Grails in the Cloud (2013)
Meni Lubetkin
 
Scalability Availabilty and Management of WSO2 Carbon
Scalability Availabilty and Management of WSO2 CarbonScalability Availabilty and Management of WSO2 Carbon
Scalability Availabilty and Management of WSO2 CarbonWSO2
 
MVC 6 - the new unified Web programming model
MVC 6 - the new unified Web programming modelMVC 6 - the new unified Web programming model
MVC 6 - the new unified Web programming model
Alex Thissen
 
Tokyo azure meetup #9 azure update, october
Tokyo azure meetup #9   azure update, octoberTokyo azure meetup #9   azure update, october
Tokyo azure meetup #9 azure update, october
Tokyo Azure Meetup
 
Lessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, Cloudera
Lessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, ClouderaLessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, Cloudera
Lessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, Cloudera
HostedbyConfluent
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014
Gleicon Moraes
 
Tokyo Azure Meetup #4 - Build 2016 Overview
Tokyo Azure Meetup #4 -  Build 2016 OverviewTokyo Azure Meetup #4 -  Build 2016 Overview
Tokyo Azure Meetup #4 - Build 2016 Overview
Tokyo Azure Meetup
 
Tokyo Azure Meetup #9 - Azure Update, september
Tokyo Azure Meetup #9 - Azure Update, septemberTokyo Azure Meetup #9 - Azure Update, september
Tokyo Azure Meetup #9 - Azure Update, september
Tokyo Azure Meetup
 
Docker y azure container service
Docker y azure container serviceDocker y azure container service
Docker y azure container service
Fernando Mejía
 
Stratoscale Latest and Greatest
Stratoscale Latest and GreatestStratoscale Latest and Greatest
Stratoscale Latest and GreatestZach Lanksbury
 

What's hot (20)

WSO2 Product Release Webinar Introducing the WSO2 Message Broker
WSO2 Product Release Webinar   Introducing the WSO2 Message BrokerWSO2 Product Release Webinar   Introducing the WSO2 Message Broker
WSO2 Product Release Webinar Introducing the WSO2 Message Broker
 
Keynote-Service Orientation – Why is it good for your business
Keynote-Service Orientation – Why is it good for your businessKeynote-Service Orientation – Why is it good for your business
Keynote-Service Orientation – Why is it good for your business
 
Event Bus as Backbone for Decoupled Microservice Choreography (Oracle Code, A...
Event Bus as Backbone for Decoupled Microservice Choreography (Oracle Code, A...Event Bus as Backbone for Decoupled Microservice Choreography (Oracle Code, A...
Event Bus as Backbone for Decoupled Microservice Choreography (Oracle Code, A...
 
JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?
JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?
JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?
 
Flying to clouds - can it be easy? Cloud Native Applications
Flying to clouds - can it be easy? Cloud Native ApplicationsFlying to clouds - can it be easy? Cloud Native Applications
Flying to clouds - can it be easy? Cloud Native Applications
 
AWS RDS Oracle - What is missing for a fully managed service?
AWS RDS Oracle - What is missing for a fully managed service?AWS RDS Oracle - What is missing for a fully managed service?
AWS RDS Oracle - What is missing for a fully managed service?
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaS
 
Cf summit2014 roadmap
Cf summit2014 roadmapCf summit2014 roadmap
Cf summit2014 roadmap
 
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
 
Cloud Computing101 Azure, updated june 2017
Cloud Computing101 Azure, updated june 2017Cloud Computing101 Azure, updated june 2017
Cloud Computing101 Azure, updated june 2017
 
Grails in the Cloud (2013)
Grails in the Cloud (2013)Grails in the Cloud (2013)
Grails in the Cloud (2013)
 
Scalability Availabilty and Management of WSO2 Carbon
Scalability Availabilty and Management of WSO2 CarbonScalability Availabilty and Management of WSO2 Carbon
Scalability Availabilty and Management of WSO2 Carbon
 
MVC 6 - the new unified Web programming model
MVC 6 - the new unified Web programming modelMVC 6 - the new unified Web programming model
MVC 6 - the new unified Web programming model
 
Tokyo azure meetup #9 azure update, october
Tokyo azure meetup #9   azure update, octoberTokyo azure meetup #9   azure update, october
Tokyo azure meetup #9 azure update, october
 
Lessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, Cloudera
Lessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, ClouderaLessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, Cloudera
Lessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, Cloudera
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014
 
Tokyo Azure Meetup #4 - Build 2016 Overview
Tokyo Azure Meetup #4 -  Build 2016 OverviewTokyo Azure Meetup #4 -  Build 2016 Overview
Tokyo Azure Meetup #4 - Build 2016 Overview
 
Tokyo Azure Meetup #9 - Azure Update, september
Tokyo Azure Meetup #9 - Azure Update, septemberTokyo Azure Meetup #9 - Azure Update, september
Tokyo Azure Meetup #9 - Azure Update, september
 
Docker y azure container service
Docker y azure container serviceDocker y azure container service
Docker y azure container service
 
Stratoscale Latest and Greatest
Stratoscale Latest and GreatestStratoscale Latest and Greatest
Stratoscale Latest and Greatest
 

Similar to WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
Shuyi Chen
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
Oh Chan Kwon
 
Spark cep
Spark cepSpark cep
Spark cep
Byungjin Kim
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
DataWorks Summit/Hadoop Summit
 
Guide to Application Performance: Planning to Continued Optimization
Guide to Application Performance: Planning to Continued OptimizationGuide to Application Performance: Planning to Continued Optimization
Guide to Application Performance: Planning to Continued Optimization
MuleSoft
 
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
HostedbyConfluent
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
confluent
 
Kinesis @ lyft
Kinesis @ lyftKinesis @ lyft
Kinesis @ lyft
Mian Hamid
 
IoT Austin CUG talk
IoT Austin CUG talkIoT Austin CUG talk
IoT Austin CUG talk
Felicia Haggarty
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
DataWorks Summit
 
Software Architecture for Cloud Infrastructure
Software Architecture for Cloud InfrastructureSoftware Architecture for Cloud Infrastructure
Software Architecture for Cloud Infrastructure
Tapio Rautonen
 
AWS for Java Developers workshop
AWS for Java Developers workshopAWS for Java Developers workshop
AWS for Java Developers workshop
Rory Preddy
 
Azug - successfully breeding rabits
Azug - successfully breeding rabitsAzug - successfully breeding rabits
Azug - successfully breeding rabitsYves Goeleven
 
High Performance Computing with AWS
High Performance Computing with AWSHigh Performance Computing with AWS
High Performance Computing with AWS
Amazon Web Services
 
Hhm 3474 mq messaging technologies and support for high availability and acti...
Hhm 3474 mq messaging technologies and support for high availability and acti...Hhm 3474 mq messaging technologies and support for high availability and acti...
Hhm 3474 mq messaging technologies and support for high availability and acti...
Pete Siddall
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
Jozo Kovac
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
Govind Kanshi
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
Govind Kanshi
 

Similar to WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber (20)

Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
Spark cep
Spark cepSpark cep
Spark cep
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
 
Guide to Application Performance: Planning to Continued Optimization
Guide to Application Performance: Planning to Continued OptimizationGuide to Application Performance: Planning to Continued Optimization
Guide to Application Performance: Planning to Continued Optimization
 
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
 
Kinesis @ lyft
Kinesis @ lyftKinesis @ lyft
Kinesis @ lyft
 
IoT Austin CUG talk
IoT Austin CUG talkIoT Austin CUG talk
IoT Austin CUG talk
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 
Software Architecture for Cloud Infrastructure
Software Architecture for Cloud InfrastructureSoftware Architecture for Cloud Infrastructure
Software Architecture for Cloud Infrastructure
 
AWS for Java Developers workshop
AWS for Java Developers workshopAWS for Java Developers workshop
AWS for Java Developers workshop
 
Azug - successfully breeding rabits
Azug - successfully breeding rabitsAzug - successfully breeding rabits
Azug - successfully breeding rabits
 
High Performance Computing with AWS
High Performance Computing with AWSHigh Performance Computing with AWS
High Performance Computing with AWS
 
Hhm 3474 mq messaging technologies and support for high availability and acti...
Hhm 3474 mq messaging technologies and support for high availability and acti...Hhm 3474 mq messaging technologies and support for high availability and acti...
Hhm 3474 mq messaging technologies and support for high availability and acti...
 
Operational-Analytics
Operational-AnalyticsOperational-Analytics
Operational-Analytics
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
 

More from WSO2

Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2
 
architecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfarchitecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdf
WSO2
 
Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2
WSO2
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
WSO2
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
WSO2
 
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
WSO2
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
WSO2
 
WSO2CON 2024 - Elevating the Integration Game to the Cloud
WSO2CON 2024 - Elevating the Integration Game to the CloudWSO2CON 2024 - Elevating the Integration Game to the Cloud
WSO2CON 2024 - Elevating the Integration Game to the Cloud
WSO2
 
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & InnovationWSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
WSO2
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
WSO2
 
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2
 
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2
 
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of TransformationWSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2
 

More from WSO2 (20)

Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
architecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfarchitecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdf
 
Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AI
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
WSO2CON 2024 - Elevating the Integration Game to the Cloud
WSO2CON 2024 - Elevating the Integration Game to the CloudWSO2CON 2024 - Elevating the Integration Game to the Cloud
WSO2CON 2024 - Elevating the Integration Game to the Cloud
 
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & InnovationWSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital Businesses
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of TransformationWSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
 

Recently uploaded

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 

Recently uploaded (20)

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 

WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

  • 1. Scalable Real-Time Complex Event Processing @Uber Shuyi Chen Uber Technology Inc.
  • 2. ● 6 continents, 70 countries and 400+ cities ● Transportation as reliable as running water, everywhere, for everyone Uber
  • 3. Outline • Motivation • Architecture • Limitations • Challenges
  • 4. Outline • Motivation • Architecture • Limitations • Challenges
  • 5. Uber is a data-driven company
  • 6. Thousands of Kafka topics from different services
  • 7. We can extract a lot of useful information from this rich set of logs in real-time!
  • 8. Multiple logins from the same IP in the last 10 minutes
  • 9. Partner accepted a trip → partner calls rider through the Uber APP → rider cancels the trip
  • 10. Partners reject the second pickup of a UberPOOL trip
  • 11. Multiple logins from the same IP in the last 10 minutes Window Aggregation
  • 12. Partner accepted a trip → partner calls rider through the Uber APP → rider cancels the trip Pattern detection
  • 13. Partners reject the second pickup of a UberPOOL trip Filter
  • 14. Can we use declarative semantics to specify these stream processing logics?
  • 15. Complex event processing • Combines data from multiple sources to infer events or patterns that suggest more complicated circumstances • CEP is used across many industries for various use cases, including: – Finance: Trade analysis, fraud detection – Airlines: Operations monitoring – Healthcare: Claims processing, patient monitoring – Energy and Telecommunications: Outage detection • CEP uses declarative rule/query language to specify event processing logic
  • 16. WSO2/Siddhi: Complex event processing engine • Lightweight, extensible, open source, released as a Java library • Features supported – Filter – Join – Aggregation – Group by – Window – Pattern processing – Sequence processing – Event tables – Event-time processing – UDF – Extensions – Declarative query language: SiddhiQL
  • 17. How Siddhi works • Specify processing logic declaratively with SiddhiQL
  • 18. How Siddhi works • Query is parsed at runtime into an execution plan runtime • As events flow in, the execution plan runtime process events inside the CEP engine according the query logic
  • 19. How can we make it scalable at Uber scale?
  • 20. Apache Samza • A distributed stream processing framework – Distributed and Scalable – Built-in State management – Built-in fault tolerant – At-least-once message processing
  • 21. How can we make the stream processing output useful?
  • 22. Actions • Generalize a set of common action templates to make it easy for services and human to harness the power of realtime stream processing • Currently we support – Make an RPC call – Invoke a Webhook endpoint – Index to ElasticSearch – Index to Cassandra – Kafka – Statsd – Chat service – Email – Push notification
  • 24. Outline • Motivation • Architecture • Limitations • Challenges
  • 25.
  • 26.
  • 27. Partitioner • Re-partition events based on key • Support predicate pushdown through query analysis • Support column pruning through query analysis (WIP)
  • 28. Query processor • Parse Siddhi queries into execution plan runtime • Process events in Siddhi execution plan runtime • Checkpoint state regularly to ensure recovery upon crash/restart using RocksDB
  • 29. Action processor • Execute actions upon the complex event processing output • Support various kinds of actions for easy integration • Implement action retry mechanism using RocksDB to provide at-least-once delivery
  • 30. How do we translate a query into psychical plan that runs?
  • 31. DAG (Directed Acyclic Graph) generation • Analyze Siddhi query to automatically generate the stream processing DAG in Samza using the processors Filter, transformation
  • 34. No stream processing logic is hard-coded in any of the processors
  • 35. REST API backend • All queries, actions are stored externally in database. • RESTFUL API for CRUD operations • If query/action logic changed – Redeploy the Samza DAG if needed – Otherwise, the updated queries/actions will be loaded at runtime w/o interruption
  • 36. Unified management and monitoring • Every use case – share the same set of processors – Use queries and actions to describe its processing logic • A single monitoring template can be reused across different use cases
  • 37. Production status • 100+ production use cases • 30+ billion messages processed per day
  • 38. Applications • Real-time fraud detection • Real-time anomaly detection • Real-time marketing campaign • Real-time promotion • Real-time monitoring • Real-time feedback system • Real-time analytics • Real-time visualizations • And etc.
  • 39. Outline • Motivation • Architecture • Limitations • Challenges
  • 40. Out-of-order event handling • Not a big concern – Events of the same rider/partner are usually seconds aparts • K-slack extension in Siddhi for out-of-order event processing
  • 41. Auto-scaling • Manually re-partition kafka topics to increase parallelism • Manually tune container memory if needed • Future – Use CPU/memory/IO stats to auto-scale the data pipelines
  • 42. Outline • Motivation • Architecture • Limitations • Challenges
  • 43. Large checkpointing state • Samza use Kafka to log state changes • Siddhi engine snapshot can be large • Kafka message size limit to 1MB by default • Solution: we build logics to slice state into smaller pieces and checkpoint them.
  • 44. Synchronous checkpointing • If state is large, time to checkpoint can be long • Samza uses single-threaded model, unsafe to do it asynchronously (SAMZA-863)
  • 45. Exactly once state processing? • Can not commit state and offset atomically • No exactly once state processing
  • 46. Custom business logic • Common logic implemented as Siddhi extensions • Ad-hoc logic implemented as UDF in javascript or scala
  • 47. Intermediate Kafka messages • Samza uses Kafka as message queue for intermediate processing output – This can create large load on Kafka if a heave topic is partitioned multiple times – Encode the intermediate messages to reduce footprint
  • 48. Multi-tenancy • Older Siddhi version process events using a thread pool – Bad for multi-tenancy in YARN – Consume more CPU resource than claimed • Newer version still use thread pool for scheduled task, but main processing in single thread – Good: CPU consumption per YARN container is bounded
  • 49. Upgrading Samza jobs • Upgrade Samza jobs require a full restart, and can take minutes due to – Offset checkpointing topic too large → set retention to hours – Changelog topic too large → set retention or enable compaction in Kafka or host affinity (SAMZA-617) • To minimize the interruption during upgrade, it would be nice to have – Rolling restart – Per container restart
  • 50. Our solution: non-interrupted handoff • For critical jobs, we use replication during upgrade – Start a shadow job – Upgrade shadow – Switch primary and shadow – Upgrade primary – Switch back • Downside: require 2x capacity during upgrade