The Only Workflow Platform
You'll Ever Need
Maxim Fateev
Case Study: Tips
void addTip(Tip t) {
    debitRider(t);
    creditDriver(t);
}
DebitAccount
CreditAccount
void OnMessage(Tip t){
    debitRider(t);
    creditDriver(t);
}
DebitAccount
CreditAccount
Queue
void OnMessage(Tip t){
    debitRider(t);
}
DebitAccount
CreditAccount
Queue
void OnMessage(Tip t){
    creditDriver(t);
}
Queue
Status of the transaction?
void OnMessage(Tip t){
    debitRider(t);
    updateDB(status);
}
DebitAccount
CreditAccount
Queue
void OnMessage(Tip t){
    creditDriver(t);
    updateDB(status);
}
Queue
Database
void OnMessage(Tip t){
    updateDB(status);
    debitRider(t);
}
DebitAccount
CreditAccount
Queue
void OnMessage(Tip t){
    updateDB(status);
    creditDriver(t);
}
Queue
Database
void addTip(Tip t) {
    debitRider(t);
    creditDriver(t);
}
DebitAccount
CreditAccount
void addTip(Tip t) {
    debitRider(t);
    creditDriver(t);
}
DebitRider
CreditDriver
Db
Queue
Queue
Queue
Cadence Programming Model
● Activities (aka Tasks)
● Workflows
Cadence Activities
● Any application-specific code
● Potentially long-lived (heartbeating)
● Can be implemented asynchronously
● Automatically retried according to a specified retry policy
● Routable to specific hosts or processes
● Dispatched through queues
● Per-worker rate and parallelism limits
● Per-queue rate limit
Cadence Activity
Cadence Workflows
● Virtual Objects in Java or Go
● Transactional
● Orchestrate Activities
● React to external events
● Stateful, including local variables and stack
● Queryable
● Potentially long-lived
● Durable Timers
Activity Retry
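The code for this slide did not survive extraction. As a rough illustration of what a retry policy with a maximum attempt count and exponential backoff means, here is a minimal in-process sketch. It assumes a hypothetical `RetryPolicySketch` class; the names are not the Cadence client API, and in Cadence the retries are driven by the service rather than by a loop inside the caller.

```java
import java.util.function.Supplier;

/**
 * Hypothetical in-process retry policy. The class and field names are
 * illustrative only and are not the Cadence client API.
 */
final class RetryPolicySketch {
    private final int maxAttempts;
    private final long initialIntervalMillis;
    private final double backoffCoefficient;

    RetryPolicySketch(int maxAttempts, long initialIntervalMillis, double backoffCoefficient) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
        this.initialIntervalMillis = initialIntervalMillis;
        this.backoffCoefficient = backoffCoefficient;
    }

    /** Delay before retry number n (n = 1 is the first retry), growing exponentially. */
    long delayBeforeRetryMillis(int n) {
        return (long) (initialIntervalMillis * Math.pow(backoffCoefficient, n - 1));
    }

    /** Runs the activity, retrying on any RuntimeException until attempts run out. */
    <T> T execute(Supplier<T> activity) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return activity.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(delayBeforeRetryMillis(attempt));
                }
            }
        }
        throw lastFailure; // every attempt failed
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

With a policy such as `new RetryPolicySketch(3, 100, 2.0)`, a call like `policy.execute(() -> debitRider(t))` would try up to three times, waiting 100 ms and then 200 ms between attempts (here `debitRider` is assumed to return a value).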
Compensation
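The code for this slide is also missing. The idea of compensation in the tip example can be sketched as follows: if crediting the driver fails after the rider has been debited, the debit is undone. This is a minimal sketch under stated assumptions; `TipCompensationSketch` and its in-memory ledger are hypothetical stand-ins for the DebitAccount and CreditAccount services, not code from the talk.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory ledger used only to illustrate compensation. */
final class TipCompensationSketch {
    private final Map<String, Long> balances = new HashMap<>();

    /** Opens the account if needed and adds the amount (in cents). */
    void deposit(String account, long cents) {
        balances.merge(account, cents, Long::sum);
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    void debit(String account, long cents) {
        if (balance(account) < cents) {
            throw new IllegalStateException("insufficient funds: " + account);
        }
        balances.merge(account, -cents, Long::sum);
    }

    void credit(String account, long cents) {
        if (!balances.containsKey(account)) {
            throw new IllegalStateException("unknown account: " + account);
        }
        balances.merge(account, cents, Long::sum);
    }

    /**
     * Returns true if the tip was applied, false if the credit failed and the
     * rider was refunded. A failed debit propagates: nothing to compensate yet.
     */
    boolean addTip(String rider, String driver, long cents) {
        debit(rider, cents);
        try {
            credit(driver, cents);
            return true;
        } catch (IllegalStateException e) {
            credit(rider, cents); // compensation: undo the debit
            return false;
        }
    }
}
```

The point of the sketch is only the shape of the control flow: the compensation is ordinary `try`/`catch` code next to the steps it undoes, rather than a separate handler wired through queues and a status table.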
Case Study: Driver Rewards
● A driver signs up if they qualify
● Eligibility is checked every 30 days
● Participation is lost if the driver doesn't meet the rewards requirements when checked
● A service listens for trip completion events to calculate the average rating
void OnMessage(Trip t){
    State s = loadFromDb(t.driverId);
    s.addTrip(t);
    saveToDb(s);
}
PartnerService
Queue
Database
void onTimer(String driverId){
    State s = loadFromDb(driverId);
    if (s.eligible) {
        activate(driverId);
    } else {
        deactivate(driverId);
    }
    s.reset();
    saveToDb(s);
    scheduleTimer();
}
Driver Rewards Workflow
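The workflow code shown on these slides was not captured. As a rough sketch of the idea, the same logic can be written as one stateful object per driver whose fields replace the `loadFromDb` and `saveToDb` calls. Everything here is an assumption for illustration: the class name, the method names, and the rating threshold are hypothetical, and in Cadence the period end would come from a durable timer rather than an explicit call.

```java
/**
 * Hypothetical sketch of the driver rewards logic as one stateful object per
 * driver. Names and the rating threshold are illustrative assumptions.
 */
final class DriverRewardsSketch {
    static final double MIN_AVERAGE_RATING = 4.5; // assumed threshold, not from the talk

    private double ratingSum;
    private int tripCount;
    private boolean active = true; // the driver has just signed up

    /** Handles one trip completion event. */
    void onTripCompleted(double rating) {
        ratingSum += rating;
        tripCount++;
    }

    double averageRating() {
        return tripCount == 0 ? 0.0 : ratingSum / tripCount;
    }

    /** Runs when the 30-day period ends: check eligibility, then reset the counters. */
    boolean onPeriodEnd() {
        active = averageRating() >= MIN_AVERAGE_RATING;
        ratingSum = 0.0;
        tripCount = 0;
        return active;
    }

    boolean isActive() {
        return active;
    }
}
```

The state lives in plain fields, which mirrors the "stateful, including local variables" property listed for workflows earlier in the deck.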
Use Case: Uber Flow
● UI-based workflows
● Graph execution engine
● Each edge has conditions attached
● Some state nodes are associated with actions
Flow Workflow Definition
[Diagram: Workflow1, a graph of Node1 to Node5 with edges labeled condition1 to condition4]
Flow Workflow Instance
[Diagram: the Workflow1 graph instantiated for one entity, labeled Workflow1 - Entity1]
Flow Event Matching
[Diagram: Event1 and Entity1 shown with the Workflow1 - Entity1 graph]
Flow Condition Evaluation
[Diagram: Event1 and Entity1 shown with the Workflow1 - Entity1 graph and its conditions]
Flow Condition Evaluation
[Diagram: Event1 and Entity1 shown with the Workflow1 - Entity1 graph, alongside PartnerService and other Services]
Flow Action
[Diagram: the Workflow1 - Entity1 graph]
Node Action: Send email, give incentive, etc.
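The event matching, condition evaluation, and action steps above can be sketched as a small graph engine. This is a minimal sketch under stated assumptions: `FlowGraphSketch`, string-valued events, and the edges used in the usage note are hypothetical, since the actual topology of the slide's graph and the real Flow data model are not given.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical per-entity workflow instance: a graph plus the current node. */
final class FlowGraphSketch {
    /** An edge is taken when its condition accepts the incoming event. */
    private static final class Edge {
        final String to;
        final Predicate<String> condition;

        Edge(String to, Predicate<String> condition) {
            this.to = to;
            this.condition = condition;
        }
    }

    private final Map<String, List<Edge>> edges = new HashMap<>();
    private final Map<String, Runnable> actions = new HashMap<>();
    private String currentNode;

    FlowGraphSketch(String startNode) {
        this.currentNode = startNode;
    }

    void addEdge(String from, String to, Predicate<String> condition) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, condition));
    }

    /** Associates an action (send email, give incentive, ...) with entering a node. */
    void onEnter(String node, Runnable action) {
        actions.put(node, action);
    }

    /** Evaluates the outgoing conditions of the current node; true if a transition fired. */
    boolean onEvent(String event) {
        for (Edge edge : edges.getOrDefault(currentNode, Collections.<Edge>emptyList())) {
            if (edge.condition.test(event)) {
                currentNode = edge.to;
                Runnable action = actions.get(currentNode);
                if (action != null) {
                    action.run();
                }
                return true;
            }
        }
        return false;
    }

    String currentNode() {
        return currentNode;
    }
}
```

For example, after `flow.addEdge("Node1", "Node2", e -> e.equals("signup"))`, an unrelated event leaves the instance at Node1, while a `"signup"` event moves it to Node2 and runs any action registered for that node.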
Flow Scalability Requirements
● Potentially billions of entities
● ~50 workflows per entity
● ~10K external events per second
○ not counting duplicates
● Each event should be checked against all workflows for an entity
○ 50 * 10K = 500K condition evaluations per second
● On average, one workflow per event generates an action
○ ~10K actions per second
Flow Original Implementation
void OnMessage(Message t){
    List<String> workflows = getWorkflowsFor(t);
    for (String workflow : workflows) {
        State s = loadFromDb(workflow, t.entityId);
        if (s.currentNode.evaluateConditions(t)) {
            s.currentNode.executeAction();
            saveToDb(s);
        }
    }
}
[Diagram: Queue, Database, Services]
Flow on Cadence
List<State> workflows;
void OnMessage(Message t){
    for (State s : workflows) {
        if (s.currentNode.evaluateConditions(t)) {
            s.currentNode.executeAction();
        }
    }
}
[Diagram: Queue, Services]
Flow on Cadence Advantages
● Reliable retries of condition evaluations
● Reliable retries of actions
● Database load is proportional to the number of events per second
○ Not (number of events) x (number of workflows per entity), which is 500K per second
● Cross datacenter failover
● Unified workflow engine
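The database load comparison can be checked with a quick calculation using the figures quoted in the scalability requirements (about 10K events per second and about 50 workflows per entity). The class and method names below are hypothetical; the sketch only reproduces the arithmetic.

```java
/** Back-of-the-envelope load estimate using the figures quoted on the slides. */
final class FlowLoadEstimate {
    /** Original design: one state load per (event, workflow) pair. */
    static long perEventAndWorkflow(long eventsPerSecond, long workflowsPerEntity) {
        return eventsPerSecond * workflowsPerEntity;
    }

    public static void main(String[] args) {
        long eventsPerSecond = 10_000;  // ~10K external events per second
        long workflowsPerEntity = 50;   // ~50 workflows per entity

        // Load that scales with events x workflows: prints 500000
        System.out.println(perEventAndWorkflow(eventsPerSecond, workflowsPerEntity));
        // Load that scales with events only: prints 10000
        System.out.println(eventsPerSecond);
    }
}
```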
Flow Load Test
● 7,500 events per second ingested
● 218,000 condition evaluations per second
● 4,400 actions per second
Cadence as a Platform
[Diagram with the following labels]
● Cadence Service
● SDKs: Java SDK, Go SDK
● Workflow definition languages: BPMN, AWS States Language, Airflow DAG, Uber Flow, Custom DSL1, Custom DSL2, Custom DSL3
● Applications: JavaApp1, JavaApp2, GoApp1, GoApp2, BPMNApp1, BPMNApp2, SLApp1 to SLApp4, AirflowApp1 to AirflowApp3, FlowApp1, FlowApp2, DSL1App1, DSL1App2, DSL2App1, DSL2App2, DSL3App1
More Uber Cadence Use Cases
● Freight load workflow
● Driver loyalty program
● Customer support workflows
● CI/CD/Deployment infrastructure
● End-of-month statement generation for each u4b customer
● Recalculate every hexagon on the city map every minute for every city
● Tip processing in a microservices architecture
● Managing Flink and Spark jobs on Mesos or YARN
● Customer loyalty program
● Marketing email campaign management
● New datacenter provisioning
● Numerous other periodic jobs
Distributed Application Building Blocks
● Request Handlers
○ Microservices
○ Serverless
○ Actors
● Storage
○ Databases
○ Caches
● Queues
● Job Schedulers
● Consensus
○ Leader Election
○ Sharding
○ Distributed Locks
Cadence Summary
● Higher-level way of building distributed applications
○ Focus on business logic, not plumbing
● Large scale
○ Billions of workflow instances
○ Tens of thousands of events per second
● High availability
○ Oblivious to node failures
○ Cross-datacenter replication
● Unify all workflow solutions
○ Can support any existing workflow definition language
○ Perfect for DSLs
● Open Source
○ http://cadenceworkflow.io
○ Apache 2.0 License


Editor's Notes

  • #17 Other issues to consider. Timeouts: what if the debit service lost the transaction? What if settlement has a time limit? Compensations: what if credit is impossible? Changes to an already running transaction: tip amount updated, cancellation. What if a long-running operation requires polling for the result? Upgrading the sequence of steps. Operations: many moving parts like DB, queue, etc.; datacenter failures; debugging.
  • #33 Change to business exception
  • #44 Copy from https://engdocs.uberinternal.com/autobots/overview.html#product-details