Netflix Conductor
A Microservices Orchestrator
Viren Baraiya, Vikram Singh
Content Platform Engineering - CPE
〉 Studio In the Cloud
〉 Content Ingest from Studio Partners
〉 Title Setup - Making it live on Netflix.com
〉 Localization
CPE - Processes (some of many)
〉 Content Ingest & Delivery
〉 Title Setup
〉 IMF* Deliveries
〉 Encodes and Deployments
〉 Content Quality Control
〉 Content Localization
* IMF - Interoperable Master Format
Once Upon A Time ...
〉 Peer to Peer Messaging
〉 Tens of millions of messages per day
〉 Process flows embedded in applications
〉 Lack of control (STOP deployment!)
〉 Lack of visibility into progress
Example
Peer to Peer
[Diagram: Request Content (Application A) → Content Inspection (Application B) → Result → Encode (Application C) → Publish (Application B); each hand-off is a direct event or API call between applications]
〉 Logical flow is not easily trackable
〉 Modifying steps is not easy (tightly coupled)
〉 Controlling flow is not possible
〉 Reusing tasks is not trivial
〉 Orchestration Engine
〉 Open Source (Apache 2.0)
Conductor - Design Goals
〉 BYO Task (Reuse existing code)
〉 REST/HTTP support
〉 Extensible and Hackable
〉 JSON based DSL to define blueprint
〉 Scale Out Horizontally
〉 Visibility, Traceability & Control
Same Flow - New Flavor
[Diagram: Conductor orchestrates the flow Start → Request Content → Content Inspection → Result → Encode → Publish → Stop. Orchestration lives in Conductor; execution stays in the applications as tasks: Application A (Request Content), Application B (Content Inspection), Application C (Encode), Application B (Publish).]
Architecture
API
〉 Workflows: start and manage workflows
〉 Metadata: define blueprints and tasks
〉 Tasks: get tasks from queue and execute
SERVICE
〉 Workflow Service, Task Service
〉 Decider Service, Queue Service
STORE
〉 Storage (Dynomite)
〉 Index (Elasticsearch)
Scaling up Conductor
〉 Peer-to-Peer - Scale horizontally
〉 Stateless server - state is persisted in database
〉 Storage scalability: Dynomite
〉 Workload scale: Dyno-Queues
Storage
Dynomite
〉 Generic Dynamo implementation (Redis, Memcache)
〉 Multi-datacenter
〉 Highly available
〉 Peer-to-Peer
Elasticsearch
〉 Indexing workflow and task executions
〉 Verbose logging of worker executions
Dyno-Queues
〉 Distributed lock-free queues used by Conductor
〉 OSS
〉 Apache 2.0 License
〉 https://github.com/Netflix/dyno-queues
〉 Delayed Queues
〉 Loose priorities and FIFO
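A minimal in-memory sketch of the queue semantics listed above (delayed delivery, loose priorities, FIFO), assuming a single process. This is not the Dyno-Queues API, which keeps its queues in Dynomite; the class name DelayedQueue and its push/pop methods are hypothetical. Here priority is applied only among messages whose delay has already elapsed, which is one way to read "loose" priority, and lower numbers are delivered first.

```python
import heapq
import itertools
import time
from typing import Callable, List, Optional, Tuple


class DelayedQueue:
    """Hypothetical in-memory queue: delayed delivery, priority among due items, FIFO ties."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Not-yet-visible messages, ordered by the time they become visible.
        self._delayed: List[Tuple[float, int, int, str]] = []
        # Visible messages, ordered by priority, then insertion order (FIFO).
        self._ready: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()

    def push(self, item: str, delay: float = 0.0, priority: int = 0) -> None:
        """Enqueue item; it becomes visible after `delay` seconds. Lower priority values go first."""
        visible_at = self._clock() + delay
        heapq.heappush(self._delayed, (visible_at, next(self._seq), priority, item))

    def pop(self) -> Optional[str]:
        """Return the next visible item, or None if nothing is due yet."""
        now = self._clock()
        # Promote every message whose delay has elapsed into the ready heap.
        while self._delayed and self._delayed[0][0] <= now:
            _, seq, priority, item = heapq.heappop(self._delayed)
            heapq.heappush(self._ready, (priority, seq, item))
        if self._ready:
            return heapq.heappop(self._ready)[2]
        return None
```

A message pushed with a delay, such as a task scheduled to be retried later, stays invisible to pollers until it is due; the injectable clock makes that behavior easy to check without sleeping.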
Concepts
Task
〉 Reusable stateless components
〉 System Tasks (Fork / Join, Decision, etc.)
〉 Remote Workers (Java, Python, lang. agnostic)
Workflow
〉 Sequence of tasks and control structures
〉 Input / Output
Task Definition
〉 Retries
〉 Timeouts
〉 Documentation for Input / Output
〉 Registry
Task Definition - Example
{
  "name": "encode_task",
  "retryCount": 2,
  "timeoutSeconds": 3600,
  "inputKeys": [
    "fileLocation",
    "encodeRecipe",
    "outputLocation"
  ],
  "timeoutPolicy": "TIME_OUT_WF",
  "retryLogic": "FIXED",
  "retryDelaySeconds": 60,
  "responseTimeoutSeconds": 3600
}
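A sketch of how the retry fields of the task definition above might be interpreted, assuming FIXED retry logic means a constant retryDelaySeconds between attempts and that retryCount counts retries after the first attempt. RetryPolicy and delay_before_retry are hypothetical names; the timeout fields (timeoutSeconds, responseTimeoutSeconds, timeoutPolicy) are left out of this sketch.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The retry-related fields of the encode_task definition above.
TASK_DEF = json.loads("""
{
  "name": "encode_task",
  "retryCount": 2,
  "retryLogic": "FIXED",
  "retryDelaySeconds": 60
}
""")


@dataclass
class RetryPolicy:
    """Hypothetical model of a task definition's retry settings."""
    retry_count: int
    retry_delay_seconds: int
    retry_logic: str

    @classmethod
    def from_task_def(cls, task_def: dict) -> "RetryPolicy":
        return cls(task_def["retryCount"], task_def["retryDelaySeconds"], task_def["retryLogic"])

    def delay_before_retry(self, failures: int) -> Optional[int]:
        """Seconds to wait before the next attempt, or None once retries are exhausted.

        `failures` is the number of failed attempts so far (1 after the first failure).
        """
        if self.retry_logic != "FIXED":
            raise NotImplementedError("this sketch only models FIXED retry logic")
        if failures > self.retry_count:
            return None
        return self.retry_delay_seconds
```

Under these assumptions, encode_task runs at most three times: the first attempt plus two retries, each 60 seconds after the previous failure.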
Workflow Definition
〉 Logical Flow of Tasks
〉 Error / Failure Handling
〉 Input / Output transformation
〉 JSONPath Based
〉 Versioning
〉 Registry
Workflow - Example
{
  "name": "encode_and_deploy",
  "description": "Encodes a file and deploys to CDN",
  "version": 1,
  "tasks": [
    {
      "name": "encode",
      "taskReferenceName": "encode",
      "type": "SIMPLE",
      "inputParameters": { "fileLocation": "${workflow.input.fileLocation}" }
    },
    {
      "name": "deploy",
      "taskReferenceName": "d1",
      "type": "SIMPLE",
      "inputParameters": { "fileLocation": "${encode.output.encodeLocation}" }
    }
  ],
  "outputParameters": {
    "cdn_url": "${d1.output.location}"
  }
}
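For a purely sequential workflow like encode_and_deploy, the core scheduling decision can be sketched as "run the first task whose taskReferenceName has not completed yet". This is a simplification that assumes no decisions, forks, or failures; next_task is a hypothetical helper, not Conductor's Decider Service. Tracking progress by taskReferenceName rather than name is what lets expressions such as ${encode.output.encodeLocation} and ${d1.output.location} point at one specific task instance.

```python
import json
from typing import Optional, Set

WORKFLOW_DEF = json.loads("""
{
  "name": "encode_and_deploy",
  "version": 1,
  "tasks": [
    {"name": "encode", "taskReferenceName": "encode", "type": "SIMPLE"},
    {"name": "deploy", "taskReferenceName": "d1", "type": "SIMPLE"}
  ]
}
""")


def next_task(workflow_def: dict, completed: Set[str]) -> Optional[dict]:
    """Return the first task not yet completed, or None when the workflow is done.

    Progress is keyed by taskReferenceName, which identifies one task instance in the
    workflow (the same task name could appear twice under different references).
    """
    for task in workflow_def["tasks"]:
        if task["taskReferenceName"] not in completed:
            return task
    return None
```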
Input / Output
〉 Inputs to tasks are transformed
〉 Refer to inputs / outputs from
〉 Task
〉 Workflow
〉 Complex JSON transformations using JSONPath
Example
{
  "inputParameters": {
    "movieId": "${workflow.input.movieId}",
    "url": "${workflow.input.fileLocation}",
    "lang": "${loc_task.output.languages[0]}",
    "http_request": {
      "method": "POST",
      "url": "http://example.com/${loc_task.output.fileId}/encode",
      "body": {
        "recipe": "${workflow.input.recipe}",
        "params": {
          "width": 100,
          "height": 100
        }
      }
    }
  }
}
〉 Get movieId from workflow input
〉 JSONPath expressions
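A simplified resolver for the ${...} expressions in the example above, assuming only dotted paths with optional [n] list indices (full JSONPath supports much more). resolve and lookup are hypothetical helpers; the context dict stands in for the workflow input and the outputs of completed tasks such as loc_task.

```python
import re
from typing import Any

# A ${...} placeholder such as ${workflow.input.movieId}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")
# One path segment: a key optionally followed by list indices, e.g. languages[0].
SEGMENT = re.compile(r"^([^\[\]]+)((?:\[\d+\])*)$")


def lookup(context: dict, path: str) -> Any:
    """Follow a dotted path like loc_task.output.languages[0] through nested dicts and lists."""
    value: Any = context
    for part in path.split("."):
        match = SEGMENT.match(part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        value = value[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            value = value[int(index)]
    return value


def resolve(template: Any, context: dict) -> Any:
    """Recursively replace placeholders in a JSON-like inputParameters template."""
    if isinstance(template, dict):
        return {key: resolve(val, context) for key, val in template.items()}
    if isinstance(template, list):
        return [resolve(val, context) for val in template]
    if isinstance(template, str):
        whole = PLACEHOLDER.fullmatch(template)
        if whole:
            # A lone placeholder keeps the referenced value's type.
            return lookup(context, whole.group(1))
        # Placeholders inside a longer string are substituted as text.
        return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)
    return template  # numbers, booleans and None pass through unchanged
```

In this sketch a lone placeholder keeps the referenced value's type (a numeric movieId stays a number), while placeholders embedded in a longer string, such as the http_request url, are substituted as text.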
Execution Flow
Putting It All Together
Execution Flow
[Diagram: App A starts the workflow in Conductor (Workflow / Task Service, Decider / Queue Service, Storage, Task Queues); tasks are executed by App A (Request Content), App B (Content Inspection), App C (Encode) and App B (Publish).]
1. Start content_ingest workflow
2. Get workflow definition
3. Schedule task
4. Put in queue
5. Poll for task
6. Execute & update task status
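The six steps can be simulated in memory to show how the pieces interact. MiniConductor, its methods, and the task type names are hypothetical; retries, timeouts, persistence, and real HTTP workers are omitted, and in this sketch a failed task simply fails the workflow.

```python
import itertools
from collections import deque
from typing import Deque, Dict, List, Optional


class MiniConductor:
    """Hypothetical in-memory model of the six-step execution flow (no retries, no persistence)."""

    def __init__(self, definitions: Dict[str, List[str]]) -> None:
        self.definitions = definitions            # stands in for Storage: name -> ordered task types
        self.queues: Dict[str, Deque[int]] = {}  # stands in for Task Queues: task type -> task ids
        self.tasks: Dict[int, dict] = {}
        self.workflows: Dict[int, dict] = {}
        self._ids = itertools.count(1)

    def start_workflow(self, name: str) -> int:
        """Step 1: start a workflow. Step 2: fetch its definition."""
        workflow_id = next(self._ids)
        task_types = self.definitions[name]
        self.workflows[workflow_id] = {"tasks": task_types, "next": 0, "status": "RUNNING"}
        self._decide(workflow_id)
        return workflow_id

    def _decide(self, workflow_id: int) -> None:
        """Step 3: schedule the next task. Step 4: put it in its queue."""
        wf = self.workflows[workflow_id]
        if wf["next"] == len(wf["tasks"]):
            wf["status"] = "COMPLETED"
            return
        task_type = wf["tasks"][wf["next"]]
        task_id = next(self._ids)
        self.tasks[task_id] = {"type": task_type, "workflow": workflow_id, "status": "SCHEDULED"}
        self.queues.setdefault(task_type, deque()).append(task_id)

    def poll(self, task_type: str) -> Optional[int]:
        """Step 5: a worker polls for a task of the type it can execute."""
        queue = self.queues.get(task_type)
        if not queue:
            return None
        task_id = queue.popleft()
        self.tasks[task_id]["status"] = "IN_PROGRESS"
        return task_id

    def update(self, task_id: int, status: str) -> None:
        """Step 6: the worker reports the result, and the decider runs again."""
        task = self.tasks[task_id]
        task["status"] = status
        wf = self.workflows[task["workflow"]]
        if status == "COMPLETED":
            wf["next"] += 1
            self._decide(task["workflow"])
        else:
            wf["status"] = "FAILED"  # a real server would consult the task's retry settings
```

Each worker only knows its own task type; the ordering lives entirely in the definition held by the orchestrator, which is the decoupling the peer-to-peer design lacked.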
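[Diagram: Workers (Worker 1, Worker 2, Worker 3 … Worker n) poll the Task Queues and update task status over HTTP; the Management / Execution Service receives triggers over HTTP and drives the Orchestrator, which schedules tasks into the Task Queues and persists state in the Database and Index.]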
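A remote worker only needs HTTP, which is what makes workers language agnostic. The sketch below builds the poll and update requests with the Python standard library, assuming a server at http://localhost:8080/api and the /tasks/poll/{taskType} and /tasks endpoints of Conductor's REST API; check the paths and payload fields against the Conductor version you run. BASE_URL, poll_request, update_request, and work_once are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Assumption: where the Conductor server's REST API is reachable.
BASE_URL = "http://localhost:8080/api"


def poll_request(task_type: str, worker_id: str) -> urllib.request.Request:
    """GET a pending task of the given type (assumed endpoint: /tasks/poll/{taskType})."""
    query = urllib.parse.urlencode({"workerid": worker_id})
    url = f"{BASE_URL}/tasks/poll/{urllib.parse.quote(task_type)}?{query}"
    return urllib.request.Request(url, method="GET")


def update_request(task: dict, status: str, output: dict) -> urllib.request.Request:
    """POST the task result back (assumed endpoint: /tasks)."""
    body = {
        "workflowInstanceId": task["workflowInstanceId"],
        "taskId": task["taskId"],
        "status": status,  # e.g. COMPLETED or FAILED
        "outputData": output,
    }
    return urllib.request.Request(
        f"{BASE_URL}/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def work_once(task_type: str, worker_id: str, execute: Callable[[dict], dict]) -> bool:
    """Poll, execute, update. Returns False when no task was available."""
    with urllib.request.urlopen(poll_request(task_type, worker_id)) as resp:
        raw = resp.read()
    if not raw:
        return False
    task = json.loads(raw)
    try:
        output, status = execute(task.get("inputData", {})), "COMPLETED"
    except Exception as exc:  # report the failure instead of crashing the worker
        output, status = {"error": str(exc)}, "FAILED"
    with urllib.request.urlopen(update_request(task, status, output)):
        pass
    return True
```

A worker process would call work_once in a loop, sleeping briefly whenever it returns False.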
Features
Blueprint Features
〉 Conditional (If...Then...Else)
〉 Fork / Join
〉 Dynamic Tasks and Forks
〉 Sub Workflow
〉 Wait
〉 Versioning
〉 HTTP/Service Calls
〉 Input / Output Transformation (JSONPath based)
Runtime
〉 Pause | Resume
〉 Skip Tasks, Restart
〉 Error / Failure Handling
〉 UI
〉 Manage definitions
〉 Search Executions by Payload
〉 Visualize flows
〉 Metrics, Metrics, Metrics…
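The Fork / Join blueprint feature can be sketched with a thread pool: every branch receives the same input, and the join waits for all of them. fork_join and the branch names in the usage are hypothetical, not Conductor system tasks; in this sketch the joined output is keyed by branch reference name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Branch = Callable[[dict], dict]


def fork_join(branches: Dict[str, Branch], task_input: dict) -> Dict[str, dict]:
    """Fork: run every branch on the same input in parallel. Join: wait for all of them.

    The joined output maps each branch's reference name to that branch's output.
    An exception in any branch propagates when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(branches))) as pool:
        futures = {ref: pool.submit(branch, task_input) for ref, branch in branches.items()}
        return {ref: future.result() for ref, future in futures.items()}
```

For example, two hypothetical localization branches, enc_en and enc_fr, could encode the same file in parallel, and the step after the join would read both outputs.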
Conductor @ Netflix
〉 In production > 1 year
〉 ~100 Process Flows
〉 ~200 Tasks / Services
〉 Avg. Tasks per workflow: 6
〉 Largest: 48 Tasks
〉 ~4 MM Executions
Roadmap - 2017
〉 Eventing
〉 SQS and SNS
〉 Dyno-Queues
〉 Task Execution Log
〉 Unit Testing Framework
〉 Python Client
〉 Feature Stacks
Questions?
Feedback / Issues / Questions
〉 https://github.com/Netflix/conductor
〉 https://github.com/Netflix/dyno-queues
Contacts
〉 Viren Baraiya <vbaraiya@netflix.com>
〉 Vikram Singh <visingh@netflix.com>
〉 Prosenjit Bhattacharyya <pbhattacharyya@netflix.com>
Thank You
