Workers and Event Processors that Scale
August 2015 Meetup
WHAT? WHY?
• Image processing
• Video encoding
• Data processing
• Report generation
• Order processing
• and so on…
TRADITIONAL METHOD
1. Create & secure page in monolith
2. Set up cron to call URL on schedule or just offload
process on demand within application.
3. Watch as resources spike. Worry about when to
run tasks so performance is not impacted.
WORKER PATTERNS
• Small independent unit of work
• Think concurrently
• Optimal worker durations
• Scheduling & Quarterbacking
• Pass IDs, be thin
• Log everything
• Use message queues
SMALL INDEPENDENT UNIT OF WORK
Keep workers task-specific and as light as possible, each with one purpose.
MUCH easier to maintain and scale.
THINK CONCURRENTLY
Designed as collections of non-interactive workers that may be
executed in parallel.
Independent and self-sufficient.
Loosely coupled, no shared state.
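A minimal sketch of this idea in Python, assuming a hypothetical image-resize job (the `resize_job` function and the job fields are illustrative, not from the talk). Each job carries everything it needs and touches no shared state, so the jobs can run in parallel in any order.

```python
from concurrent.futures import ThreadPoolExecutor


def resize_job(job):
    """Hypothetical worker: computes new dimensions from its own input only."""
    scale = job["target_width"] / job["width"]
    return {
        "id": job["id"],
        "width": job["target_width"],
        "height": round(job["height"] * scale),
    }


def run_all(jobs, max_workers=4):
    # Jobs do not interact, so they can be handed to a pool in any order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(resize_job, jobs))


# Three independent jobs with different target widths (illustrative data).
jobs = [
    {"id": i, "width": 1000, "height": 500, "target_width": 100 * (i + 1)}
    for i in range(3)
]
results = run_all(jobs)
```

Because no worker reads or writes anything outside its own job, adding more workers should not require any locking in this sketch.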
OPTIMAL TASK DURATIONS
Long enough to take care of setup, but short enough to quickly fix and
retry on errors.
Too short does not make sense for setup/tear down. If short, batch
several tasks.
Too long increases opportunity for failure and complexity of retry.
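One way to batch short tasks could look like the sketch below. The `batch` and `process_batch` helpers and the dictionary standing in for an expensive setup step are assumptions for illustration only. The setup cost is paid once per batch rather than once per task.

```python
def batch(task_ids, size):
    """Split a list of task IDs into chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [task_ids[i:i + size] for i in range(0, len(task_ids), size)]


def process_batch(task_ids):
    """Hypothetical worker body: one setup and one teardown per batch."""
    connection = {"open": True}  # stand-in for an expensive setup step
    try:
        return [f"processed:{task_id}" for task_id in task_ids]
    finally:
        connection["open"] = False  # teardown runs even if a task fails


batches = batch(list(range(7)), 3)
outputs = [process_batch(b) for b in batches]
```

The batch size is the tuning knob: larger batches amortize setup better, smaller batches are cheaper to retry after a failure.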
SCHEDULING & QUARTERBACKING
Create a low number of scheduled jobs and queue up lots of
concurrent jobs.
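As a rough illustration, the single scheduled job (the "quarterback") only decides what work exists and queues it, while the many queued jobs do the actual work. The sketch uses Python's in-process `queue.Queue` as a stand-in for a real job queue, and the `quarterback` function and job fields are hypothetical.

```python
import queue


def quarterback(pending_order_ids, job_queue):
    """Runs on a schedule; queues one small job per pending item."""
    for order_id in pending_order_ids:
        job_queue.put({"task": "process_order", "order_id": order_id})
    return job_queue.qsize()


job_queue = queue.Queue()
queued = quarterback([101, 102, 103], job_queue)
first_job = job_queue.get_nowait()
```

The scheduler stays trivial, so there is only one cron-style entry to maintain no matter how many jobs it fans out.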
PASS IDS, BE THIN
Pass the minimum amount of information required for the worker/processor
to run. Then retrieve full data while running.
• Better performance while queuing up tasks
• Get latest data
• Much easier to maintain state on that object
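A small sketch of a thin payload, assuming a hypothetical `ORDERS` dictionary in place of a real datastore. The queued message holds only the ID, and the worker loads the current record when it runs.

```python
import json

# Stand-in for a real datastore (assumption for illustration).
ORDERS = {42: {"id": 42, "status": "new", "total": 19.99}}


def enqueue_payload(order_id):
    """Build the thin message: just enough to find the record later."""
    return json.dumps({"order_id": order_id})


def worker(payload):
    """Load the latest record at run time, then act on it."""
    order_id = json.loads(payload)["order_id"]
    order = ORDERS[order_id]
    order["status"] = "processed"
    return order


payload = enqueue_payload(42)
ORDERS[42]["total"] = 25.00  # record changes after the job was queued
processed = worker(payload)
```

The worker sees the updated total because it reads the record when it runs, not when the job was queued.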
LOG EVERYTHING
Log everything using a cloud logger or other global logger.
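A minimal sketch using Python's standard `logging` module, with an in-memory stream standing in for a cloud logger. The `make_logger` and `run_job` helpers and the `job=... event=...` message format are assumptions, not something the talk prescribes.

```python
import io
import logging


def make_logger(name, stream):
    """Create a logger that writes to `stream` (stand-in for a cloud logger)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.handlers = [handler]
    return logger


def run_job(job_id, work, logger):
    """Log start, success, and failure with the job ID on every line."""
    logger.info("job=%s event=start", job_id)
    try:
        result = work()
    except Exception:
        logger.exception("job=%s event=failed", job_id)
        raise
    logger.info("job=%s event=done", job_id)
    return result


log_stream = io.StringIO()
log = make_logger("worker.example", log_stream)
value = run_job("a1", lambda: 7, log)
```

Putting the job ID on every line makes it possible to follow one job across many concurrent workers in a shared log.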
USE MESSAGE QUEUES
Plan for concurrency issues at scale.
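To illustrate, the sketch below uses Python's thread-safe `queue.Queue` as an in-process stand-in for a hosted message queue. The `consume` and `run` functions are hypothetical. Each message is handed to exactly one worker, so workers never compete for the same job.

```python
import queue
import threading


def consume(job_queue, results, lock):
    """Worker loop: take one message at a time until a None sentinel arrives."""
    while True:
        job = job_queue.get()
        if job is None:
            job_queue.task_done()
            break
        with lock:  # the results list is only for collecting demo output
            results.append(job * 2)
        job_queue.task_done()


def run(jobs, worker_count=3):
    job_queue = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=consume, args=(job_queue, results, lock))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for job in jobs:
        job_queue.put(job)
    for _ in threads:
        job_queue.put(None)  # one sentinel per worker
    job_queue.join()
    for thread in threads:
        thread.join()
    return sorted(results)


doubled = run([1, 2, 3, 4, 5])
```

With a real queue service the workers would run on separate machines, but the delivery guarantee sketched here is the same idea: the queue, not the workers, decides who gets each job.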
SOLUTIONS
We are going to focus on cloud solutions
SOLUTIONS
                        AWS Lambda                 IronWorker
Event-based triggers    Yes                        Yes
Timeout                 60 sec (rounded 100 ms)    1 hour (rounded 1 sec)
Startup time            Milliseconds               Few seconds, variable
Languages               Node.js & Java             All major
Memory size             64-1536 MB                 320-2048 MB
Disk size               512 MB                     10 GB
Max payload size        6 MB                       64 KB
Clouds                  AWS                        AWS, Rackspace, Azure, private
Dedicated clusters      No                         Yes
On-premise              No                         Yes
Built-in scheduling     No                         Yes
Simple function editor  Yes                        No
Free tier               Yes                        5 concurrent & 10 hours
CODE
ADDITIONAL ITEMS (IRONWORKER)
• The CLI has additional options to fine-tune how the worker runs
(max-concurrency, retries, retries-delay, delay, timeout, scheduling,
and multiple environments)
• Configuration variables via JSON or YAML files.
• Multiple invoke methods
  • CLI
  • Webhooks
  • Via IronMQ
  • REST API
CODE
ADDITIONAL ITEMS (AWS LAMBDA)
• Deployment automation
• Multiple invoke methods
  • Kinesis or DynamoDB stream
  • S3 event
  • CloudTrail events
  • API Gateway
  • REST API
• Gotcha: You might need a remote build server.
QUESTIONS?
https://github.com/GrNodeDev/ExchangeRate
