Job Queues
Queue 
● Allows for asynchronous computation of jobs (or tasks) 
● Uses consumers (or workers) to complete the job in the 
background 
● Results are available when the job is complete
Queue 
● First In First Out data structure (FIFO)
Queue Operations 
● enqueue ➜ adds an item to end of queue 
● dequeue ➜ pulls the oldest item off the queue 
● isEmpty ➜ boolean 
● length ➜ integer (number of items in queue)
Queue Data Structure 
For an unbounded queue, we choose a singly linked list 
with head and tail pointers as the data structure. 
● enqueue - sets the current tail's next pointer and the tail pointer to the new item 
● dequeue - returns the current head and sets the head pointer to the head's next pointer 
● isEmpty - head/tail is null 
All O(1) operations!
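
A minimal sketch of the linked-list queue described above, written in Python. The class and method names (`LinkedQueue`, `_Node`, `is_empty`) are illustrative assumptions, not taken from the slides.

```python
class _Node:
    """One link in the list: an item plus a pointer to the next node."""
    __slots__ = ("item", "next")

    def __init__(self, item):
        self.item = item
        self.next = None


class LinkedQueue:
    """Unbounded FIFO queue backed by a singly linked list."""

    def __init__(self):
        self._head = None  # oldest item, the next one to be dequeued
        self._tail = None  # newest item, where enqueue appends
        self._length = 0

    def enqueue(self, item):
        node = _Node(item)
        if self._tail is None:
            # Empty queue: the new node is both head and tail.
            self._head = node
        else:
            self._tail.next = node
        self._tail = node
        self._length += 1

    def dequeue(self):
        if self._head is None:
            raise IndexError("dequeue from an empty queue")
        node = self._head
        self._head = node.next
        if self._head is None:
            # The queue just became empty, so clear the tail as well.
            self._tail = None
        self._length -= 1
        return node.item

    def is_empty(self):
        return self._head is None

    def __len__(self):
        return self._length
```

Each operation touches only the head or tail pointer, which is why none of them depends on the number of queued items.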
Producers 
Producers push jobs onto the job queue 
Examples: 
● Web servers - A typical HTTP response must return 
within a short timeframe (200ms - 2000ms) 
● Humans phoning into tech support
Consumers 
Consumers pop jobs off of the queue and complete them 
Example use cases (any long running process): 
● Map / reduce calls on large datasets 
● Media conversion, manipulation and rendering 
● Image resize 
● Downloading remote resources 
● CPU intensive tasks (calculations)
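
One way to picture producers and consumers is the sketch below, which uses Python's standard `queue.Queue` and `threading` modules. The `resize_image` function is a hypothetical stand-in for a long-running job, and the `(None, None)` sentinel used to stop workers is an assumption of this example.

```python
import queue
import threading


def resize_image(name):
    """Hypothetical stand-in for a long-running job."""
    return f"{name} resized"


def worker(jobs, results):
    """Consumer: pop jobs off the queue and complete them until told to stop."""
    while True:
        job_id, name = jobs.get()
        if job_id is None:  # sentinel: no more work for this worker
            jobs.task_done()
            break
        results[job_id] = resize_image(name)
        jobs.task_done()


def run(names, num_workers=2):
    """Producer: push one job per name, then wait for the workers to finish."""
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for job_id, name in enumerate(names):
        jobs.put((job_id, name))
    for _ in threads:
        jobs.put((None, None))  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results
```

The producer returns from `jobs.put` immediately; the results only become available once the workers have completed the jobs in the background.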
Producers and Consumers 
Producers and Consumers can be part of the same 
process! 
Example: a web crawler (breadth first search) 
1. Push a base URL to the queue (e.g. http://yahoo.com/) 
2. Pop a URL from the queue and parse it 
3. For each link on the page, push it onto the queue 
4. Goto 2
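
The crawler steps above can be sketched as follows. To stay self-contained, the example does no network access: `PAGES` is a hypothetical link graph and `get_links` is an assumed callback that returns the links on a page. The `seen` set is an addition not mentioned in the slides; without it, pages that link back to each other would be queued forever.

```python
from collections import deque

# Hypothetical link graph standing in for real pages.
PAGES = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def crawl(base_url, get_links):
    """Breadth-first crawl: the loop is both producer and consumer."""
    pending = deque([base_url])  # step 1: push the base URL
    seen = {base_url}            # URLs already queued (assumption, see above)
    visited = []
    while pending:               # step 4: repeat until the queue is empty
        url = pending.popleft()  # step 2: pop a URL and "parse" it
        visited.append(url)
        for link in get_links(url):  # step 3: push each link on the page
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return visited
```

A call such as `crawl("http://example.com/", lambda url: PAGES.get(url, []))` visits the pages level by level, starting from the base URL.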
Job States 
Each job exists in one of the following states: 
● Queued 
● Processing (in progress) 
● Completed 
● Failed 
Jobs may also output: 
● Logs 
● Progress (% complete)
Job Data 
Consumers are functional. The only input they receive 
comes from the job, which comes from the producer. 
Job data should include: 
● Type 
● Any information needed to complete the job
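
A sketch of how job data and job states might fit together. Every name here (`Job`, `JobState`, `HANDLERS`, `word_count`, `process`) is hypothetical; the slides only specify the four states, the optional logs and progress, and that a job carries a type plus the information needed to complete it.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Job:
    type: str   # selects which handler runs the job
    data: dict  # everything the consumer needs to complete the job
    state: JobState = JobState.QUEUED
    progress: int = 0  # percent complete
    logs: list = field(default_factory=list)
    result: object = None


def word_count(data):
    """Hypothetical handler: count the words in data["text"]."""
    return len(data["text"].split())


# Maps a job type to the function that completes it.
HANDLERS = {"word_count": word_count}


def process(job):
    """Consumer step: its only input is the job itself."""
    job.state = JobState.PROCESSING
    try:
        job.result = HANDLERS[job.type](job.data)
    except Exception as exc:
        job.logs.append(f"failed: {exc!r}")
        job.state = JobState.FAILED
    else:
        job.progress = 100
        job.logs.append("done")
        job.state = JobState.COMPLETED
    return job
```

Because `process` reads nothing except the job, the same job can be handed to any consumer on any machine.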
Amdahl’s law... 
...states that the speedup a concurrent algorithm can 
achieve is limited by the serial path. 
Locks and serial parts limit the maximum performance of a 
concurrent system.
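
Amdahl's law is usually written as speedup = 1 / (s + (1 - s) / n), where s is the fraction of the work that must run serially and n is the number of workers. The helper below is a small illustration; the function and parameter names are assumptions.

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

As the number of workers grows, the result approaches 1 / s, so a job that is 10% serial can never run more than 10 times faster no matter how many consumers are added.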
Priority Queue 
● Priority ordered Queue data structure 
● Highest priority jobs are dequeued first 
● On the same priority level, oldest jobs are dequeued 
first
Priority Queue Operations 
● enqueue ➜ adds a job to end of queue with a priority 
● dequeue ➜ pulls the highest priority, oldest job off the 
queue 
● isEmpty ➜ boolean 
● length ➜ integer (number of items in queue)
Priority Queue Data Structure 
● Data structure (max heap) 
● Binary tree with the max heap property (each parent 
node is at least as large as its children) 
● For a priority queue, each item in the tree would be a 
pointer to a regular queue for that priority 
Enqueue and dequeue O(log n) operations!
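
A sketch of the structure described above: a heap of priority levels, each pointing to a regular FIFO queue. It uses Python's `heapq`, which is a min-heap, so priorities are negated to get max-heap behaviour. The class name and the assumption that priorities are numbers are both illustrative.

```python
import heapq
from collections import deque


class PriorityJobQueue:
    """Highest priority first; oldest first within the same priority."""

    def __init__(self):
        self._heap = []    # negated priority levels that currently hold jobs
        self._queues = {}  # priority -> FIFO queue of jobs at that priority
        self._length = 0

    def enqueue(self, job, priority):
        if priority not in self._queues:
            # First job at this level: create its queue and register the level.
            self._queues[priority] = deque()
            heapq.heappush(self._heap, -priority)
        self._queues[priority].append(job)
        self._length += 1

    def dequeue(self):
        if not self._heap:
            raise IndexError("dequeue from an empty priority queue")
        priority = -self._heap[0]  # highest priority with pending jobs
        jobs = self._queues[priority]
        job = jobs.popleft()       # oldest job at that priority
        if not jobs:
            # Level is now empty: remove it from the heap.
            heapq.heappop(self._heap)
            del self._queues[priority]
        self._length -= 1
        return job

    def is_empty(self):
        return self._length == 0

    def __len__(self):
        return self._length
```

In this variant the heap is only touched when a priority level appears or empties, so the logarithmic cost is in the number of distinct priority levels rather than the number of jobs.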
Priority Queue Metrics 
● Average wait time per job type 
● Number of queued jobs 
● Jobs processed / time 
● Jobs pushed / time 
Jobs processed / time ≥ Jobs pushed / time 
Otherwise a backlog forms!
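
To make the backlog condition concrete, the helper below estimates queue length after some time. It assumes constant push and processing rates, which is a simplification; the function and parameter names are illustrative.

```python
def backlog_after(pushed_per_sec, processed_per_sec, seconds, initial=0):
    """Estimated number of queued jobs after `seconds`, given constant rates."""
    growth = (pushed_per_sec - processed_per_sec) * seconds
    # A queue cannot hold fewer than zero jobs.
    return max(0, initial + growth)
```

For example, pushing 12 jobs per second while processing 10 leaves 120 jobs waiting after one minute, and the number keeps growing for as long as the rates stay that way.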
Job Scheduler 
In sophisticated job systems, a job scheduler exists to: 
● Maximize use of computing power 
● Minimize wait time 
● Provide an interface to job tasks 
Schedulers can use a combination of priority, estimated 
(historical) job time and available computing power to 
determine how jobs are run. Sophisticated job scheduling 
algorithms exist.
Case Study: Grocery Lines
Case Study: Grocery Lines 
4 consumers, 4 queues, 12 jobs of varying durations 
Average wait time = (10 + 13 + 4 + 6 + 1 + 9 + 6 + 13) / 12 = 5.1666...
Case Study: Grocery Lines 
4 consumers, 1 queue, 12 jobs of varying durations 
Order: 6, 1, 4, 10, 7 (1), 8 (4), 2 (6), 3 (6), 11 (8), 5 (8), 12 (9), 9 (10) 
Average wait time = (1 + 4 + 6 + 6 + 8 + 8 + 9 + 10) / 12 = 4.3333...
Case Study: Grocery Lines 
4 consumers, 1 queue, 12 jobs of varying durations 
intelligently ordered to minimize wait time: 
Order: 1, 2, 3, 4, 5 (1), 6 (2), 7 (3), 8 (4), 9 (5), 10 (6), 11 (8), 12 (9) 
Average wait time = (1 + 2 + 3 + 4 + 5 + 6 + 8 + 9) / 12 = 3.1667...
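
The single-queue cases can be explored with a small simulation. The slides do not list the individual job durations, so the `durations` list below is hypothetical and the numbers it produces are not the slide's figures. The sketch assumes all jobs are already in line at time 0, so a job's wait equals its start time.

```python
import heapq


def average_wait(durations, num_consumers=4):
    """Average time jobs wait before starting, for one shared FIFO queue."""
    # Min-heap of the times at which each consumer next becomes free.
    free_at = [0] * num_consumers
    total_wait = 0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available consumer
        total_wait += start             # all jobs arrived at time 0
        heapq.heappush(free_at, start + duration)
    return total_wait / len(durations)


# Hypothetical durations for 12 jobs, in arrival order.
durations = [5, 1, 8, 2, 4, 3, 6, 1, 7, 2, 3, 4]

arrival_order = average_wait(durations)
shortest_first = average_wait(sorted(durations))
```

Sorting the jobs so the shortest run first is one simple "intelligent" ordering; with these durations it gives a lower average wait than serving jobs in arrival order.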
Job Queue Software 
● Beanstalkd (C) http://kr.github.io/beanstalkd/ 
● Celery (Python + many backends) http://www.celeryproject.org/ 
● Delayed::Job (Ruby + DB) https://github.com/collectiveidea/delayed_job 
● Gearman (C++) http://gearman.org/ 
● Kue (Node + Redis) https://github.com/learnboost/kue 
● Resque (Ruby + Redis) http://resquework.org/ 
● RQ (Python + Redis) http://python-rq.org/ 
● Sidekiq (Ruby) http://sidekiq.org/ 
● SQS by Amazon (managed) http://aws.amazon.com/sqs/ 
More links and information at http://queues.io/

Editor's Notes

  • #8 .wav to .mp3
  • #10 .wav to .mp3
  • #11 .wav to .mp3
  • #16 Visualization from http://www.comp.nus.edu.sg/~stevenha/visualization/heap.html. Specialized data structures exist at http://en.wikipedia.org/wiki/Priority_queue#Implementation