Shaddy Zeineddine: Queuing w/ MongoDB & BreakMedia's API

Shaddy Zeineddine presented Queuing w/ MongoDB & Break Media's API on April 23rd at Factual.

  1. Using MongoDB as a Central Queue for Distributed Job Processing
  2. Presented by Shaddy Zeineddine <shaddy@chunkofwood.com>
     Software Developer
     sandbox.chunkofwood.com
     www.linkedin.com/in/shaddyz
     Through experiences at...
  3. Web Services API Motives - High-Level Requirements
     > I want one search engine for all our properties.
     > I want to encode videos into a set of formats compatible with all browsers and devices.
     > I want to display and promote related content across properties.
     > I want to display thumbnails to users as soon as they upload a video.
     > All new projects must be scalable, load balanced, and highly available.
  4. Initial System Design
  5. Why use MongoDB?
     > "Natural fit" - minimal relations between data & a native JSON interchange format
     > Super simple replication
     > Awesome PHP client driver
     > Good documentation
     > Readily available support from 10gen
     > High, scalable performance
     > Developer centric
     > OOP friendly
  6. The RDBMS Schema Problem
  7. Distributed job processing cases
     > Cron jobs/scheduled tasks: jobs are run at predefined times
     > Callback processing: FIFO processing for the initial attempt; subsequent attempts processed after a variable waiting time
     > Video encoding: priority-based processing, then FIFO; minimize input video transfers
     > Immediate thumbnail generation from uploaded videos: uploaded videos are only available on one node
  8. One queue to rule them all: Priority Queue
     > Priority is defined by the implementation
     > Worker-aware
     > Designed to be altered
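The worker-aware, priority-ordered dequeue this slide describes can be simulated in memory. The sketch below is illustrative Python, not the talk's PHP/MongoDB code; the field names (`_worker`, `_priority`) follow the queue documents shown later in the slides, and a `_worker` of `None` stands for a job any worker may take.

```python
# In-memory sketch of a worker-aware priority queue. In the real system the
# select-and-remove step is a single atomic MongoDB findAndModify.

def dequeue(queue, worker, priority_order=1):
    """Pop the best job addressed to `worker` (or to any worker, None).

    priority_order=1 means ascending (lowest _priority value wins),
    -1 means descending, matching a MongoDB sort direction.
    """
    candidates = [j for j in queue if j["_worker"] in (worker, None)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: priority_order * j["_priority"])
    queue.remove(best)
    return best

jobs = [
    {"jobName": "Encode.Video", "_worker": "node1", "_priority": 5},
    {"jobName": "Thumb.Gen",    "_worker": "node2", "_priority": 1},
    {"jobName": "Search.Sync",  "_worker": None,    "_priority": 3},
]
first = dequeue(jobs, "node1")   # Search.Sync: priority 3 beats 5
second = dequeue(jobs, "node1")  # Encode.Video
```

Because priority is computed by the implementation at enqueue time ("designed to be altered"), re-prioritizing jobs is just an update to the `_priority` field.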
  9. Processor Daemon Breakdown
     > "Manager" parent process
       - Starts/stops child processes
       - Listens for signals {SIGTERM, SIGKILL, SIGHUP}
     > "Worker" child processes
       - Polls queue
       - Processes job
       - Dies after processing
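The manager/worker split can be sketched as below. This is an illustrative Python outline, not the talk's PHP daemon; note that SIGKILL, although listed on the slide, cannot actually be caught by any process, so only SIGTERM and SIGHUP get handlers here.

```python
import signal

class Manager:
    """Parent process: owns the worker pool and reacts to signals."""

    def __init__(self):
        self.running = True
        self.reload_requested = False

    def install_handlers(self):
        # SIGTERM: stop spawning workers and shut down cleanly.
        signal.signal(signal.SIGTERM, self._on_term)
        # SIGHUP: conventionally used to re-read configuration.
        signal.signal(signal.SIGHUP, self._on_hup)

    def _on_term(self, signum, frame):
        self.running = False

    def _on_hup(self, signum, frame):
        self.reload_requested = True

def worker_main(poll_once, process):
    """Child process body: poll the queue once, process one job, then exit.

    Exiting after every job ("dies after processing") means leaks or bad
    state in job code never accumulate; the manager forks a fresh child.
    """
    job = poll_once()
    if job is not None:
        process(job)
    # fall off the end: the child process exits
```

The hypothetical `poll_once`/`process` callables stand in for the queue query and job execution.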
 10. The Queue Collection
     Default document schema (enqueue):

       public function enqueue(Job &$job)
       {
           $this->preEnqueue($job);
           $jobArray = $job->toArray();
           $jobArray[_timeQueued] = time();
           $jobArray[_worker] = $this->getWorker($jobArray);
           $jobArray[_priority] = $this->getPriority($jobArray);
           $this->db->insert($jobArray);
       }

     Cronjob document schema:

       > db.cronjobQueue.findOne()
       {
           "_id" : ObjectId("517496db1f2c8ad317000000"),
           "eventId" : "7fad1ff51a00924dd4991a91bb045559",
           "jobName" : "Demo.HelloWorld",
           "jobParams" : "",
           "locked" : 0,
           "runAt" : 1366600800
       }
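The PHP `enqueue` above stamps each job document with a queue time, a target worker, and a priority before inserting it. The same logic rendered in Python (a sketch: the insert callable and the worker/priority policies are injected as placeholders, mirroring `$this->db->insert`, `getWorker`, and `getPriority`):

```python
import time

def enqueue(db_insert, job_array, get_worker, get_priority):
    """Stamp queue metadata onto a job document, then insert it."""
    job_array["_timeQueued"] = int(time.time())
    job_array["_worker"] = get_worker(job_array)      # routing policy
    job_array["_priority"] = get_priority(job_array)  # ordering policy
    db_insert(job_array)
    return job_array
```

Keeping worker and priority assignment behind functions is what makes the queue "designed to be altered": changing the policy never changes the document shape.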
 11. The Queue Collection
     Default dequeuing:

       $jobArray = $this->doDequeueQuery(
           array(_worker => array($in => array($this->worker))),
           null,
           null,
           array(sort => array(_priority => $this->priorityOrder), remove => true)
       );

     Cronjob dequeuing:

       $nextEvent = $this->db->find(array(locked => 0), array(runAt => 1))
           ->sort(array(runAt => 1))
           ->limit(1)
           ->getNext();
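The default dequeue is a single sort-and-remove operation (MongoDB's findAndModify with `remove: true`), which is what keeps two workers from grabbing the same job. A sketch of the equivalent filter and sort in Python dicts (pymongo's `find_one_and_delete` would consume them; no live connection is assumed here):

```python
def build_dequeue_args(worker, priority_order=1):
    """Build the filter and sort for an atomic priority-queue pop.

    priority_order=1 is ascending (pymongo.ASCENDING); the filter matches
    jobs addressed to this worker or to any worker (None).
    """
    filter_doc = {"_worker": {"$in": [worker, None]}}
    sort_spec = [("_priority", priority_order)]
    return filter_doc, sort_spec

# With a live pymongo connection, the atomic pop would be:
#   job = db.queue.find_one_and_delete(filter_doc, sort=sort_spec)
```

The cronjob path differs: it only peeks at the next unlocked `runAt` rather than removing, since the job must not run before its scheduled time.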
 12. Cronjob Processing: A Closer Look
     > Queue is populated by a static file
     > Jobs are run at predefined times

       # Break API Cron Schedule
       #
       # * * * * * Job Parameters
       # ┬ ┬ ┬ ┬ ┬ ┬   ┬
       # │ │ │ │ │ │   └ JSON object of parameters ("{source: mademan}")
       # │ │ │ │ │ └───── Job name (e.g. "Search.Sync")
       # │ │ │ │ └──────── day of week (0 - 6) (0 is Sunday, or use names)
       # │ │ │ └───────────── month (1 - 12)
       # │ │ └────────────────── day of month (1 - 31)
       # │ └─────────────────────── hour (0 - 23)
       # └──────────────────────────── min (0 - 59)
       #
       # Synchronize & Optimize the Solr index from MongoDB @ 2:16 am
       */5 * * * *    Search.Sync
       @daily         Search.UpdateFeatures
       */10 * * * *   Encode.EnqueueJobs
       */10 * * * *   Encode.UpdateJobPriorities
       @daily         Encode.PurgeOldJobs
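The `runAt` timestamp seen in the `cronjobQueue` document is derived from schedule lines like these. As a sketch of that derivation, the function below resolves only the minute field (`*`, `*/step`, or a comma list); a real parser covers all five fields and the `@daily`-style shortcuts. It is an illustration, not the talk's implementation.

```python
def next_run_minute(minute_field, current_minute):
    """Return the next minute-of-hour matching a cron minute field."""
    if minute_field == "*":
        allowed = range(60)                      # every minute
    elif minute_field.startswith("*/"):
        step = int(minute_field[2:])
        allowed = range(0, 60, step)             # e.g. */5 -> 0, 5, 10, ...
    else:
        allowed = [int(m) for m in minute_field.split(",")]
    for m in sorted(allowed):
        if m > current_minute:
            return m
    return min(sorted(allowed))                  # wraps into the next hour

next_run_minute("*/5", 7)   # -> 10
next_run_minute("30", 45)   # -> 30 (in the next hour)
```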
 13. Demonstrate the Cronjob processor
     Please stand by...
 14. Additional Challenges
     > A job fails while being processed: re-enqueue incomplete jobs from a secondary collection
     > A worker is terminated while processing a job: reset all jobs associated with the worker on startup
     > Providing up-to-date progress information to other nodes: maintain progress in a secondary collection
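The second recovery step above can be sketched as follows: on startup, any job still marked as owned by this worker in the in-progress (secondary) collection is pushed back onto the main queue. This is an in-memory illustration with list arguments; the talk performs the same move between two MongoDB collections, and the `startedAt` field is a hypothetical progress marker.

```python
def recover_jobs(in_progress, queue, worker):
    """Move this worker's orphaned jobs back to the main queue."""
    orphaned = [j for j in in_progress if j["_worker"] == worker]
    for job in orphaned:
        in_progress.remove(job)
        job.pop("startedAt", None)  # clear stale progress info
        queue.append(job)
    return len(orphaned)
```

Running this before the manager forks its first child guarantees that a crash or kill never silently loses a job: it is retried instead.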
 15. Questions?
