Distributed Queue System using Gearman


We use Gearman to manage our queue system. This talk covers why a queue is useful in many situations, in web-based interfaces as well as server-side applications.

Published in: Technology

  1. Distributed Queue System using Gearman. Taehun Cho, CTO @ IMcompany, http://iamcompany.net/
  2. What is a Queue? (image copyright: www.mathworks.com)
  3. Multiple Queues: with multiple tellers, customers form multiple queues to be served.
  4. Some Situations on the Web: you want to send an email, push notifications or messages, import a big file, etc.
  5. The problem is that... DO NOT LET YOUR USERS WAIT
  6. Run Asynchronously
  7. Running a Task in a Single Thread? The loading behavior may affect the main GUI.
  8. Run in the Background! The user can move on to the next action while the task is completed in the backend in a distributed manner.
  9. Job Queue Systems: Celery (http://www.celeryproject.org/), RabbitMQ (http://www.rabbitmq.com/), Zend Server Job Queue (http://www.zend.com/), ZeroMQ (http://www.zeromq.org/), Beanstalkd, Peafowl, Starling, Apache ActiveMQ, and many others.
  10. Introducing Gearman: LiveJournal
  11. Introducing Gearman: At LiveJournal, many photos were uploaded every day, which led to a heavy image-processing load; this was the motivation to build such a queue system. ● Yahoo!: 120+ servers, 12M jobs/day ● Digg: 45+ servers, 400K jobs/day ● LiveJournal, SixApart, DealNews, Xing.com, and many others (Expert PHP and MySQL, Andrew et al., 2010, Wrox) ● Grooveshark, GoDaddy.com, IMcompany
  12. Features of Gearman ● Open Source ● Simple & Fast (rewritten in C) ● Supports a variety of languages: for example, build a Worker in Python and a Client in PHP ● Flexible ● Load Balancing ● Failover
  13. Example of Architectures (from http://gearman.org/#what_is_gearman)
  14. Architecture. Client: connects and submits a job. Gearman Job Server: acks the job, finds all sleeping workers, and sends a noop command to wake them up. Worker: wakes up and asks the server for jobs.
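
The exchange on the Architecture slide can be sketched at the packet level. This is a minimal illustration, assuming the framing of the Gearman binary protocol (a 4-byte magic, a big-endian type, a big-endian body size, then NUL-separated arguments); the function name `resize_image` and the payload are hypothetical, and no network connection is made.

```python
import struct

# Packet type codes as defined by the Gearman binary protocol.
CAN_DO, PRE_SLEEP, NOOP, GRAB_JOB, SUBMIT_JOB_BG = 1, 4, 6, 9, 18

REQ_MAGIC = b"\x00REQ"  # packets sent to the job server
RES_MAGIC = b"\x00RES"  # packets sent by the job server


def pack_packet(magic, ptype, *args):
    """Frame a packet: magic, type, body size, NUL-joined arguments."""
    body = b"\x00".join(args)
    return magic + struct.pack(">II", ptype, len(body)) + body


def unpack_packet(raw, nargs):
    """Split a framed packet back into (magic, type, arguments).

    nargs is needed because the last argument is opaque data and
    may itself contain NUL bytes.
    """
    magic = raw[:4]
    ptype, size = struct.unpack(">II", raw[4:12])
    body = raw[12:12 + size]
    args = body.split(b"\x00", nargs - 1) if nargs else []
    return magic, ptype, args


# Worker side: announce an ability, then go to sleep.
worker_hello = [
    pack_packet(REQ_MAGIC, CAN_DO, b"resize_image"),  # hypothetical function
    pack_packet(REQ_MAGIC, PRE_SLEEP),
]
# Client side: submit a background job (function, unique id, data).
submit = pack_packet(REQ_MAGIC, SUBMIT_JOB_BG, b"resize_image", b"", b"photo.jpg")
# Server side: wake sleeping workers with NOOP; the worker then asks for a job.
wake = pack_packet(RES_MAGIC, NOOP)
grab = pack_packet(REQ_MAGIC, GRAB_JOB)
```

Because the client only waits for the server to acknowledge the job, the web request can return immediately while a worker picks the job up later.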
  15. Installation ● Compile: tar xzvf gearmand-X.Y.tar.gz; cd gearmand-X.Y; ./configure; make; make install ● Start Server: $ gearmand -d ● (for PHP APIs) Pecl Extension: sudo pecl install gearman ● Add below to php.ini: extension="gearman.so"
  16. Use Cases - Crawling a website - Image Manipulation - Push Notification - Sending Email/Messages - File verification/compressing - Fetching RSS Feeds - Indexing on Search Engine
  17. Samples - Worker
  18. Samples - Client
  19. Samples - Monitoring: A good tool for monitoring Gearman is available at https://github.com/yugene/Gearman-Monitor
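
For a quick check without a dashboard, the job server can also be queried directly. The sketch below assumes gearmand's plain-text admin command `status`, which returns one tab-separated line per function (name, queued jobs, running jobs, available workers) and ends with a line containing a single dot; the host, port 4730, and the sample function names are assumptions, and this is not how the tool above is implemented.

```python
import socket


def parse_status(text):
    """Parse the reply of the admin `status` command into dicts."""
    rows = []
    for line in text.splitlines():
        if line == ".":  # terminator line
            break
        if not line.strip():
            continue
        name, total, running, workers = line.split("\t")
        rows.append({
            "function": name,
            "queued": int(total),
            "running": int(running),
            "workers": int(workers),
        })
    return rows


def fetch_status(host="127.0.0.1", port=4730, timeout=5.0):
    """Ask a running gearmand for its status (needs a live server)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        buf = b""
        while not (buf == b".\n" or buf.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_status(buf.decode("utf-8", errors="replace"))


# Hypothetical reply, used here so the sketch runs without a server.
sample = "crawl_school\t12\t3\t4\nsend_email\t0\t0\t2\n.\n"
backlog = [r["function"] for r in parse_status(sample) if r["queued"] > r["workers"]]
```

A function whose queued count stays above its worker count is a sign that more workers are needed for it.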
  20. Result (Worker #1, Worker #2): An incomplete job is re-queued to an available worker for fault tolerance.
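
The re-queue behaviour can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not Gearman's implementation: the `dispatch` helper, the `WorkerDied` exception, the round-robin assignment, and the retry limit are all hypothetical.

```python
from collections import deque


class WorkerDied(Exception):
    """Raised by a simulated worker that terminates mid-job."""


def dispatch(jobs, workers, max_attempts=3):
    """Hand jobs to workers in turn; re-queue a job whose worker died."""
    pending = deque(jobs)
    attempts = {job: 0 for job in jobs}
    done, failed = {}, []
    turn = 0
    while pending:
        job = pending.popleft()
        name, work = workers[turn % len(workers)]
        turn += 1
        attempts[job] += 1
        try:
            done[job] = (name, work(job))
        except WorkerDied:
            if attempts[job] >= max_attempts:
                failed.append(job)   # give up after too many attempts
            else:
                pending.append(job)  # back on the queue for another worker
    return done, failed


def always_die(job):
    raise WorkerDied(job)


workers = [("worker-1", always_die), ("worker-2", str.upper)]
done, failed = dispatch(["a", "b", "c"], workers)
```

Here every job that lands on the broken worker eventually completes on the healthy one, which is the outcome the slide describes.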
  21. Motivation ● In the beginning, we ran 3 computers to crawl each school's information (articles, school schedules). ● Running one job at a time took too long to finish them all, and sometimes machines did the same job as others. ● That was the motivation to build a job queue system that could run jobs in parallel. And we found Gearman!
  22. Gearman in IMcompany: But there were some challenges! ● How many workers should be up per server? (How do we balance the load efficiently?) ● How can we handle unexpected termination of workers? ● What if the server's resources are exhausted by the jobs the workers run? (Then the server would not respond to other requests/connections related to WEB, SVN, MySQL.)
  23. Exceptional Case #1
  24. Reported bugs when using PHP. Bug #63041 "Failed to set exception option" on connect when any gearman server is down: https://bugs.php.net/bug.php?id=63041 Bug #63648 Gearman worker stops with segfault after 1-2 hour of working: https://bugs.php.net/bug.php?id=63648
  25. Supervisord for sanity: "PHP was not built for long running request" "Sometimes it occurs memory leaks" Supervisord helps you in the above cases! - Auto-restarts processes based on custom configurations * Installation guide - http://www.masnun.com/2011/11/02/gearman-php-and-supervisor-processing-background-jobs-with-sanity.html
  26. Exceptional Case #2: "PHP sometimes slows down after hundreds of executions, kill it off if you know this will happen." - Mike Willbanks, "Gearman: A Job Server made for Scale"
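
One common way to act on this advice is to let each worker exit on purpose after a fixed number of jobs and rely on a process supervisor to start a fresh one. The sketch below simulates that pattern with an in-memory queue; `run_worker`, `supervise`, and the limit of 3 jobs are hypothetical, and a real worker would block waiting for the job server instead of stopping when the queue is empty.

```python
import queue


def run_worker(jobs, handler, max_jobs):
    """Process at most max_jobs jobs, then return so the process can exit."""
    handled = 0
    while handled < max_jobs:
        try:
            payload = jobs.get_nowait()
        except queue.Empty:
            break  # nothing left; a real worker would wait here instead
        handler(payload)
        handled += 1
    return handled


def supervise(jobs, handler, max_jobs):
    """Stand-in for a supervisor: start a fresh worker whenever one exits."""
    runs = []
    while not jobs.empty():
        runs.append(run_worker(jobs, handler, max_jobs))
    return runs


jobs = queue.Queue()
for n in range(7):
    jobs.put(n)

seen = []
runs = supervise(jobs, seen.append, max_jobs=3)
```

Each entry in `runs` is the number of jobs one worker lifetime handled, so memory leaked or slowdown accumulated within a lifetime is discarded at every restart.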
  27. Server Seems Fine for Now
  28. What We Learned ● Gearman's queue list is unstable, so persistent queueing was highly needed in our system ● Integrating MySQL with Gearman failed in both 1.0.2 and 0.34 ● Tried SQLite, but performance was very poor. Do NOT Reserve Too Many Jobs in a Queue
  29. Also We've Tried... ● Firing queueing jobs over HTTP requests sometimes does not work and may eventually freeze the server ● It doesn't support additional functions for the HTTP connection, such as authentication ● It is not customizable. Gearman Seems Too Young at This Moment
  30. Limitations ● The queue makes no guarantees - use MySQL, memcached, Redis, PostgreSQL, etc. ● There are few administration tools ● Jobs don't expire ● If a job is dropped, the client is never notified (from http://inside.godaddy.com/cloud-processing-with-gearman/)
  31. Join the Community! http://gearman.org/ http://groups.google.com/group/gearman/
  32. We're hiring! ● Work in Daejeon, Korea ● Flexible, Small Company ● Excellent Benefits ● We Need Senior Hackers. Find more information at http://iamcompany.net/ Thank you! Any questions?