Gearman and asynchronous processing in PHP applications



Presentation at BarcampSaigon 2010



  1. Gearman and asynchronous processing in PHP applications
     Pham Cong Dinh (a.k.a. pcdinh), @pcdinh on Twitter
     BarCampSaiGon 2010
     Skunkworks, @teamskunkworks on Twitter
  2. The aim of my talk
     Discuss a solution that helps scale your high-traffic PHP web applications
  3. Introduction
     - PHP developer since 2002: 8 years in PHP development and counting
     - Presenter at Hanoi PHP Day in 2008 and 2009
     - Founder and maintainer of the PHPVietnam mailing list (Google Group) since 2004
     - Very interested in Linux, server farms, big data, databases, distributed processing, scalability, and high performance web systems
     - Involved in clip.vn development at Vega Corporation one year ago
     - Software developer at Skunkworks
  4. Agenda
     - Challenges in developing large scale PHP applications for high traffic web sites
     - Resolve the challenge: How to distribute workload
     - Gearman: an open source high performance job server
     - Develop PHP clients and workers
     - Challenges in managing workers – a case study of Gearman Agent Manager
  5. Challenges in developing large scale PHP applications for high traffic web sites (1)
     - What is large scale?
     - How high is high traffic?
  6. Challenges in developing large scale PHP applications for high traffic web sites (2)
     - Large scale?
     - Traffic, data graph, storage, code base, development team
  7. Challenges in developing large scale PHP applications for high traffic web sites (3)
     - Typical challenges: limited resources
       - CPU
       - Disk speed
       - Memory
       - Bandwidth: router, NIC
       - Architecture: application and system
  8. Challenges in developing large scale PHP applications for high traffic web sites (4)
     - Major challenges
       - No preparation for growth
       - No idea how to scale your application beyond a certain point
       - No in-depth understanding of your system
       - No proper system capacity monitoring
       - Lack of proper skills
  9. Resolve the challenge: How to distribute workload (1)
     Our challenge today: TOO MUCH WORKLOAD FOR A SINGLE SERVER
  10. Resolve the challenge: How to distribute workload (2)
     - Many solutions
       - Load balancing:
         - Hardware: F5, Cisco Content Services Switch
         - Software: BIND, LVS, HAProxy, Varnish ...
       - Precalculate data
       - Multi-tier application architecture
  11. Resolve the challenge: How to distribute workload (3)
     - Our solution today
       - Queue up the workload
       - Categorize workload pattern
       - Optimize processing model, security
       - Job server
  12. Resolve the challenge: How to distribute workload (4)
     - Is queuing the final answer?
       - Keep up with peak workload?
       - Handle backlog gracefully
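
One way to read "handle backlog gracefully" is to bound the queue and refuse new work once it is full, so the caller can degrade instead of blocking. The following is a minimal sketch using only Python's standard library; the limit of 3 and the `enqueue` helper are illustrative assumptions, not part of the talk or of Gearman.

```python
import queue

# Bounded in-memory queue; the size limit of 3 is an arbitrary illustration.
jobs = queue.Queue(maxsize=3)

def enqueue(job):
    """Try to queue a job; return False instead of blocking when the backlog is full."""
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        return False

# Five submissions against a queue of three: the last two are refused.
accepted = [enqueue(f"job-{i}") for i in range(5)]
```

A refused job could be logged, retried later, or answered with a "try again" response; which policy fits depends on the workload.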
  13. Gearman: an open source high performance job server (1)
     - Concepts
       - Synchronous and asynchronous
       - Job, job queue and job server
     - Who
       - Used at LiveJournal, Yahoo!, Digg, BackType and many more
       - Used at Vega (clip.vn, vega.com.vn) for sending mails
       - At Skunkworks?
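
The difference between synchronous and asynchronous processing can be sketched without a job server at all. In the example below a thread pool stands in for the job server, and `resize_image` is a hypothetical task; neither comes from the talk.

```python
from concurrent.futures import ThreadPoolExecutor

def resize_image(name):
    """Stand-in for a slow task; a real worker would do the heavy lifting here."""
    return f"{name}.thumb"

# Synchronous: the caller waits for the result before continuing.
sync_result = resize_image("photo1")

# Asynchronous: the caller hands the job off and collects the result later.
with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(resize_image, "photo2")
    # ... the caller is free to do other work here ...
    async_result = future.result()
```

With a job server the hand-off crosses process and machine boundaries, but the shape of the calling code is similar: submit, carry on, optionally collect.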
  14. Gearman: an open source high performance job server (2)
     - Architecture
       - Client
       - Worker
       - Job server
     - Fail-over cluster
  15. Gearman: an open source high performance job server (3)
     - Features
       - Fast
       - Programming language neutral
       - A bridge between a message queue server and a pub/sub engine
       - Enables applications to outsource tasks to other servers in a synchronous or asynchronous manner
       - Fault-tolerant
       - Poison message handling and retries
       - Persistent queues for background jobs
       - Timeouts
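
The "poison message and retries" feature can be illustrated with a small retry loop: a failing job is put back on the queue a limited number of times and then set aside so it cannot block the rest. This is a sketch of the general idea, not Gearman's implementation; `MAX_ATTEMPTS`, `run_with_retries` and `handler` are assumed names.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit, for illustration only

def run_with_retries(jobs, handler):
    """Process jobs in FIFO order; park a job after MAX_ATTEMPTS failures."""
    pending = deque((job, 0) for job in jobs)
    done, poisoned = [], []
    while pending:
        job, attempts = pending.popleft()
        try:
            done.append(handler(job))
        except Exception:
            if attempts + 1 >= MAX_ATTEMPTS:
                poisoned.append(job)                  # give up: likely a poison message
            else:
                pending.append((job, attempts + 1))   # retry after the other jobs
    return done, poisoned

def handler(job):
    """Hypothetical job handler that always fails on the input 'bad'."""
    if job == "bad":
        raise ValueError("cannot process")
    return job.upper()

done, poisoned = run_with_retries(["a", "bad", "b"], handler)
```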
  16. Gearman: an open source high performance job server (4)
     - How it works
     - Worker
       - The worker connects to all gearmand servers.
       - The worker registers the functions it supports.
       - The worker asks for jobs.
       - If there are no jobs, it sends the 'pre_sleep' command to all gearmand servers and sleeps.
     - Client
       - Connects to gearmand.
       - Submits a job for a particular job name.
     - Gearmand
       - Acks the job and finds all sleeping workers related to the job.
       - Sends them all a 'noop' command to wake them up.
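
The exchange described on this slide can be traced with a toy, in-memory stand-in for gearmand. The sketch below involves no networking and is not the real server; the `JobServer` class and its method names are assumptions, and the packet names in the comments (CAN_DO, GRAB_JOB, NO_JOB, JOB_ASSIGN, PRE_SLEEP, SUBMIT_JOB, JOB_CREATED, NOOP) follow the Gearman protocol.

```python
class JobServer:
    """Toy in-memory stand-in for gearmand; not a network implementation."""

    def __init__(self):
        self.queues = {}       # function name -> list of pending workloads
        self.can_do = {}       # worker id -> set of function names
        self.sleeping = set()  # workers that have sent PRE_SLEEP
        self.log = []          # (recipient, packet) pairs the server sent

    def register(self, worker, function):            # worker sends CAN_DO
        self.can_do.setdefault(worker, set()).add(function)

    def grab_job(self, worker):                      # worker sends GRAB_JOB
        for function in sorted(self.can_do.get(worker, ())):
            if self.queues.get(function):
                return function, self.queues[function].pop(0)  # JOB_ASSIGN
        return None                                            # NO_JOB

    def pre_sleep(self, worker):                     # worker sends PRE_SLEEP
        self.sleeping.add(worker)

    def submit(self, function, workload):            # client sends SUBMIT_JOB
        self.queues.setdefault(function, []).append(workload)
        self.log.append(("client", "JOB_CREATED"))   # ack to the client
        for worker in sorted(self.sleeping):
            if function in self.can_do.get(worker, ()):
                self.log.append((worker, "NOOP"))    # wake-up call
                self.sleeping.discard(worker)

server = JobServer()
server.register("w1", "resize")
server.register("w2", "email")
for w in ("w1", "w2"):
    if server.grab_job(w) is None:   # nothing queued yet, so both go to sleep
        server.pre_sleep(w)
server.submit("resize", "photo.jpg")  # only w1 can do "resize", so only w1 is woken
job = server.grab_job("w1")
```

When several sleeping workers support the same function, all of them are woken and the first one to ask for work receives the job; the others go back to sleep.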
  17. Gearman: an open source high performance job server (5)
     - Use cases
       - Long running processes: thumbnail generation, image resizing, order processing in e-commerce ...
       - High CPU or memory requirements: high volume data processing, MapReduce, log aggregation, video encoding
       - Distributed and parallel processing
       - Time-based processing: incremental updates, data replication
       - Rate-limited FIFO processing
       - Separation of concerns or security issues
       - Priority-aware system monitoring tasks: WonderProxy
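
For the priority-aware use case, the core idea is a queue that serves urgent jobs first while keeping FIFO order within each priority level. A minimal sketch with Python's `heapq` follows; the three levels mirror the high, normal and low priorities Gearman offers, while the job names and helper functions are made up for illustration.

```python
import heapq
import itertools

HIGH, NORMAL, LOW = 0, 1, 2    # smaller number is served first
_counter = itertools.count()   # tie-breaker that preserves FIFO order
_heap = []

def submit(workload, priority=NORMAL):
    """Queue a workload under the given priority."""
    heapq.heappush(_heap, (priority, next(_counter), workload))

def next_job():
    """Pop the most urgent, oldest workload."""
    return heapq.heappop(_heap)[2]

submit("rebuild-report", LOW)
submit("send-email")
submit("check-server-A", HIGH)
submit("check-server-B", HIGH)
order = [next_job() for _ in range(4)]
```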
  18. Develop PHP clients and workers (1)
     - PHP interface libraries for the Gearman server
       - PECL gearman: http://pecl.php.net/package/gearman or https://github.com/php/pecl-gearman
       - PEAR's Net_Gearman: http://pear.php.net/package/Net_Gearman
  19. Develop PHP clients and workers (2)
     PHP Client = Job Sender
  20. Develop PHP clients and workers (3)
     PHP Worker = Job Executor
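
The sender and executor roles can be sketched language-neutrally. Gearman treats a workload as an opaque string, so structured data has to be serialized by the client and decoded by the worker. The example below uses Python and JSON purely as an illustration; the `functions` registry and the `do_job` helper are hypothetical and stand in for the job server, they are not the PECL or PEAR API.

```python
import json

# Worker side: functions are registered by name and receive the raw workload string.
def reverse(workload):
    data = json.loads(workload)
    return json.dumps({"text": data["text"][::-1]})

functions = {"reverse": reverse}   # hypothetical registry of worker functions

# Client side: encode the payload, name the function, decode the reply.
def do_job(function_name, payload):
    workload = json.dumps(payload)
    result = functions[function_name](workload)   # a job server would sit here
    return json.loads(result)

reply = do_job("reverse", {"text": "Hello Gearman"})
```

The client only needs to know the function name and the workload format, not which worker runs the job or in which language it is written.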
  21. Challenges in managing workers – a case study of Gearman Agent Manager
     - Ease of use
     - How to manage multiple worker processes for a single job: launch, reload, stop, add process ...
     - Monitoring
     - Centralized management over a set of servers
     - Web API (RESTful)
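
One way to think about launching, stopping and adding worker processes is as reconciliation: compare the number of processes wanted per function with the number currently running, and derive the actions needed. The sketch below shows only that planning step and is not how Gearman Agent Manager is implemented; `plan_actions` and the sample counts are assumptions.

```python
def plan_actions(desired, running):
    """Return (action, function, count) tuples that move `running` toward `desired`."""
    actions = []
    for function in sorted(set(desired) | set(running)):
        diff = desired.get(function, 0) - running.get(function, 0)
        if diff > 0:
            actions.append(("start", function, diff))
        elif diff < 0:
            actions.append(("stop", function, -diff))
    return actions

desired = {"resize": 4, "email": 2}                # processes wanted per function
running = {"resize": 1, "email": 3, "legacy": 1}   # processes currently alive
actions = plan_actions(desired, running)
```

A real manager would then spawn or signal processes for each action and repeat the comparison periodically, which is also a natural place to hang monitoring.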
  22. Questions?
     @skunkworksvn, @pcdinh #barcampsaigon #teamskunkworks
