Os Whitaker

Published in: Business, Technology
  • Transcript

    • 1. Keeping Your Workers In Line: use TheSchwartz;
      • Brad Whitaker
      • Lisa Phillips
    • 2. Once upon a time, in a galaxy far, far away, users wanted features like subscription-based notifications, pinging external services, and other things which love to tie up webserver processes and are generally too slow / blocking to execute synchronously with web requests
    • 3. (Yes, that was really my best slide-fu, so bear with me)
    • 4. So we needed to solve the problems these features create
    • 5. Which can easily result in a mess
    • 6. Contacting External Web Services
      • In LJ's case:
        • weblogs.com
        • updates.sixapart.com
        • event notifications to Mother Russia
          • seriously
      • Services need to be contacted whenever an entry / comment / asset is created
    • 7. Processing Uploaded Media
      • Pictures need to be scaled
      • Videos need to be transcoded
      • Shouldn’t be done on the webserver
        • too slow
        • hogs CPU/memory resources
        • requires unnecessary libraries loaded into Apache
    • 8. Initial Solution: GhettoQueue
      • Some sort of buffer on disk/database
        • Or worse, a queue which gets blocked when a single job repeatedly fails
      • Cron or daemon to process the queue
      • Gets really behind, pretty flaky, generally annoying
        • Think 'qbufferd' in LJ
          • Which was the bane of our existence
            • For years.
      • Hard to administer!
    • 9. Incoming data from lots of different transports
      • Incoming emails
      • Incoming SMS and outbound response
      • Audio from Asterisk
      • ... Just to name a few
    • 10. Initial Solution: Lots of Daemons!
      • Lots of daemons!
      • In LJ:
        • phonepostd
        • mailgated
      • Mostly consist of boilerplate code to read / manage a spool directory, daemonize, handle locking
        • Very little code dedicated to processing the actual data
      • Hard to administer!
    • 11. Events, Subscriptions, Notifications?
      • 1 event => many subscribers => many notifications
      • Much too slow to find subscribers and issue notifications synchronously
      • Existing GhettoQueue mechanism sucks, so that's not an option.
      • ...
      • Needs a real solution for reliable job processing
    • 12. Problems With These Approaches
      • Everything is different for each service:
        • Monitoring
        • Tools for Administration
        • Troubleshooting
      • Operations people end up hating new features
        • Each one brings a new set of headaches
      • Fine for a while, but at some point becomes ridiculous
    • 13.  
    • 14. Implementation
      • Perl + MySQL / SQLite
        • SQLite mostly for test suite
      • Python/Ruby/Etc people: Don't worry! Plans are underway to make TheSchwartz language agnostic
    • 15. Topology
      • One or more databases
      • Worker machines
        • Each running many worker processes
    • 16. Schwartz Database
      • Keeps track of:
        • Jobs and their args
        • Errors
        • Exit status
        • ...That's it!
      • Small schema, can mostly stay in memory
      • Scalable: each insert goes to a randomly chosen database
    • 17. Schwartz Workers
      • TheSchwartz::Worker subclasses
      • Know how to handle one or more job types
      • Accept TheSchwartz::Job as single parameter
      • ... That's it!
    • 18. Full Topology: With Application
    • 19. Request Cycle
      • 1) Event to application
      • 2) Application registers Schwartz job
      • 3) Worker grabs Schwartz job
      • 4) Worker does work (usually modifying application data)
      • 5) Worker (optionally) stores result in Schwartz database
    • 20. Let's look at some code...
    • 21. Application code
    • 22. Worker code
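The code shown on slides 21 and 22 is not part of this transcript. As a stand-in, here is a minimal sketch of what the two halves might look like, modeled on the synopsis in TheSchwartz's own documentation. The worker class name `MyApp::Worker::Ping`, the `url` argument, the helper sub names, and the DSN and credentials are all hypothetical placeholders, not LiveJournal's actual code.

```perl
use strict;
use warnings;

# --- Worker code (hypothetical class name) ---------------------------
package MyApp::Worker::Ping;
use base 'TheSchwartz::Worker';

sub work {
    my ($class, $job) = @_;       # $job is a TheSchwartz::Job
    my $args = $job->arg;         # the hashref the application inserted
    # ... contact the external service for $args->{url} here ...
    $job->completed;              # mark the job as done
}

# --- Application code -------------------------------------------------
package main;
use TheSchwartz;

# Build a client; the DSN and credentials below are placeholders.
sub schwartz_client {
    return TheSchwartz->new(
        databases => [ {
            dsn  => 'dbi:mysql:theschwartz',
            user => 'someuser',
            pass => 'somepass',
        } ],
    );
}

# Called from the web request: register the job and return immediately.
sub enqueue_ping {
    my ($url) = @_;
    return schwartz_client()->insert('MyApp::Worker::Ping', { url => $url });
}

# Called from a separate worker process: declare what this process
# can do, then loop grabbing and running jobs.
sub run_worker {
    my $client = schwartz_client();
    $client->can_do('MyApp::Worker::Ping');
    $client->work;
}
```

Nothing here touches the database until `enqueue_ping` or `run_worker` is called, so the web process and the worker process can share the same file and each call only its own half.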
    • 23. It's really that simple...
    • 24. But it doesn't have to be
    • 25. Workers can define other per-worker behaviors
      • Retries:
          sub max_retries { 5 } # 5 tries
          sub retry_delay {
              my ($class, $fail_ct) = @_;
              return 2 ** $fail_ct;
          }
      • Max time a process can work on a job:
          sub grab_for { 300 } # seconds
      • Keeping exit status:
          sub keep_exit_status_for { 86400 } # 1 day
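To make the retry settings on slide 25 concrete, the following sketch prints the backoff schedule those two methods produce. It is plain Perl and does not load TheSchwartz; the package name `MyApp::Worker::Flaky` and the `retry_schedule` helper are hypothetical, and it assumes the failure count passed to `retry_delay` starts at 1 for the first failure.

```perl
use strict;
use warnings;

# Hypothetical worker; a real one would also `use base 'TheSchwartz::Worker'`.
package MyApp::Worker::Flaky;

sub max_retries { 5 }

sub retry_delay {
    my ($class, $fail_ct) = @_;
    return 2 ** $fail_ct;         # exponential backoff, in seconds
}

package main;

# Delay before each retry, assuming the failure count starts at 1.
sub retry_schedule {
    my ($class) = @_;
    return map { $class->retry_delay($_) } 1 .. $class->max_retries;
}

my @delays = retry_schedule('MyApp::Worker::Flaky');
print join(', ', @delays), "\n";  # prints "2, 4, 8, 16, 32"
```

Under that assumption a job that keeps failing is retried over roughly a minute in total (2 + 4 + 8 + 16 + 32 = 62 seconds) before it is given up on.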
    • 26. Other fun features
      • Coalescing based on prefix
        • Coalescing field explicitly stated when job is inserted
        • “Give me all jobs that are sending email to Yahoo”
      • Atomic job replacement
        • “For splitting one job up into many, which other workers can immediately start working on”
      • Scheduling future jobs
        • Because we hate cron
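A rough sketch of how the three features on slide 26 might be used through TheSchwartz's job API. The worker name `MyApp::Worker::SendEmail`, the `rcpt` argument, the helper sub names, and the `domain@user` layout of the coalescing value are assumptions made for illustration; only `TheSchwartz::Job->new`, `find_job_with_coalescing_prefix`, and `replace_with` come from the library.

```perl
use strict;
use warnings;
use TheSchwartz::Job;

# Hypothetical worker class name used throughout this sketch.
my $FUNC = 'MyApp::Worker::SendEmail';

# Coalescing + scheduling: the coalescing value is set when the job is
# built, and run_after (epoch seconds) keeps it from running early.
# Putting the domain first lets jobs for one domain share a prefix.
sub build_email_job {
    my ($rcpt, $when) = @_;
    my ($user, $domain) = split /\@/, $rcpt, 2;
    return TheSchwartz::Job->new(
        funcname  => $FUNC,
        arg       => { rcpt => $rcpt },
        coalesce  => "$domain\@$user",
        run_after => $when,
    );
}

# Inside a worker: grab another pending job for the same domain.
# $client is a connected TheSchwartz object.
sub grab_same_domain {
    my ($client, $domain) = @_;
    return $client->find_job_with_coalescing_prefix($FUNC, "$domain\@");
}

# Atomic replacement: swap one large job for one small job per recipient.
sub fan_out {
    my ($job, @rcpts) = @_;
    my @jobs = map { build_email_job($_, time) } @rcpts;
    $job->replace_with(@jobs);
}
```

A job built by `build_email_job` would be queued with `$client->insert($job)`; passing a `$when` in the future is what stands in for a cron entry.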
    • 27. Using TheSchwartz in production
      • LiveJournal is currently handling over 100 jobs per second
    • 28. Schwartz Database Configuration
      • InnoDB
      • Master-master replication
      • One side active
      • Linux Heartbeat for shared VIP
      • Automatic binlog purging
    • 29. Schwartz Database Configuration … and so on, adding clusters as needed
    • 30. Tools and monitoring
      • Schwartzmon
      • Schwartz-rate
      • LJWorkerctrl
      • Nagios plugins for queues
      • Triggers
    • 31. Schwartzmon Example
        lj@ljadmin1:~$ schwartzmon --dsn=DBI:mysql:theschwartz_livejournal;host=10.191.90.101 --user=lj -f errors
        Thu Jul 26 19:22:13 2007 [2116902910]: Connection failed to domain 'imagemenagerie.com', MXes: [imagemenagerie.com]
        Thu Jul 26 19:22:14 2007 [2120335058]: Connection failed to domain 'thedashcat.net', MXes: [thedashcat.net]
        Thu Jul 26 19:22:15 2007 [2126277932]: Connection failed to domain 'cox.net', MXes: [mx.west.cox.net mx.east.cox.net]
        Thu Jul 26 19:22:16 2007 [2126007758]: Connection failed to domain 'cox.net', MXes: [mx.west.cox.net mx.east.cox.net]
        Thu Jul 26 19:22:16 2007 [2126309296]: Permanent failure TO [hey2a@cs.com]: 550 MAILBOX NOT FOUND
        Thu Jul 26 19:22:17 2007 [2126309446]: Permanent failure TO [joe_junkpan@hotmail.com]: 550 Requested action not taken: mailbox unavailable
        Thu Jul 26 19:22:18 2007 [2126308836]: Error during DATAEND phase to [ourxtrees@yahoo.com]: 451 go ahead Message temporarily deferred - [250]
        Thu Jul 26 19:22:18 2007 [2126188996]: Connection failed to domain 'buffyboarders.zzn.com', MXes: [c2mailmx.mailcentro.com c2mds.mailcentro.com]
        Thu Jul 26 19:22:21 2007 [2125661508]: Connection failed to domain 'cox.net', MXes: [mx.east.cox.net mx.west.cox.net]
    • 32. Ljworkerctrl example
    • 33. Ljworkerctrl example
      • …
    • 34. Ljworkerctrl example
      • …
    • 35. Questions? http://code.sixapart.com/svn/TheSchwartz/trunk
