Os Whitaker

Published in: Business, Technology
  • Transcript

    • 1. Keeping Your Workers In Line:
      • Brad Whitaker
      • Lisa Phillips
      use TheSchwartz;
    • 2. Once upon a time... in a galaxy far, far away
      • Users wanted features
        • Like subscription-based notifications
        • Pinging external services
        • And other things which love to tie up webserver processes and are generally too slow / blocking to execute synchronously with web requests
    • 3. (Yes, that was really my best slide-fu, so bear with me)
    • 4. So we needed to solve the problems these features create
    • 5. Which can easily result in a mess
    • 6. Contacting External Web Services
      • In LJ's case:
        • weblogs.com
        • updates.sixapart.com
        • event notifications to Mother Russia
          • seriously
      • Services need to be contacted whenever an entry / comment / asset is created
    • 7. Processing Uploaded Media
      • Pictures need to be scaled
      • Videos need to be transcoded
      • Shouldn’t be done on the webserver
        • Too slow
        • Hogs CPU/memory resources
        • Requires unnecessary libraries loaded into Apache
    • 8. Initial Solution: GhettoQueue
      • Some sort of buffer on disk/database
        • Or worse, a queue which gets blocked when a single job repeatedly fails
      • Cron or daemon to process the queue
      • Gets really behind, pretty flaky, generally annoying
        • Think 'qbufferd' in LJ
          • Which was the bane of our existence
            • For years.
      • Hard to administer!
    • 9. Incoming data from lots of different transports
      • Incoming emails
      • Incoming SMS and outbound response
      • Audio from Asterisk
      • ... Just to name a few
    • 10. Initial Solution: Lots of Daemons!
      • Lots of daemons!
      • In LJ:
        • phonepostd
        • mailgated
      • Mostly consist of boilerplate code to read / manage a spool directory, daemonize, handle locking
        • Very little code dedicated to processing the actual data
      • Hard to administer!
    • 11. Events, Subscriptions, Notifications?
      • 1 event => many subscribers => many notifications
      • Much too slow to find subscribers and issue notifications synchronously
      • Existing GhettoQueue mechanism sucks, so that's not an option.
      • ...
      • Needs a real solution to reliable job processing
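The fan-out described on this slide (one event, many subscribers, many notifications) can be sketched with a plain in-memory queue. This is only an illustration of the idea, not TheSchwartz's implementation: the `%subscribers` table, the `find_subscribers` / `notify` job names, and the `@queue` array standing in for a job table are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical subscription table: event type => subscribers.
my %subscribers = (
    new_comment => [qw(alice bob carol)],
    new_entry   => [qw(dave)],
);

my @queue;    # stands in for a persistent job table
my @sent;     # records notifications "delivered" by the worker

# Web request side: record that something happened and return immediately.
sub fire_event {
    my ($type, $payload) = @_;
    push @queue, { func => 'find_subscribers',
                   arg  => { type => $type, payload => $payload } };
}

# Worker side: a handler may enqueue follow-up jobs of its own.
my %handlers = (
    find_subscribers => sub {
        my ($arg) = @_;
        for my $user (@{ $subscribers{ $arg->{type} } || [] }) {
            push @queue, { func => 'notify',
                           arg  => { user => $user, payload => $arg->{payload} } };
        }
    },
    notify => sub {
        my ($arg) = @_;
        push @sent, "$arg->{user}: $arg->{payload}";
    },
);

# Process jobs until the queue is empty.
sub drain_queue {
    while (my $job = shift @queue) {
        $handlers{ $job->{func} }->($job->{arg});
    }
}

fire_event(new_comment => 'Someone replied to your entry');
drain_queue();
print "$_\n" for @sent;
```

The web request only pays for one cheap insert; the slow subscriber lookup and the per-user deliveries happen later, in worker processes.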
    • 12. Problems With These Approaches
      • Everything is different for each service:
        • Monitoring
        • Tools for Administration
        • Troubleshooting
      • Operations people end up hating new features
        • Each one brings a new set of headaches
      • Fine for a while, but at some point becomes ridiculous
    • 13.  
    • 14. Implementation
      • Perl + MySQL / SQLite
        • SQLite mostly for test suite
      • Python/Ruby/etc. people: Don't worry! Plans are underway to make TheSchwartz language-agnostic
    • 15. Topology
      • One or more databases
      • Worker machines
        • Each running many worker processes
    • 16. Schwartz Database
      • Keeps track of:
        • Jobs and their args
        • Errors
        • Exit status
        • ...That's it!
      • Small schema, can mostly stay in memory
      • Scalable: Inserts are random to any database
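A minimal sketch of what "inserts are random to any database" could look like: shuffle the list of databases and store the job on the first one that accepts it. The DSNs, the `insert_somewhere` helper, and the fall-through to the next database on failure are assumptions for illustration; the slide only states that the target database is chosen at random.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical DSNs; a real deployment would read these from configuration.
my @databases = (
    'dbi:mysql:theschwartz;host=db1',
    'dbi:mysql:theschwartz;host=db2',
    'dbi:mysql:theschwartz;host=db3',
);

# Try the databases in random order and return the first one that accepts
# the job. $try_insert is a callback standing in for the actual INSERT.
sub insert_somewhere {
    my ($try_insert, @dbs) = @_;
    for my $dsn (shuffle @dbs) {
        return $dsn if $try_insert->($dsn);
    }
    return;    # every database refused the job
}

# Simulate db2 being down: the job still lands on db1 or db3.
my $chosen = insert_somewhere(sub { $_[0] !~ /host=db2/ }, @databases);
print "job stored on $chosen\n";
```

Because no single database has to see every insert, capacity can grow by adding databases to the list.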
    • 17. Schwartz Workers
      • TheSchwartz::Worker subclasses
      • Know how to handle one or more job types
      • Accept TheSchwartz::Job as single parameter
      • ... That's it!
    • 18. Full Topology: With Application
    • 19. Request Cycle
      • 1) Event to application
      • 2) Application registers Schwartz job
      • 3) Worker grabs Schwartz job
      • 4) Worker does work (usually modifying application data)
      • 5) Worker (optionally) stores result in Schwartz database
    • 20. Let's look at some code...
    • 21. Application code
    • 22. Worker code
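The code shown on these two slides did not survive in the transcript. As a rough stand-in, the sketch below follows the usage pattern documented for the TheSchwartz CPAN module (`insert`, `can_do`, `work`, `$job->arg`, `$job->completed`). The worker name `MyApp::Worker::PingService`, the connection details, and the `enqueue_ping` / `run_worker` helpers are hypothetical and are not taken from the presentation.

```perl
use strict;
use warnings;
use TheSchwartz;

# Worker: a TheSchwartz::Worker subclass with a work() class method.
package MyApp::Worker::PingService;    # hypothetical name
use base qw(TheSchwartz::Worker);

sub work {
    my ($class, $job) = @_;
    my $arg = $job->arg;               # whatever the application inserted

    # ... contact the external service here ...
    print "pinging $arg->{url}\n";

    $job->completed;                   # mark the job as done
}

package main;

# Hypothetical connection details.
my @databases = (
    { dsn => 'dbi:mysql:theschwartz', user => 'schwartz', pass => 'secret' },
);

# Application side: queue the job and return to the user right away.
sub enqueue_ping {
    my ($url) = @_;
    my $client = TheSchwartz->new(databases => \@databases);
    return $client->insert('MyApp::Worker::PingService', { url => $url });
}

# Worker process: declare which job types it handles, then loop on the queue.
sub run_worker {
    my $client = TheSchwartz->new(databases => \@databases);
    $client->can_do('MyApp::Worker::PingService');
    $client->work;
}
```

In this arrangement the web process would call `enqueue_ping(...)` during a request, while `run_worker()` runs in separate long-lived processes on the worker machines.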
    • 23. It's really that simple...
    • 24. But it doesn't have to be
    • 25. Workers can define other per-worker behaviors
      • Retries:
        • sub max_retries { 5 } # 5 tries
        • sub retry_delay { my ($class, $fail_ct) = @_; return 2 ** $fail_ct; }
      • Max time a process can work on a job:
        • sub grab_for { 300 } # seconds
      • Keeping exit status:
        • sub keep_exit_status_for { 86400 } # 1 day
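To see what the retry hooks above amount to, the short sketch below prints the resulting backoff schedule. It uses a bare `MyWorker` package (hypothetical, with no TheSchwartz dependency) containing only the two subs from the slide, and it assumes the failure count passed to `retry_delay` starts at 1.

```perl
use strict;
use warnings;

package MyWorker;    # hypothetical; only the two hooks from the slide

sub max_retries { 5 }

sub retry_delay {
    my ($class, $fail_ct) = @_;
    return 2 ** $fail_ct;    # seconds to wait before the next attempt
}

package main;

# Delay before each retry, assuming $fail_ct counts failures so far (1, 2, ...).
my @delays = map { MyWorker->retry_delay($_) } 1 .. MyWorker->max_retries;
print "retry delays: @delays\n";    # retry delays: 2 4 8 16 32
```

Under that assumption a job that keeps failing waits 62 seconds in total across its retries, so a briefly unavailable service gets some breathing room without the job being dropped right away.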
    • 26. Other fun features
      • Coalescing based on prefix
        • Coalescing field explicitly stated when job is inserted
        • “Give me all jobs that are sending email to Yahoo”
      • Atomic job replacement
        • For splitting one job up into many, which other workers can immediately start working on
      • Scheduling future jobs
        • Because we hate cron
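Coalescing by prefix and scheduling for the future can both be pictured as filters over the job table. The sketch below is an in-memory model only: the `domain@user` layout of the coalesce key, the `run_after` timestamps, and the `due_jobs_with_prefix` helper are assumptions chosen for the example, not the library's schema or API.

```perl
use strict;
use warnings;

my $now = time;

# In-memory stand-in for the job table. The "domain@user" key layout is an
# assumption, picked so that a prefix match selects one mail domain.
my @jobs = (
    { id => 1, coalesce => 'yahoo.com@alice', run_after => $now },
    { id => 2, coalesce => 'cox.net@bob',     run_after => $now },
    { id => 3, coalesce => 'yahoo.com@carol', run_after => $now },
    { id => 4, coalesce => 'yahoo.com@dave',  run_after => $now + 3600 },  # scheduled for later
);

# Return jobs that are due at time $at and whose coalesce key starts with $prefix.
sub due_jobs_with_prefix {
    my ($prefix, $at) = @_;
    return grep {
        $_->{run_after} <= $at
            && index($_->{coalesce}, $prefix) == 0
    } @jobs;
}

# "Give me all jobs that are sending email to Yahoo" (and are due now).
my @batch = due_jobs_with_prefix('yahoo.com@', $now);
print "deliver together: ", join(', ', map { $_->{id} } @batch), "\n";
```

A worker that grabs the whole batch can reuse one connection for every message to that domain, and the job scheduled an hour out stays invisible until its time arrives, with no cron entry involved.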
    • 27. Using TheSchwartz in production
      • LiveJournal currently handling over 100 jobs per second
    • 28. Schwartz Database Configuration
      • InnoDB
      • Master-master replication
      • One side active
      • Linux Heartbeat for shared VIP
      • Automatic binlog purging
    • 29. Schwartz Database Configuration
      • ... and so on, adding clusters as needed
    • 30. Tools and monitoring
      • Schwartzmon
      • Schwartz-rate
      • LJWorkerctrl
      • Nagios plugins for queues
      • Triggers
    • 31. Schwartzmon Example
      lj@ljadmin1:~$ schwartzmon --dsn=DBI:mysql:theschwartz_livejournal;host=10.191.90.101 --user=lj -f errors
      Thu Jul 26 19:22:13 2007 [2116902910]: Connection failed to domain 'imagemenagerie.com', MXes: [imagemenagerie.com]
      Thu Jul 26 19:22:14 2007 [2120335058]: Connection failed to domain 'thedashcat.net', MXes: [thedashcat.net]
      Thu Jul 26 19:22:15 2007 [2126277932]: Connection failed to domain 'cox.net', MXes: [mx.west.cox.net mx.east.cox.net]
      Thu Jul 26 19:22:16 2007 [2126007758]: Connection failed to domain 'cox.net', MXes: [mx.west.cox.net mx.east.cox.net]
      Thu Jul 26 19:22:16 2007 [2126309296]: Permanent failure TO [hey2a@cs.com]: 550 MAILBOX NOT FOUND
      Thu Jul 26 19:22:17 2007 [2126309446]: Permanent failure TO [joe_junkpan@hotmail.com]: 550 Requested action not taken: mailbox unavailable
      Thu Jul 26 19:22:18 2007 [2126308836]: Error during DATAEND phase to [ourxtrees@yahoo.com]: 451 go ahead Message temporarily deferred - [250]
      Thu Jul 26 19:22:18 2007 [2126188996]: Connection failed to domain 'buffyboarders.zzn.com', MXes: [c2mailmx.mailcentro.com c2mds.mailcentro.com]
      Thu Jul 26 19:22:21 2007 [2125661508]: Connection failed to domain 'cox.net', MXes: [mx.east.cox.net mx.west.cox.net]
    • 32. Ljworkerctrl example
    • 33. Ljworkerctrl example
    • 34. Ljworkerctrl example
    • 35. Questions? http://code.sixapart.com/svn/TheSchwartz/trunk
