To Batch Or Not To Batch



A presentation with tips and tools on how to integrate batch and asynchronous operations into a generic Ruby on Rails application.

Presented in 2011.


  1. To Batch Or Not To Batch. Luca Mearelli, 2011
  2. "First and foremost, we believe that speed is more than a feature. Speed is the most important feature. If your application is slow, people won’t use it." Fred Wilson. @lmea #rubyday
  3. Not all the interesting features are fast: interacting with remote APIs, sending emails, media transcoding, handling large datasets.
  4. Anatomy of an asynchronous action: (1) the app decides it needs to do a long operation; (2) the app asks the async system to do the operation and quickly returns the response; (3) the async system executes the operation out-of-band.
  5. Batch, asynchronous jobs, queues & workers
  6. Batch
  7. Cron: scheduled operations, unrelated to the requests, low frequency, longer run time
  8. Anatomy of a cron batch: the rake task
      namespace :export do
        task :items_xml => :environment do
          # read the env variables
          # make the export
        end
      end
  9. Anatomy of a cron batch: the shell script
      #!/bin/sh
      # this goes in script/
      cd /usr/rails/MyApp/current
      export RAILS_ENV=production
      echo "Item Export Full started: `date`"
      rake export:items_xml XML_FOLDER=data/exports
      echo "Item Export Full completed: `date`"
  10. Anatomy of a cron batch: the crontab entry
      0 0 1 * * /usr/rails/MyApp/current/script/ >> /usr/rails/MyApp/current/log/dump_item_export.log 2>&1
      30 13 * * * cd /usr/rails/MyApp/current; ruby /usr/rails/MyApp/current/script/runner -e production "Newsletter.deliver_daily" >> /usr/rails/MyApp/current/log/newsletter_daily.log 2>&1
  11. Cron helpers: Whenever, Craken
  12. Whenever: schedule.rb
      # adds ">> /path/to/file.log 2>&1" to all commands
      set :output, "/path/to/file.log"
      every 3.hours do
        rake "my:rake:task"
      end
      every …, :at => "4:30 am" do
        runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
      end
      every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot
        command "/usr/bin/my_great_command", :output => {:error => "error.log", :standard => "cron.log"}
      end
  13. Craken: raketab
      59 * * * * thing:to_do > /tmp/thing_to_do.log 2>&1
      @daily solr:reindex > /tmp/solr_daily.log 2>&1
      # also @yearly, @annually, @monthly, @weekly, @midnight, @hourly
  14. Craken: raketab.rb
      … do |cron|
        cron.schedule "thing:to_do > /tmp/thing_to_do.log 2>&1", :every => "mon..fri"
        cron.schedule "first:five:days > /tmp/thing_to_do.log 2>&1", :days => [1,2,3,4,5]
        cron.schedule "first:day:q1 > /tmp/thing_to_do.log 2>&1", :the => "1st", :in => ["jan","feb","mar"]
        cron.schedule "first:day:q4 > /tmp/thing_to_do.log 2>&1", :the => "1st", :months => "October,November,December"
      end
  15. Queues & Workers: un-scheduled operations, responding to a request, mid to high frequency, mixed run time
  16. Queues & Workers: Delayed job, Resque
  17. Delayed job: any object method can be a job; DB-backed queue; integer-based priority; lifecycle hooks (enqueue, before, after, ...)
  18. Delayed job: simple jobs
      # without delayed_job
      @user.notify!(@event)
      # with delayed_job
      @user.delay.notify!(@event)
      # always asynchronous method
      class Newsletter
        def deliver
          # long running method
        end
        handle_asynchronously :deliver
      end
      newsletter = …
      newsletter.deliver
  19. Delayed job: handle_asynchronously
      handle_asynchronously :sync_method, :priority => 20
      handle_asynchronously :in_the_future, :run_at => … { 5.minutes.from_now }
      handle_asynchronously :call_a_class_method, :run_at => … { when_to_run }
      handle_asynchronously :call_an_instance_method, :priority => … {|i| i.how_important }
  20. Delayed job
      class NewsletterJob < …, :emails)
        def perform
          emails.each { |e| NewsMailer.deliver_text_to_email(text, e) }
        end
      end
      Delayed::Job.enqueue … ipsum..., User.find(:all).collect(&:email))
  21. Delayed job
      RAILS_ENV=production script/delayed_job -n 2 --min-priority 10 start
      RAILS_ENV=production script/delayed_job stop
      rake jobs:work
  22. Delayed job: checking the job status. The queue only holds scheduled and running jobs, so track the status outside the Delayed::Job object.
  23. Delayed job: checking the job status
      # Include this in your initializers somewhere.
      # Named JobQueue so it does not clash with Ruby's core Queue class.
      class JobQueue < Delayed::Job
        def self.status(id)
          job = find_by_id(id)
          job.nil? ? "success" : (job.last_error.nil? ? "queued" : "failure")
        end
      end
      # Use this method in your poll method like so:
      def poll
        status = JobQueue.status(params[:id])
        if status == "success"
          # Success, notify the user!
        elsif status == "failure"
          # Failure, notify the user!
        end
      end
  24. Delayed job: checking the job status
      class AJob < …
        def perform
          do_something(options)
        end
        def success(job)
          # record success of …
          Rails.cache.write("status:#{…}", "success")
        end
      end
      # a helper
      def job_completed_with_success(job_id)
        …"status:#{job_id}") == "success"
      end
  25. Resque: Redis-backed queues; queue/dequeue speed independent of list size; forking behaviour; built-in front-end; multiple queues, no priorities
  26. Resque: the job
      class Export
        @queue = :export_jobs
        def self.perform(dataset_id, kind = "full")
          ds = Dataset.find(dataset_id)
          ds.create_export(kind)
        end
      end
  27. Resque: enqueuing the job
      class Dataset
        def async_create_export(kind)
          Resque.enqueue(Export, …, kind)
        end
      end
      ds = Dataset.find(100)
      ds.async_create_export("full")
  28. Resque: persisting the job
      # jobs are persisted as JSON,
      # so jobs should only take arguments that can be expressed as JSON
      { "class": "Export", "args": [ 100, "full" ] }
      # don't do this:
      Resque.enqueue(Export, self, kind)
      # do this:
      Resque.enqueue(Export, …, kind)
  29. Resque: generic async methods
      # A simple async helper
      class Repository < ActiveRecord::Base
        # This will be called by a worker when a job needs to be processed
        def self.perform(id, method, *args)
          find(id).send(method, *args)
        end
        # We can pass this any Repository instance method that we want to
        # run later.
        def async(method, *args)
          Resque.enqueue(Repository, id, method, *args)
        end
      end
      # Now we can call any method and have it execute later:
      @repo.async(:update_disk_usage)
      @repo.async(:update_network_source_id, 34)
  30. Resque: anatomy of a worker
      # a worker does this:
      start
      loop do
        if job = reserve
          job.process
        else
          sleep 5
        end
      end
      shutdown
  31. Resque: working the queues
      $ QUEUES=critical,high,low rake resque:work
      $ QUEUES=* rake resque:work
      $ PIDFILE=./ QUEUE=export_jobs rake environment resque:work
      task "resque:setup" => :environment do
        AppConfig.a_parameter = ...
      end
  32. Resque: monit recipe
      # example monit monitoring recipe
      check process resque_worker_batch_01
        with pidfile /app/current/tmp/pids/
        start program = "/bin/bash -c 'cd /app/current; RAILS_ENV=production QUEUE=batch_queue nohup rake environment resque:work > log/worker_01.log 2>&1 & echo $! > tmp/pids/'"
          as uid deploy and gid deploy
        stop program = "/bin/bash -c 'cd /app/current && kill -s QUIT `cat tmp/pids/` && rm -f tmp/pids/; exit 0;'"
        if totalmem is greater than 1000 MB for 10 cycles then restart # eating up memory?
        group resque_workers
  33. Resque: built-in monitoring
  34. Resque plugins: Resque-status, Resque-scheduler. More at:
  35. Resque-status: simple trackable jobs for Resque; job instances have a UUID; jobs can report their status while running
  36. Resque-status
      # inheriting from JobWithStatus
      class ExportJob < Resque::JobWithStatus
        # perform is an instance method
        def perform
          # default to 1000 when no limit is given (nil.to_i would be 0, never nil)
          limit = (options["limit"] || 1000).to_i
          items = Item.limit(limit)
          total = items.count
          exported = []
          items.each_with_index do |item, num|
            at(num, total, "At #{num} of #{total}")
            exported << item.to_csv
          end
          …, "w") { |f| f.write(exported.join("\n")) }
          complete(:filename => local_filename)
        end
      end
  37. Resque-status
      job_id = SleepJob.create(:length => 100)
      status = Resque::Status.get(job_id)
      # the status object tells us:
      status.pct_complete #=> 0
      status.status       #=> "queued"
      status.queued?      #=> true
      status.working?     #=> false
      status.time         #=> Time object
      status.message      #=> "Created at ..."
      Resque::Status.kill(job_id)
  38. Resque-scheduler: queueing for future execution; scheduling jobs (like cron!)
  39. Resque-scheduler
      # run a job in 5 days
      Resque.enqueue_in(5.days, SendFollowupEmail)
      # run SomeJob at a specific time
      Resque.enqueue_at(5.days.from_now, SomeJob)
  40. Resque-scheduler
      namespace :resque do
        task :setup do
          require "resque"
          require "resque_scheduler"
          require "resque/scheduler"
          Resque.redis = "localhost:6379"
          # The schedule doesn't need to be stored in a YAML, it just needs to
          # be a hash. YAML is usually the easiest.
          Resque::Scheduler.schedule = YAML.load_file("your_resque_schedule.yml")
          # When dynamic is set to true, the scheduler process looks for
          # schedule changes and applies them on the fly.
          # Also if dynamic the Resque::Scheduler.set_schedule (and remove_schedule)
          # methods can be used to alter the schedule
          #Resque::Scheduler.dynamic = true
        end
      end
      $ rake resque:scheduler
  41. Resque-scheduler: the yaml configuration
      queue_documents_for_indexing:
        cron: "0 0 * * *"
        class: QueueDocuments
        queue: high
        args:
        description: "This job queues all content for indexing in solr"
      export_items:
        cron: "30 6 * * 1"
        class: Export
        queue: low
        args: full
        description: "This job does a weekly export"
  42. Other (commercial): SimpleWorker, SQS
  43. Other (historical): Beanstalkd and Stalker, Backgroundjob (Bj), BackgrounDRb
  44. Other (different approaches): Nanite, CloudCrowd
  45. Ciao!
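The asynchronous action from slide 4 boils down to three steps: enqueue, respond, process out-of-band. Here is a minimal plain-Ruby sketch of that shape, not how any particular library does it: Ruby's core `Queue` stands in for the async system, a thread stands in for the separate worker process, and `handle_request` and the `:stop` sentinel are hypothetical names made up for the illustration.

```ruby
# Sketch only: an in-process thread plays the role of the async system.
# In a real deployment the worker is a separate process (Delayed Job, Resque).

# Hypothetical request handler: pushes the slow work onto the queue
# and returns a response right away.
def handle_request(jobs, email)
  jobs << -> { "sent newsletter to #{email}" } # placeholder for the slow operation
  "202 Accepted"
end

jobs = Queue.new    # thread-safe FIFO from Ruby's core library
results = Queue.new

# Out-of-band worker: pops jobs until it receives the :stop sentinel.
worker = Thread.new do
  loop do
    job = jobs.pop
    break if job == :stop
    results << job.call
  end
end
```

The request path never waits for the lambda to run; only the worker thread does.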
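The crontab entries on slide 10 and the `cron:` keys in the resque-scheduler YAML on slide 41 share the same five-field format: minute, hour, day of month, month, day of week. As a rough illustration of how an entry such as `0 0 1 * *` is evaluated, here is a deliberately reduced matcher. `cron_due?` is a hypothetical helper; it supports only `*`, plain numbers and comma lists, and it simplifies real cron's day-of-month/day-of-week rule.

```ruby
# Sketch of how a cron daemon decides whether a five-field entry is due.
# Supported syntax (deliberately small): "*", single numbers and comma lists.
# Ranges ("1-5"), steps ("*/15") and names ("mon") are not handled.

def cron_field_matches?(spec, value)
  return true if spec == "*"
  spec.split(",").map { |part| Integer(part, 10) }.include?(value)
end

# Fields: minute hour day-of-month month day-of-week (0 = Sunday).
# Simplification: all fields are AND-ed; real cron OR-s day-of-month and
# day-of-week when both are restricted.
def cron_due?(expression, time)
  specs = expression.split
  raise ArgumentError, "expected 5 fields, got #{specs.size}" unless specs.size == 5
  values = [time.min, time.hour, time.day, time.month, time.wday]
  specs.zip(values).all? { |spec, value| cron_field_matches?(spec, value) }
end
```

With it, `cron_due?("30 6 * * 1", t)` is true only at 06:30 on Mondays, which is the weekly export schedule from slide 41.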
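Delayed Job's integer priorities (slides 17, 19 and 21) run the lower number first, and a worker started with `--min-priority 10` only picks jobs whose priority is 10 or higher; jobs whose `run_at` is still in the future wait. The in-memory sketch below models that selection rule under those assumptions. It is not Delayed Job's actual query, and `PriorityJobQueue` and `QueuedJob` are hypothetical names.

```ruby
# Hypothetical in-memory stand-in for Delayed Job's jobs table.
QueuedJob = Struct.new(:name, :priority, :run_at)

class PriorityJobQueue
  def initialize
    @jobs = []
  end

  def enqueue(name, priority: 0, run_at: Time.now)
    @jobs << QueuedJob.new(name, priority, run_at)
  end

  # Next runnable job: already due, inside the worker's priority band,
  # lowest priority number first, ties broken by earliest run_at.
  def reserve(now: Time.now, min_priority: nil, max_priority: nil)
    due = @jobs.select do |job|
      job.run_at <= now &&
        (min_priority.nil? || job.priority >= min_priority) &&
        (max_priority.nil? || job.priority <= max_priority)
    end
    job = due.min_by { |j| [j.priority, j.run_at] }
    @jobs.delete_at(@jobs.index(job)) if job # remove exactly one entry
    job
  end
end
```

Splitting workers by priority band, as slide 21 does with `-n 2 --min-priority 10`, keeps a backlog of bulk jobs from being picked up by the workers reserved for urgent ones.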
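Slides 22 to 24 keep job status outside the queue because the queue only knows about pending and running jobs. The approach on slide 23 reads a missing row as "success", which becomes ambiguous if the queue deletes jobs that fail permanently (Delayed Job can be configured to do so) or if the id never existed. The sketch below follows the slide 24 idea instead and writes every state transition to a separate store. `StatusTracker` is hypothetical and a plain Hash stands in for `Rails.cache`.

```ruby
# Hypothetical tracker: every state change is written to a store that
# outlives the queue entry (a Hash here, Rails.cache on slide 24).
class StatusTracker
  def initialize(store = {})
    @store = store
  end

  def enqueued(job_id)
    @store[job_id] = "queued"
  end

  # Runs the job body and records the outcome, in the spirit of the
  # lifecycle hooks from slides 17 and 24.
  def run(job_id)
    @store[job_id] = "working"
    yield
    @store[job_id] = "success"
  rescue StandardError => e
    @store[job_id] = "failure: #{e.message}"
  end

  # Unknown ids are reported as such instead of being read as "success".
  def status(job_id)
    @store.fetch(job_id, "unknown")
  end
end
```

This sketch swallows the exception after recording it; inside a real job hook you would normally let it propagate so the queue can retry.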
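Slide 28's rule of thumb (only JSON-friendly arguments) follows from how the job is stored: the queue keeps a JSON document with the class name and the argument list, and the worker turns the class name back into a constant before calling `perform`. The sketch below reproduces that round trip with Ruby's standard `json` library. `encode_job` and `run_job` are hypothetical helpers, and Resque's real payload handling does more than this.

```ruby
require "json"

# Hypothetical job class, shaped like Export on slide 26.
class Export
  def self.perform(dataset_id, kind = "full")
    "exporting dataset #{dataset_id} (#{kind})"
  end
end

# Producer side: store only the class name and a JSON array of arguments.
def encode_job(klass, *args)
  JSON.generate({ "class" => klass.name, "args" => args })
end

# Worker side: parse the payload, look the class up by name, call perform.
def run_job(payload)
  job = JSON.parse(payload)
  Object.const_get(job["class"]).perform(*job["args"])
end
```

For example, `encode_job(Export, 100, :full)` stores the symbol as the string `"full"`. A whole ActiveRecord object would likewise come back as plain JSON data rather than a live record, which is why slide 28 passes the id instead of `self`.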
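The generic helper on slide 29 works because the job payload is just a class name, a record id, a method name and plain arguments; the worker reloads the record and calls the method. Here is a self-contained sketch of the same round trip with in-memory stand-ins: a `QUEUE` array instead of Redis, a `RECORDS` hash instead of the database, and a hypothetical `work_off` loop instead of a Resque worker. It uses `public_send` rather than the slide's `send`, a deliberate tweak so that a queued method name cannot reach private methods.

```ruby
# In-memory stand-ins (assumptions): QUEUE replaces Redis, RECORDS the table.
QUEUE = []

class Repository
  RECORDS = {}

  attr_reader :id, :disk_usage, :network_source_id

  def initialize(id)
    @id = id
    RECORDS[id] = self
  end

  def self.find(id)
    RECORDS.fetch(id)
  end

  # Worker side: reload the record by id and call the named method.
  def self.perform(id, method, *args)
    find(id).public_send(method, *args)
  end

  # Client side: enqueue only JSON-friendly data (class name, id, method name, args).
  def async(method, *args)
    QUEUE << [self.class.name, id, method.to_s, *args]
  end

  def update_disk_usage
    @disk_usage = 42 # placeholder for the slow computation
  end

  def update_network_source_id(source_id)
    @network_source_id = source_id
  end
end

# Hypothetical worker loop: drain the queue, resolving each class by name.
def work_off(queue)
  until queue.empty?
    class_name, *args = queue.shift
    Object.const_get(class_name).perform(*args)
  end
end
```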
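Slides 30 and 31 describe the worker as a reserve/process/sleep loop that scans its queues in the order given, so with `QUEUES=critical,high,low` a `high` job only runs when `critical` is empty. A minimal single-process sketch of that loop follows. `MiniWorker` is hypothetical; unlike Resque it does not fork a child per job, and a `shutdown!` flag stands in for the QUIT signal used by the monit recipe on slide 32.

```ruby
# Hypothetical single-process worker; no forking, no Redis.
class MiniWorker
  attr_reader :processed

  # queues: Hash of queue name => Array of callables
  # order:  queue names to check, highest priority first
  def initialize(queues, order, interval: 5)
    @queues = queues
    @order = order
    @interval = interval
    @shutdown = false
    @processed = 0
  end

  # Stands in for the QUIT signal: finish the current job, then stop.
  def shutdown!
    @shutdown = true
  end

  # First job found scanning the queues in order, or nil if all are empty.
  def reserve
    @order.each do |name|
      job = @queues.fetch(name, []).shift
      return job if job
    end
    nil
  end

  def work
    until @shutdown
      if (job = reserve)
        job.call
        @processed += 1
      else
        sleep @interval
      end
    end
  end
end
```

Because `reserve` rescans from the first queue after every job, a steady stream of `critical` work in this sketch can starve `low` indefinitely; that is the price of strict queue ordering.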
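The `at(num, total, message)` calls on slide 36 are what feed `pct_complete` and `message` on slide 37. The sketch below models that progress bookkeeping with a hypothetical `ProgressStatus` class; it mirrors the method names shown on the slides but is not resque-status's implementation, which keeps the status in Redis under the job's UUID.

```ruby
# Hypothetical status object mirroring the names used on slides 36-37.
class ProgressStatus
  attr_reader :status, :message

  def initialize
    @status = "queued"
    @num = 0
    @total = 1
    @message = nil
  end

  # Called by the job while it runs, like at(num, total, message) on slide 36.
  def at(num, total, message)
    @status = "working"
    @num = num
    @total = total.zero? ? 1 : total # avoid dividing by zero on empty exports
    @message = message
  end

  # Like complete(...) on slide 36: the job is done, progress is 100%.
  def complete(message = "Completed")
    @status = "completed"
    @num = @total
    @message = message
  end

  def pct_complete
    @num * 100 / @total # integer percentage
  end

  def queued?
    @status == "queued"
  end

  def working?
    @status == "working"
  end
end
```

With the loop from slide 36, the last `at` call for four items is `at(3, 4, ...)`, so this sketch reports 75% until `complete` is called.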
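`enqueue_in` and `enqueue_at` (slide 39) park a job until a timestamp, and the separate `rake resque:scheduler` process (slide 40) moves due jobs onto a normal queue. Roughly, resque-scheduler keeps the delayed jobs in Redis ordered by timestamp; the in-memory sketch below keeps a sorted array instead and exposes a `tick` method that a scheduler loop would call periodically. `DelayedScheduler` and `tick` are hypothetical names, and delays are plain seconds because `5.days` comes from ActiveSupport.

```ruby
# Hypothetical in-memory delayed queue; times are plain Time objects.
class DelayedScheduler
  attr_reader :ready

  def initialize
    @delayed = [] # [run_at, job] pairs, kept sorted by run_at
    @ready = []   # stands in for the normal work queue
  end

  def enqueue_at(time, job)
    @delayed << [time, job]
    @delayed.sort_by!(&:first)
  end

  def enqueue_in(seconds, job, now: Time.now)
    enqueue_at(now + seconds, job)
  end

  # One pass of the scheduler loop: move every job that is due.
  def tick(now = Time.now)
    while (entry = @delayed.first) && entry[0] <= now
      @ready << @delayed.shift[1]
    end
  end
end
```

Keeping the delayed list sorted means each tick only has to look at the head of the list and can stop at the first job that is not yet due.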