To Batch Or Not To Batch

A presentation with tips and tools on how to integrate batch and asynchronous operations in a generic Ruby on Rails application.

Presented in 2011.

Published in: Technology

  • 1. To Batch Or Not To Batch Luca Mearelli 2011
  • 2. "First and foremost, we believe that speed is more than a feature. Speed is the most important feature. If your application is slow, people won’t use it." Fred Wilson @lmea #rubyday
  • 3. Not all the interesting features are fast: interacting with remote APIs; sending emails; media transcoding; large dataset handling
  • 4. Anatomy of an asynchronous action: (1) the app decides it needs to do a long operation; (2) the app asks the async system to do the operation and quickly returns the response; (3) the async system executes the operation out-of-band
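The three steps on this slide can be sketched in plain Ruby. This is only an in-process illustration built on the standard library's `Thread::Queue`; `AsyncSystem`, `ASYNC` and `create_export` are hypothetical names, and a real application would hand the work to one of the tools covered in the following slides.

```ruby
# Hypothetical in-process stand-in for an async system.
class AsyncSystem
  attr_reader :results

  def initialize
    @jobs = Thread::Queue.new
    @results = Thread::Queue.new
    # The worker thread executes jobs out-of-band.
    @worker = Thread.new do
      while (job = @jobs.pop) # pop returns nil once the queue is closed and drained
        @results << job.call
      end
    end
  end

  # Hand a block of work to the worker thread.
  def enqueue(&job)
    @jobs << job
  end

  # Stop accepting jobs and wait for the pending ones to finish.
  def shutdown
    @jobs.close
    @worker.join
  end
end

ASYNC = AsyncSystem.new

# Steps 1 and 2: the app decides the operation is long, hands it off
# and returns a response right away.
def create_export(dataset_id)
  ASYNC.enqueue { sleep 0.1; "export of dataset #{dataset_id} done" }
  :accepted
end
```

Calling `create_export(100)` returns `:accepted` without waiting for the export; the result only shows up in `ASYNC.results` once the worker thread has run the job.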
  • 5. Batch; asynchronous jobs; queues & workers
  • 6. Batch
  • 7. Cron: scheduled operations; unrelated to the requests; low frequency; longer run time
  • 8. Anatomy of a cron batch: the rake task
      namespace :export do
        task :items_xml => :environment do
          # read the env variables
          # make the export
        end
      end
  • 9. Anatomy of a cron batch: the shell script
      #!/bin/sh
      # this goes in script/
      cd /usr/rails/MyApp/current
      export RAILS_ENV=production
      echo "Item Export Full started: `date`"
      rake export:items_xml XML_FOLDER=data/exports
      echo "Item Export Full completed: `date`"
  • 10. Anatomy of a cron batch: the crontab entry
      0 0 1 * * /usr/rails/MyApp/current/script/ >> /usr/rails/MyApp/current/log/dump_item_export.log 2>&1
      30 13 * * * cd /usr/rails/MyApp/current; ruby /usr/rails/MyApp/current/script/runner -e production "Newsletter.deliver_daily" >> /usr/rails/MyApp/current/log/newsletter_daily.log 2>&1
  • 11. Cron helpers: Whenever; Craken
  • 12. Whenever: schedule.rb
      # adds ">> /path/to/file.log 2>&1" to all commands
      set :output, '/path/to/file.log'

      every 3.hours do
        rake "my:rake:task"
      end

      every, :at => '4:30 am' do
        runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
      end

      every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot
        command "/usr/bin/my_great_command", :output => {:error => 'error.log', :standard => 'cron.log'}
      end
  • 13. Craken: raketab
      59 * * * * thing:to_do > /tmp/thing_to_do.log 2>&1
      @daily solr:reindex > /tmp/solr_daily.log 2>&1
      # also @yearly, @annually, @monthly, @weekly, @midnight, @hourly
  • 14. Craken: raketab.rb
      do |cron|
        cron.schedule 'thing:to_do > /tmp/thing_to_do.log 2>&1', :every => mon..fri
        cron.schedule 'first:five:days > /tmp/thing_to_do.log 2>&1', :days => [1,2,3,4,5]
        cron.schedule 'first:day:q1 > /tmp/thing_to_do.log 2>&1', :the => '1st', :in => [jan,feb,mar]
        cron.schedule 'first:day:q4 > /tmp/thing_to_do.log 2>&1', :the => '1st', :months => 'October,November,December'
      end
  • 15. Queues & Workers: unscheduled operations; responding to a request; mid to high frequency; mixed run time
  • 16. Queues & Workers: Delayed job; Resque
  • 17. Delayed job: any object method can be a job; DB-backed queue; integer-based priority; lifecycle hooks (enqueue, before, after, ...)
  • 18. Delayed job: simple jobs
      # without delayed_job
      @user.notify!(@event)

      # with delayed_job
      @user.delay.notify!(@event)

      # always asynchronous method
      class Newsletter
        def deliver
          # long running method
        end
        handle_asynchronously :deliver
      end

      newsletter =
      newsletter.deliver
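To make the idea behind `@user.delay.notify!(@event)` concrete, here is a minimal sketch of a recording proxy in plain Ruby. It is not delayed_job's implementation (which persists jobs in a database table); `DelayProxy`, `Delayable`, `PENDING_JOBS`, `work_off` and the toy `User` class are all hypothetical names used only for illustration.

```ruby
# Hypothetical in-memory job list standing in for the jobs table.
PENDING_JOBS = []

# Records method calls instead of executing them.
class DelayProxy < BasicObject
  def initialize(target)
    @target = target
  end

  # Any method called on the proxy becomes a queued job.
  def method_missing(name, *args)
    ::PENDING_JOBS << [@target, name, args]
    nil
  end
end

module Delayable
  def delay
    DelayProxy.new(self)
  end
end

# Toy model used only for this sketch.
class User
  include Delayable
  attr_reader :notified

  def initialize
    @notified = []
  end

  def notify!(event)
    @notified << event
  end
end

# What a worker would do later: take each recorded call and run it.
def work_off
  until PENDING_JOBS.empty?
    target, name, args = PENDING_JOBS.shift
    target.public_send(name, *args)
  end
end
```

With this sketch, `User.new.delay.notify!("signup")` only records the call; nothing reaches `notify!` until `work_off` runs.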
  • 19. Delayed job: handle_asynchronously
      handle_asynchronously :sync_method, :priority => 20
      handle_asynchronously :in_the_future, :run_at => { 5.minutes.from_now }
      handle_asynchronously :call_a_class_method, :run_at => { when_to_run }
      handle_asynchronously :call_an_instance_method, :priority => {|i| i.how_important }
  • 20. Delayed job
      class NewsletterJob <, :emails)
        def perform
          emails.each { |e| NewsMailer.deliver_text_to_email(text, e) }
        end
      end

      Delayed::Job.enqueue ipsum..., User.find(:all).collect(&:email))
  • 21. Delayed job
      RAILS_ENV=production script/delayed_job -n 2 --min-priority 10 start
      RAILS_ENV=production script/delayed_job stop
      rake jobs:work
  • 22. Delayed job: checking the job status: the queue is for scheduled and running jobs; handle the status outside the Delayed::Job object
  • 23. Delayed job: checking the job status
      # Include this in your initializers somewhere
      # (on Rubies where ::Queue is already defined, pick another class name)
      class Queue < Delayed::Job
        def self.status(id)
          job = find_by_id(id)
          # a job that is no longer in the table is treated as completed
          return "success" if job.nil?
          job.last_error.nil? ? "queued" : "failure"
        end
      end

      # Use this method in your poll method like so:
      def poll
        status = Queue.status(params[:id])
        if status == "success"
          # Success, notify the user!
        elsif status == "failure"
          # Failure, notify the user!
        end
      end
  • 24. Delayed job: checking the job status
      class AJob <
        def perform
          do_something(options)
        end

        def success(job)
          # record success of
          Rails.cache.write("status:#{}", "success")
        end
      end

      # a helper
      def job_completed_with_success(job_id)
        "status:#{job_id}")=="success"
      end
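The approach on these slides, recording the outcome somewhere outside the queue so it can be polled later, can be sketched without Rails. In this illustration a plain Hash stands in for `Rails.cache`; `STATUS_STORE`, `ExportJob`, `run_and_record` and `job_status` are hypothetical names, and the runner only imitates what a worker and its lifecycle hooks would do.

```ruby
# Hypothetical stand-in for Rails.cache: a plain Hash.
STATUS_STORE = {}

# Toy job: "exports" the items it is given.
ExportJob = Struct.new(:id, :options) do
  def perform
    raise ArgumentError, "nothing to export" if options[:items].empty?
    options[:items].size
  end
end

# Runs a job the way a worker might, recording the outcome outside the queue.
def run_and_record(job)
  STATUS_STORE["status:#{job.id}"] = "working"
  job.perform
  STATUS_STORE["status:#{job.id}"] = "success"
rescue StandardError => e
  STATUS_STORE["status:#{job.id}"] = "failure: #{e.message}"
end

# What a poll action would call.
def job_status(job_id)
  STATUS_STORE.fetch("status:#{job_id}", "unknown")
end
```

Because the status lives in its own store, the poll action can tell a finished job apart from one that failed or was never enqueued, independently of whether the job row still exists.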
  • 25. Resque: Redis-backed queues; queue/dequeue speed independent of list size; forking behaviour; built-in front-end; multiple queues / no priorities
  • 26. Resque: the job
      class Export
        @queue = :export_jobs

        def self.perform(dataset_id, kind = 'full')
          ds = Dataset.find(dataset_id)
          ds.create_export(kind)
        end
      end
  • 27. Resque: enqueuing the job
      class Dataset
        def async_create_export(kind)
          Resque.enqueue(Export,, kind)
        end
      end

      ds = Dataset.find(100)
      ds.async_create_export('full')
  • 28. Resque: persisting the job
      # jobs are persisted as JSON,
      # so jobs should only take arguments that can be expressed as JSON
      { "class": "Export", "args": [ 100, "full" ] }

      # don't do this:
      Resque.enqueue(Export, self, kind)
      # do this:
      Resque.enqueue(Export,, kind)
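The reason for passing ids rather than objects can be seen with the standard `json` library alone. The sketch below does not use Resque; `encode_job`, `decode_job` and the toy `Dataset` class are hypothetical and only mimic the shape of the payload shown on the slide.

```ruby
require "json"

# Toy record with a database id (hypothetical, no ActiveRecord involved).
class Dataset
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

# Build a payload shaped like the one on the slide.
def encode_job(class_name, *args)
  JSON.generate("class" => class_name, "args" => args)
end

# What a worker gets back when it reads the payload.
def decode_job(payload)
  JSON.parse(payload)
end

ds = Dataset.new(100)

# Passing the id: the arguments survive the round trip unchanged.
good = decode_job(encode_job("Export", ds.id, "full"))

# Passing the object: it is flattened to its string representation,
# so the worker receives a String, not a Dataset.
bad = decode_job(encode_job("Export", ds, "full"))
```

Here `good["args"]` comes back as `[100, "full"]`, while the first element of `bad["args"]` is a String, which is why the job should look the record up again by id inside `perform`.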
  • 29. Resque: generic async methods
      # A simple async helper
      class Repository < ActiveRecord::Base
        # This will be called by a worker when a job needs to be processed
        def self.perform(id, method, *args)
          find(id).send(method, *args)
        end

        # We can pass this any Repository instance method that we want to
        # run later.
        def async(method, *args)
          Resque.enqueue(Repository, id, method, *args)
        end
      end

      # Now we can call any method and have it execute later:
      @repo.async(:update_disk_usage)
      @repo.async(:update_network_source_id, 34)
  • 30. Resque: anatomy of a worker
      # a worker does this:
      start
      loop do
        if job = reserve
          job.process
        else
          sleep 5
        end
      end
      shutdown
  • 31. Resque: working the queues
      $ QUEUES=critical,high,low rake resque:work
      $ QUEUES=* rake resque:work
      $ PIDFILE=./ QUEUE=export_jobs rake environment resque:work

      task "resque:setup" => :environment do
        AppConfig.a_parameter = ...
      end
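Resque has no numeric priorities; a worker checks its queues in the order they are listed, so `QUEUES=critical,high,low` drains `critical` before looking at `high`. The sketch below imitates that reserve loop with in-memory arrays instead of Redis lists; `TinyWorker` and its methods are hypothetical names, and unlike a real worker it stops when the queues are empty instead of sleeping and polling again.

```ruby
# Hypothetical worker over in-memory queues (arrays stand in for Redis lists).
class TinyWorker
  def initialize(queues, order)
    @queues = queues
    @order = order
  end

  # Return the next job from the first non-empty queue, in the listed order.
  def reserve
    @order.each do |name|
      job = @queues[name].shift
      return [name, job] if job
    end
    nil
  end

  # Process everything that is queued, then stop.
  def work
    processed = []
    while (reserved = reserve)
      processed << reserved
    end
    processed
  end
end

queues = { "critical" => ["c1"], "high" => ["h1", "h2"], "low" => ["l1"] }
worker = TinyWorker.new(queues, %w[critical high low])
```

Because `reserve` restarts from the first queue for every job, a job arriving on `critical` while `low` is being drained would be picked up next.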
  • 32. Resque: monit recipe
      # example monit monitoring recipe
      check process resque_worker_batch_01
        with pidfile /app/current/tmp/pids/
        start program = "/bin/bash -c 'cd /app/current; RAILS_ENV=production QUEUE=batch_queue nohup rake environment resque:work & > log/worker_01.log && echo $! > tmp/pids/'" as uid deploy and gid deploy
        stop program = "/bin/bash -c 'cd /app/current && kill -s QUIT `cat tmp/pids/` && rm -f tmp/pids/; exit 0;'"
        if totalmem is greater than 1000 MB for 10 cycles then restart # eating up memory?
        group resque_workers
  • 33. Resque: built-in monitoring
  • 34. Resque plugins: Resque-status; Resque-scheduler. More at:
  • 35. Resque-status: simple trackable jobs for Resque; job instances have a UUID; jobs can report their status while running
  • 36. Resque-status
      # inheriting from JobWithStatus
      class ExportJob < Resque::JobWithStatus
        # perform is an instance method
        def perform
          # default before converting: nil.to_i is 0, so "|| 1000" after to_i never applies
          limit = (options['limit'] || 1000).to_i
          items = Item.limit(limit)
          total = items.count
          exported = []
          items.each_with_index do |item, num|
            at(num, total, "At #{num} of #{total}")
            exported << item.to_csv
          end
          , 'w') { |f| f.write(exported.join("\n")) }
          complete(:filename=>local_filename)
        end
      end
  • 37. Resque-status
      job_id = SleepJob.create(:length => 100)
      status = Resque::Status.get(job_id)

      # the status object tells us:
      status.pct_complete #=> 0
      status.status       #=> 'queued'
      status.queued?      #=> true
      status.working?     #=> false
      status.time         #=> Time object
      status.message      #=> "Created at ..."

      Resque::Status.kill(job_id)
  • 38. Resque-scheduler: queueing for future execution; scheduling jobs (like cron!)
  • 39. Resque-scheduler
      # run a job in 5 days
      Resque.enqueue_in(5.days, SendFollowupEmail)

      # run SomeJob at a specific time
      Resque.enqueue_at(5.days.from_now, SomeJob)
  • 40. Resque-scheduler
      namespace :resque do
        task :setup do
          require 'resque'
          require 'resque_scheduler'
          require 'resque/scheduler'

          Resque.redis = 'localhost:6379'

          # The schedule doesn't need to be stored in a YAML, it just needs to
          # be a hash. YAML is usually the easiest.
          Resque::Scheduler.schedule = YAML.load_file('your_resque_schedule.yml')

          # When dynamic is set to true, the scheduler process looks for
          # schedule changes and applies them on the fly.
          # Also if dynamic the Resque::Scheduler.set_schedule (and remove_schedule)
          # methods can be used to alter the schedule
          #Resque::Scheduler.dynamic = true
        end
      end

      $ rake resque:scheduler
  • 41. Resque-scheduler: the yaml configuration
      queue_documents_for_indexing:
        cron: "0 0 * * *"
        class: QueueDocuments
        queue: high
        args:
        description: "This job queues all content for indexing in solr"

      export_items:
        cron: "30 6 * * 1"
        class: Export
        queue: low
        args: full
        description: "This job does a weekly export"
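As the previous slide notes, the schedule only needs to end up as a hash. A quick way to check what a schedule file turns into is to load it with the standard `yaml` library, as in this sketch; it does not involve resque-scheduler itself, and `CRON_FIELDS` and `cron_fields` are hypothetical helpers added to label the five cron columns.

```ruby
require "yaml"

# The schedule from the slide, inlined so the sketch is self-contained.
schedule_yaml = <<~SCHEDULE
  queue_documents_for_indexing:
    cron: "0 0 * * *"
    class: QueueDocuments
    queue: high
    args:
    description: "This job queues all content for indexing in solr"

  export_items:
    cron: "30 6 * * 1"
    class: Export
    queue: low
    args: full
    description: "This job does a weekly export"
SCHEDULE

# A plain Hash keyed by schedule name.
schedule = YAML.safe_load(schedule_yaml)

CRON_FIELDS = %w[minute hour day_of_month month day_of_week].freeze

# Label each column of a five-field cron expression.
def cron_fields(expression)
  CRON_FIELDS.zip(expression.split).to_h
end

weekly = cron_fields(schedule["export_items"]["cron"])
```

Read this way, `"30 6 * * 1"` is minute 30, hour 6, any day of the month, any month, weekday 1: every Monday at 06:30. An empty `args:` loads as `nil`.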
  • 42. Other (commercial): SimpleWorker; SQS
  • 43. Other (historical): Beanstalkd and Stalker; Backgroundjob (Bj); BackgroundRb
  • 44. Other (different approaches): Nanite; Cloud Crowd
  • 45. Ciao!