To Batch Or Not To Batch         Luca Mearelli        rubyday.it 2011

To Batch Or Not To Batch

A presentation with tips and tools on how to integrate batch and asynchronous operations in a generic ruby on rails application.

Did this at rubyday.it 2011

Published in: Technology

Transcript of "To Batch Or Not To Batch"

  1. To Batch Or Not To Batch
       Luca Mearelli
       rubyday.it 2011
  2. "First and foremost, we believe that speed is more than a feature. Speed is the most important feature. If your application is slow, people won’t use it."
       Fred Wilson
       @lmea #rubyday
  3. Not all the interesting features are fast
       Interacting with remote API
       Sending emails
       Media transcoding
       Large dataset handling
  4. Anatomy of an asynchronous action
       The app decides it needs to do a long operation
       The app asks the async system to do the operation and quickly returns the response
       The async system executes the operation out-of-band
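The three steps on this slide can be sketched in plain Ruby. This is only an illustration under assumptions: `TinyAsync` is a hypothetical in-process stand-in (one worker thread and a queue), not any of the libraries discussed in the talk, which run their workers in separate processes.

```ruby
require "thread"

# Hypothetical in-process stand-in for an async system:
# a job queue plus a single worker thread.
class TinyAsync
  attr_reader :results

  def initialize
    @jobs = Queue.new
    @results = Queue.new
    @worker = Thread.new do
      # The worker executes each job out-of-band, away from the caller.
      while (job = @jobs.pop) != :stop
        @results << job.call
      end
    end
  end

  # Returns at once; the block itself runs later on the worker thread.
  def enqueue(&job)
    @jobs << job
    :accepted
  end

  # Lets the worker finish the queued jobs, then waits for it.
  def shutdown
    @jobs << :stop
    @worker.join
  end
end

async = TinyAsync.new
response = async.enqueue { sleep 0.1; "export done" } # the long operation
puts response           # the caller already has its quick response
async.shutdown
puts async.results.pop  # the result produced out-of-band
```

The point of the sketch is the split: `enqueue` only records the work and answers immediately, while the slow part happens elsewhere.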
  5. Batch
       Asynchronous jobs
       Queues & workers
  6. Batch
  7. Cron
       scheduled operations
       unrelated to the requests
       low frequency
       longer run time
  8. Anatomy of a cron batch: the rake task
       namespace :export do
         task :items_xml => :environment do
           # read the env variables
           # make the export
         end
       end
  9. Anatomy of a cron batch: the shell script
       #!/bin/sh
       # this goes in script/item_export_full.sh
       cd /usr/rails/MyApp/current
       export RAILS_ENV=production
       echo "Item Export Full started: `date`"
       rake export:items_xml XML_FOLDER=data/exports
       echo "Item Export Full completed: `date`"
  10. Anatomy of a cron batch: the crontab entry
       0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh >> /usr/rails/MyApp/current/log/dump_item_export.log 2>&1
       30 13 * * * cd /usr/rails/MyApp/current; ruby /usr/rails/MyApp/current/script/runner -e production "Newsletter.deliver_daily" >> /usr/rails/MyApp/current/log/newsletter_daily.log 2>&1
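To read the crontab entries on this slide, it may help to label the five schedule fields. The sketch below assumes the standard five-field crontab layout (minute, hour, day of month, month, day of week); `describe_cron` is a hypothetical helper written for this note, and it does not handle shortcuts such as `@daily`.

```ruby
# Field names of a standard five-field crontab schedule, in order.
CRON_FIELDS = %w[minute hour day_of_month month day_of_week].freeze

# Hypothetical helper: splits an entry into its five schedule fields
# and the command that follows them.
def describe_cron(entry)
  *schedule, command = entry.split(" ", 6)
  CRON_FIELDS.zip(schedule).to_h.merge("command" => command)
end

entry = "0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh"
describe_cron(entry).each { |field, value| puts "#{field}: #{value}" }
```

Read this way, the first entry on the slide runs at minute 0, hour 0, on day 1 of every month, and the second runs every day at 13:30.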
  11. Cron helpers
       Whenever
         https://github.com/javan/whenever
       Craken
         https://github.com/latimes/craken
  12. Whenever: schedule.rb
       # adds ">> /path/to/file.log 2>&1" to all commands
       set :output, '/path/to/file.log'
       every 3.hours do
         rake "my:rake:task"
       end
       every 1.day, :at => '4:30 am' do
         runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
       end
       every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot
         command "/usr/bin/my_great_command", :output => {:error => 'error.log', :standard => 'cron.log'}
       end
  13. Craken: raketab
       59 * * * * thing:to_do > /tmp/thing_to_do.log 2>&1
       @daily solr:reindex > /tmp/solr_daily.log 2>&1
       # also @yearly, @annually, @monthly, @weekly, @midnight, @hourly
  14. Craken: raketab.rb
       Raketab.new do |cron|
         cron.schedule 'thing:to_do > /tmp/thing_to_do.log 2>&1', :every => mon..fri
         cron.schedule 'first:five:days > /tmp/thing_to_do.log 2>&1', :days => [1,2,3,4,5]
         cron.schedule 'first:day:q1 > /tmp/thing_to_do.log 2>&1', :the => '1st', :in => [jan,feb,mar]
         cron.schedule 'first:day:q4 > /tmp/thing_to_do.log 2>&1', :the => '1st', :months => 'October,November,December'
       end
  15. Queues & Workers
       un-scheduled operations
       responding to a request
       mid to high frequency
       mixed run time
  16. Queues & Workers
       Delayed job
         https://github.com/collectiveidea/delayed_job
       Resque
         https://github.com/defunkt/resque
  17. Delayed job
       Any object method can be a job
       Db backed queue
       Integer-based priority
       Lifecycle hooks (enqueue, before, after, ... )
  18. Delayed job: simple jobs
       # without delayed_job
       @user.notify!(@event)
       # with delayed_job
       @user.delay.notify!(@event)
       # always asynchronous method
       class Newsletter
         def deliver
           # long running method
         end
         handle_asynchronously :deliver
       end
       newsletter = Newsletter.new
       newsletter.deliver
  19. Delayed job: handle_asynchronously
       handle_asynchronously :sync_method, :priority => 20
       handle_asynchronously :in_the_future, :run_at => Proc.new { 5.minutes.from_now }
       handle_asynchronously :call_a_class_method, :run_at => Proc.new { when_to_run }
       handle_asynchronously :call_an_instance_method, :priority => Proc.new {|i| i.how_important }
  20. Delayed job
       class NewsletterJob < Struct.new(:text, :emails)
         def perform
           emails.each { |e| NewsMailer.deliver_text_to_email(text, e) }
         end
       end
       Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', User.find(:all).collect(&:email))
  21. Delayed job
       RAILS_ENV=production script/delayed_job -n 2 --min-priority 10 start
       RAILS_ENV=production script/delayed_job stop
       rake jobs:work
  22. Delayed job: checking the job status
       The queue is for scheduled and running jobs
       Handle the status outside the Delayed::Job object
  23. Delayed job: checking the job status
       # Include this in your initializers somewhere
       class Queue < Delayed::Job
         def self.status(id)
           job = find_by_id(id)
           # a job that is no longer in the queue table is treated as completed
           return "success" if job.nil?
           job.last_error.nil? ? "queued" : "failure"
         end
       end
       # Use this method in your poll method like so:
       def poll
         status = Queue.status(params[:id])
         if status == "success"
           # Success, notify the user!
         elsif status == "failure"
           # Failure, notify the user!
         end
       end
  24. Delayed job: checking the job status
       class AJob < Struct.new(:options)
         def perform
           do_something(options)
         end
         def success(job)
           # record success of job.id
           Rails.cache.write("status:#{job.id}", "success")
         end
       end
       # a helper
       def job_completed_with_success(job_id)
         Rails.cache.read("status:#{job_id}") == "success"
       end
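The idea of recording the outcome outside the job object can be tried without Rails. The sketch below is an assumption-laden simplification: `STATUS_STORE` is a plain hash standing in for `Rails.cache`, `run_job` is a hypothetical runner that calls the hooks itself (a real queue would do that), and the hooks take a job id directly rather than the job object used on the slide.

```ruby
# Hypothetical in-memory store standing in for Rails.cache.
STATUS_STORE = {}

# A job in the style of the slide: perform does the work,
# the hooks record the outcome under the job id.
class SketchJob < Struct.new(:options)
  def perform
    raise ArgumentError, "nothing to do" if options.empty?
    options.keys.sort
  end

  def success(job_id)
    STATUS_STORE["status:#{job_id}"] = "success"
  end

  def failure(job_id)
    STATUS_STORE["status:#{job_id}"] = "failure"
  end
end

# Hypothetical runner: performs the job and calls the matching hook.
def run_job(job_id, job)
  job.perform
  job.success(job_id)
rescue StandardError
  job.failure(job_id)
end

# The helper a polling action could call.
def job_completed_with_success?(job_id)
  STATUS_STORE["status:#{job_id}"] == "success"
end

run_job(1, SketchJob.new({ :limit => 10 }))
run_job(2, SketchJob.new({}))
puts job_completed_with_success?(1)
puts job_completed_with_success?(2)
```

Because the status lives in a separate store, it is still readable after the job has left the queue.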
  25. Resque
       Redis-backed queues
       Queue/dequeue speed independent of list size
       Forking behaviour
       Built in front-end
       Multiple queues / no priorities
  26. Resque: the job
       class Export
         @queue = :export_jobs
         def self.perform(dataset_id, kind = 'full')
           ds = Dataset.find(dataset_id)
           ds.create_export(kind)
         end
       end
  27. Resque: enqueuing the job
       class Dataset
         def async_create_export(kind)
           Resque.enqueue(Export, self.id, kind)
         end
       end
       ds = Dataset.find(100)
       ds.async_create_export('full')
  28. Resque: persisting the job
       # jobs are persisted as JSON,
       # so jobs should only take arguments that can be expressed as JSON
       { 'class': 'Export', 'args': [ 100, 'full' ] }
       # don't do this:
       Resque.enqueue(Export, self, kind)
       # do this:
       Resque.enqueue(Export, self.id, kind)
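The JSON constraint is easy to see with a round trip through Ruby's standard `json` library. This is a minimal sketch: `encode_job` and `decode_job` are hypothetical helpers that only mimic the payload shape shown on the slide, not Resque's own code.

```ruby
require "json"

# Hypothetical helper: builds a payload shaped like the one on the slide.
def encode_job(class_name, *args)
  JSON.generate("class" => class_name, "args" => args)
end

# Hypothetical helper: what a worker would get back from the queue.
def decode_job(payload)
  JSON.parse(payload)
end

payload = encode_job("Export", 100, :full)
job = decode_job(payload)
puts payload
puts job["args"].inspect # the symbol :full comes back as the string "full"
```

Ids, numbers and strings survive the trip unchanged, while richer Ruby values (symbols, model instances) do not come back as what they were, which is why the slide passes `self.id` rather than `self`.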
  29. Resque: generic async methods
       # A simple async helper
       class Repository < ActiveRecord::Base
         # This will be called by a worker when a job needs to be processed
         def self.perform(id, method, *args)
           find(id).send(method, *args)
         end
         # We can pass this any Repository instance method that we want to
         # run later.
         def async(method, *args)
           Resque.enqueue(Repository, id, method, *args)
         end
       end
       # Now we can call any method and have it execute later:
       @repo.async(:update_disk_usage)
       @repo.async(:update_network_source_id, 34)
  30. Resque: anatomy of a worker
       # a worker does this:
       start
       loop do
         if job = reserve
           job.process
         else
           sleep 5
         end
       end
       shutdown
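The reserve/process/sleep loop on this slide can be made runnable with an in-memory queue. This is a sketch under assumptions: `SketchWorker` is hypothetical, the queue is a plain array of lambdas, and the loop stops after a few idle polls so the example terminates, whereas a real worker keeps polling until it is told to shut down.

```ruby
# Hypothetical worker over an in-memory queue (an Array of callables).
class SketchWorker
  attr_reader :processed

  def initialize(queue, idle_sleep: 0.01, max_idle_polls: 2)
    @queue = queue
    @idle_sleep = idle_sleep
    @max_idle_polls = max_idle_polls
    @processed = []
  end

  # Takes the next job off the queue, or nil when the queue is empty.
  def reserve
    @queue.shift
  end

  def work
    idle_polls = 0
    loop do
      if (job = reserve)
        idle_polls = 0
        @processed << job.call
      else
        idle_polls += 1
        # Stop after a few empty polls so this sketch terminates.
        break if idle_polls >= @max_idle_polls
        sleep @idle_sleep
      end
    end
  end
end

worker = SketchWorker.new([-> { "export 1" }, -> { "export 2" }])
worker.work
puts worker.processed.inspect
```

The sleep in the empty branch is what keeps an idle worker from spinning while the queue has nothing to hand out.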
  31. Resque: working the queues
       $ QUEUES=critical,high,low rake resque:work
       $ QUEUES=* rake resque:work
       $ PIDFILE=./resque.pid QUEUE=export_jobs rake environment resque:work
       task "resque:setup" => :environment do
         AppConfig.a_parameter = ...
       end
  32. Resque: monit recipe
       # example monit monitoring recipe
       check process resque_worker_batch_01
         with pidfile /app/current/tmp/pids/worker_01.pid
         start program = "/bin/bash -c 'cd /app/current; RAILS_ENV=production QUEUE=batch_queue nohup rake environment resque:work & > log/worker_01.log && echo $! > tmp/pids/worker_01.pid'" as uid deploy and gid deploy
         stop program = "/bin/bash -c 'cd /app/current && kill -s QUIT `cat tmp/pids/worker_01.pid` && rm -f tmp/pids/worker_01.pid; exit 0;'"
         if totalmem is greater than 1000 MB for 10 cycles then restart # eating up memory?
         group resque_workers
  33. Resque: built-in monitoring
  34. Resque plugins
       Resque-status
         https://github.com/quirkey/resque-status
       Resque-scheduler
         https://github.com/bvandenbos/resque-scheduler/
       More at:
         https://github.com/defunkt/resque/wiki/plugins
  35. Resque-status
       Simple trackable jobs for resque
       Job instances have a UUID
       Jobs can report their status while running
  36. Resque-status
       # inheriting from JobWithStatus
       class ExportJob < Resque::JobWithStatus
         # perform is an instance method
         def perform
           # fall back to 1000 when no limit is given (nil.to_i would be 0)
           limit = (options['limit'] || 1000).to_i
           items = Item.limit(limit)
           total = items.count
           exported = []
           items.each_with_index do |item, num|
             at(num, total, "At #{num} of #{total}")
             exported << item.to_csv
           end
           # local_filename is not defined on the slide; it is assumed to exist elsewhere
           File.open(local_filename, 'w') { |f| f.write(exported.join("\n")) }
           complete(:filename => local_filename)
         end
       end
  37. Resque-status
       job_id = SleepJob.create(:length => 100)
       status = Resque::Status.get(job_id)
       # the status object tells us:
       status.pct_complete #=> 0
       status.status #=> 'queued'
       status.queued? #=> true
       status.working? #=> false
       status.time #=> Time object
       status.message #=> "Created at ..."
       Resque::Status.kill(job_id)
  38. Resque-scheduler
       Queueing for future execution
       Scheduling jobs (like cron!)
  39. Resque-scheduler
       # run a job in 5 days
       Resque.enqueue_in(5.days, SendFollowupEmail)
       # run SomeJob at a specific time
       Resque.enqueue_at(5.days.from_now, SomeJob)
  40. Resque-scheduler
       namespace :resque do
         task :setup do
           require 'resque'
           require 'resque_scheduler'
           require 'resque/scheduler'
           Resque.redis = 'localhost:6379'
           # The schedule doesn't need to be stored in a YAML, it just needs to
           # be a hash. YAML is usually the easiest.
           Resque::Scheduler.schedule = YAML.load_file('your_resque_schedule.yml')
           # When dynamic is set to true, the scheduler process looks for
           # schedule changes and applies them on the fly.
           # Also if dynamic the Resque::Scheduler.set_schedule (and remove_schedule)
           # methods can be used to alter the schedule
           #Resque::Scheduler.dynamic = true
         end
       end
       $ rake resque:scheduler
  41. Resque-scheduler: the yaml configuration
       queue_documents_for_indexing:
         cron: "0 0 * * *"
         class: QueueDocuments
         queue: high
         args:
         description: "This job queues all content for indexing in solr"
       export_items:
         cron: "30 6 * * 1"
         class: Export
         queue: low
         args: full
         description: "This job does a weekly export"
  42. Other (commercial)
       SimpleWorker
         http://simpleworker.com
       SQS
         https://github.com/appoxy/aws/
         http://rubygems.org/gems/right_aws
         http://sdruby.org/video/024_amazon_sqs.m4v
  43. Other (historical)
       Beanstalkd and Stalker
         http://asciicasts.com/episodes/243-beanstalkd-and-stalker
         http://kr.github.com/beanstalkd/
         https://github.com/han/stalker
       Backgroundjob (Bj)
         https://github.com/ahoward/bj
       BackgroundRb
         http://backgroundrb.rubyforge.org/
  44. Other (different approaches)
       Nanite
         http://www.slideshare.net/jendavis100/background-processing-with-nanite
       Cloud Crowd
         https://github.com/documentcloud/cloud-crowd/wiki/Getting-Started
  45. Ciao!
       me@spazidigitali.com
  46. http://www.flickr.com/photos/rkbcupcakes/3373909785/
       http://www.flickr.com/photos/anjin/23460398
       http://www.flickr.com/photos/vivacomopuder/3122401239
       http://www.flickr.com/photos/pacdog/4968422200
       http://www.flickr.com/photos/comedynose/3834416952
       http://www.flickr.com/photos/rhysasplundh/5177851910/
       http://www.flickr.com/photos/marypcb/104308457
       http://www.flickr.com/photos/shutterhacks/4474421855
       http://www.flickr.com/photos/kevinschoenmakersnl/5562839479
       http://www.flickr.com/photos/triplexpresso/496995086
       http://www.flickr.com/photos/saxonmoseley/24523450
       http://www.flickr.com/photos/gadl/89650415
       http://www.flickr.com/photos/matvey_andreyev/3656451273
       http://www.flickr.com/photos/bryankennedy/1992770068
       http://www.flickr.com/photos/27282406@N03/4134661728/