To Batch Or Not To Batch

A presentation with tips and tools on how to integrate batch and asynchronous operations in a generic Ruby on Rails application.

Presented at rubyday.it 2011.

Transcript

  • 1. To Batch Or Not To Batch. Luca Mearelli, rubyday.it 2011
  • 2. "First and foremost, we believe that speed is more than a feature. Speed is the most important feature. If your application is slow, people won’t use it." Fred Wilson @lmea #rubyday
  • 3. Not all the interesting features are fast: interacting with remote APIs, sending emails, media transcoding, large dataset handling.
  • 4. Anatomy of an asynchronous action: (1) the app decides it needs to do a long operation; (2) the app asks the async system to do the operation and quickly returns the response; (3) the async system executes the operation out-of-band.
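The three steps of slide 4 can be sketched in plain Ruby with a thread and a queue. This is only an in-process illustration: `TinyAsync` is a hypothetical name, and a real deployment would use one of the queue systems covered later in the talk.

```ruby
require "thread"

# Hypothetical in-process stand-in for "the async system".
class TinyAsync
  def initialize
    @queue = Queue.new
    # The worker thread executes jobs out-of-band until it sees :stop.
    @worker = Thread.new do
      while (job = @queue.pop) != :stop
        job.call
      end
    end
  end

  # Hand the long operation over and return immediately.
  def enqueue(&job)
    @queue << job
    :accepted
  end

  # Let the worker drain the queue, then stop it.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

results = []
async = TinyAsync.new
# The "request": decide the work is long, enqueue it, answer right away.
response = async.enqueue { results << "export done" }
async.shutdown
```

The caller only ever sees the quick `:accepted` response; the block itself runs on the worker thread.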
  • 5. Batch, asynchronous jobs, queues & workers
  • 6. Batch
  • 7. Cron: scheduled operations, unrelated to the requests, low frequency, longer run time.
  • 8. Anatomy of a cron batch: the rake task
      namespace :export do
        task :items_xml => :environment do
          # read the env variables
          # make the export
        end
      end
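The two placeholder comments in the task body could be filled in roughly as follows. This is a sketch under assumptions: `export_items_xml`, the `items.xml` file name and the sample item are hypothetical, and a real task would load records from the application instead of a literal array. Only the `XML_FOLDER` variable comes from the slides.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical export routine that the rake task body could delegate to.
# `items` is a list of hashes here; a real app would pass model records.
def export_items_xml(items, folder)
  FileUtils.mkdir_p(folder)
  path = File.join(folder, "items.xml")
  escape = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }
  File.open(path, "w") do |file|
    file.puts "<items>"
    items.each do |item|
      name = item[:name].gsub(/[&<>]/, escape)
      file.puts %(  <item id="#{item[:id]}">#{name}</item>)
    end
    file.puts "</items>"
  end
  path
end

# read the env variables (falling back to a temporary folder)
folder = ENV.fetch("XML_FOLDER", File.join(Dir.tmpdir, "exports"))
# make the export
path = export_items_xml([{ id: 1, name: "Fish & chips" }], folder)
```

Keeping the export logic in a plain method makes it testable without going through rake or cron.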
  • 9. Anatomy of a cron batch: the shell script
      #!/bin/sh
      # this goes in script/item_export_full.sh
      cd /usr/rails/MyApp/current
      export RAILS_ENV=production
      echo "Item Export Full started: `date`"
      rake export:items_xml XML_FOLDER=data/exports
      echo "Item Export Full completed: `date`"
  • 10. Anatomy of a cron batch: the crontab entry
      0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh >> /usr/rails/MyApp/current/log/dump_item_export.log 2>&1
      30 13 * * * cd /usr/rails/MyApp/current; ruby /usr/rails/MyApp/current/script/runner -e production "Newsletter.deliver_daily" >> /usr/rails/MyApp/current/log/newsletter_daily.log 2>&1
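Each crontab entry starts with five schedule fields (minute, hour, day of month, month, day of week) followed by the command. The first entry therefore runs at 00:00 on the first day of every month, the second at 13:30 every day. A small sketch that splits an entry into named parts (`parse_crontab_line` is a hypothetical helper, not part of cron or of any gem mentioned here):

```ruby
# Names of the five schedule fields in a crontab entry, in order.
CRON_FIELDS = %i[minute hour day_of_month month day_of_week].freeze

# Split one crontab line into its schedule fields and the command to run.
def parse_crontab_line(line)
  *schedule, command = line.strip.split(/\s+/, CRON_FIELDS.size + 1)
  CRON_FIELDS.zip(schedule).to_h.merge(command: command)
end

monthly = parse_crontab_line(
  "0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh"
)
daily = parse_crontab_line(
  '30 13 * * * cd /usr/rails/MyApp/current; ruby script/runner -e production "Newsletter.deliver_daily"'
)
```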
  • 11. Cron helpers: Whenever https://github.com/javan/whenever, Craken https://github.com/latimes/craken
  • 12. Whenever: schedule.rb
      # adds ">> /path/to/file.log 2>&1" to all commands
      set :output, '/path/to/file.log'
      every 3.hours do
        rake "my:rake:task"
      end
      every 1.day, :at => '4:30 am' do
        runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
      end
      every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot
        command "/usr/bin/my_great_command", :output => {:error => 'error.log', :standard => 'cron.log'}
      end
  • 13. Craken: raketab
      59 * * * * thing:to_do > /tmp/thing_to_do.log 2>&1
      @daily solr:reindex > /tmp/solr_daily.log 2>&1
      # also @yearly, @annually, @monthly, @weekly, @midnight, @hourly
  • 14. Craken: raketab.rb
      Raketab.new do |cron|
        cron.schedule 'thing:to_do > /tmp/thing_to_do.log 2>&1', :every => mon..fri
        cron.schedule 'first:five:days > /tmp/thing_to_do.log 2>&1', :days => [1,2,3,4,5]
        cron.schedule 'first:day:q1 > /tmp/thing_to_do.log 2>&1', :the => '1st', :in => [jan,feb,mar]
        cron.schedule 'first:day:q4 > /tmp/thing_to_do.log 2>&1', :the => '1st', :months => 'October,November,December'
      end
  • 15. Queues & Workers: un-scheduled operations, responding to a request, mid to high frequency, mixed run time.
  • 16. Queues & Workers: Delayed job https://github.com/collectiveidea/delayed_job, Resque https://github.com/defunkt/resque
  • 17. Delayed job: any object method can be a job; DB-backed queue; integer-based priority; lifecycle hooks (enqueue, before, after, ...).
  • 18. Delayed job: simple jobs
      # without delayed_job
      @user.notify!(@event)
      # with delayed_job
      @user.delay.notify!(@event)
      # always asynchronous method
      class Newsletter
        def deliver
          # long running method
        end
        handle_asynchronously :deliver
      end
      newsletter = Newsletter.new
      newsletter.deliver
  • 19. Delayed job: handle_asynchronously
      handle_asynchronously :sync_method, :priority => 20
      handle_asynchronously :in_the_future, :run_at => Proc.new { 5.minutes.from_now }
      handle_asynchronously :call_a_class_method, :run_at => Proc.new { when_to_run }
      handle_asynchronously :call_an_instance_method, :priority => Proc.new {|i| i.how_important }
  • 20. Delayed job
      class NewsletterJob < Struct.new(:text, :emails)
        def perform
          emails.each { |e| NewsMailer.deliver_text_to_email(text, e) }
        end
      end
      Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', User.find(:all).collect(&:email))
  • 21. Delayed job
      RAILS_ENV=production script/delayed_job -n 2 --min-priority 10 start
      RAILS_ENV=production script/delayed_job stop
      rake jobs:work
  • 22. Delayed job: checking the job status. The queue is for scheduled and running jobs; handle the status outside the Delayed::Job object.
  • 23. Delayed job: checking the job status
      # Include this in your initializers somewhere
      class Queue < Delayed::Job
        def self.status(id)
          # a job that is no longer in the table has completed
          job = find_by_id(id)
          return "success" if job.nil?
          job.last_error.nil? ? "queued" : "failure"
        end
      end
      # Use this method in your poll method like so:
      def poll
        status = Queue.status(params[:id])
        if status == "success"
          # Success, notify the user!
        elsif status == "failure"
          # Failure, notify the user!
        end
      end
  • 24. Delayed job: checking the job status
      class AJob < Struct.new(:options)
        def perform
          do_something(options)
        end
        def success(job)
          # record success of job.id
          Rails.cache.write("status:#{job.id}", "success")
        end
      end
      # a helper
      def job_completed_with_success(job_id)
        Rails.cache.read("status:#{job_id}") == "success"
      end
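The idea of slide 24, recording the outcome outside the job table through lifecycle hooks, can be tried without Rails or delayed_job. In this sketch `FakeCache`, `JobRecord` and `run_job` are hypothetical stand-ins for `Rails.cache`, the `Delayed::Job` record and the worker. The `error` hook is an addition to the slide's code, mirroring the delayed_job hook of the same name.

```ruby
# In-memory stand-in for Rails.cache (assumption: only read/write are needed).
class FakeCache
  def initialize
    @store = {}
  end

  def write(key, value)
    @store[key] = value
  end

  def read(key)
    @store[key]
  end
end

CACHE = FakeCache.new

# Hypothetical record for a queued job; only the id matters here.
JobRecord = Struct.new(:id)

class AJob < Struct.new(:options)
  def perform
    options.fetch(:items).map(&:upcase)
  end

  # Hook fired after perform returns without raising.
  def success(job)
    CACHE.write("status:#{job.id}", "success")
  end

  # Hook fired when perform raises.
  def error(job, exception)
    CACHE.write("status:#{job.id}", "failure: #{exception.message}")
  end
end

# Simplified worker step: run the payload, then fire the matching hook.
def run_job(record, payload)
  payload.perform
  payload.success(record)
rescue StandardError => e
  payload.error(record, e)
end

def job_completed_with_success(job_id)
  CACHE.read("status:#{job_id}") == "success"
end

run_job(JobRecord.new(1), AJob.new({ items: %w[a b] }))
run_job(JobRecord.new(2), AJob.new({})) # fetch(:items) raises KeyError
```

Because the status lives in the cache, it stays readable after the job row has been deleted from the queue table.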
  • 25. Resque: Redis-backed queues; queue/dequeue speed independent of list size; forking behaviour; built-in front-end; multiple queues / no priorities.
  • 26. Resque: the job
      class Export
        @queue = :export_jobs
        def self.perform(dataset_id, kind = 'full')
          ds = Dataset.find(dataset_id)
          ds.create_export(kind)
        end
      end
  • 27. Resque: enqueuing the job
      class Dataset
        def async_create_export(kind)
          Resque.enqueue(Export, self.id, kind)
        end
      end
      ds = Dataset.find(100)
      ds.async_create_export('full')
  • 28. Resque: persisting the job
      # jobs are persisted as JSON,
      # so jobs should only take arguments that can be expressed as JSON
      { 'class': 'Export', 'args': [ 100, 'full' ] }
      # don't do this: Resque.enqueue(Export, self, kind)
      # do this: Resque.enqueue(Export, self.id, kind)
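A quick way to see why only JSON-friendly arguments should be passed is to round-trip a payload through Ruby's standard `json` library. This is an approximation: the standard library stands in for whatever encoder Resque actually uses, and `Dataset` is a hypothetical plain class rather than a real model.

```ruby
require "json"

# Hypothetical stand-in for an ActiveRecord model.
class Dataset
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

ds = Dataset.new(100)

# Passing the id and plain values survives the round trip
# (note that the symbol comes back as a string).
good_payload = { "class" => "Export", "args" => [ds.id, :full] }
good = JSON.parse(JSON.generate(good_payload))

# Passing the object itself: the generic encoding falls back to to_s,
# so the worker would receive a useless string instead of a Dataset.
bad_payload = { "class" => "Export", "args" => [ds] }
bad = JSON.parse(JSON.generate(bad_payload))
```

After the round trip `good["args"]` holds the integer id and the string `"full"`, while `bad["args"]` holds only the inspect-style text of the object.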
  • 29. Resque: generic async methods
      # A simple async helper
      class Repository < ActiveRecord::Base
        # This will be called by a worker when a job needs to be processed
        def self.perform(id, method, *args)
          find(id).send(method, *args)
        end
        # We can pass this any Repository instance method that we want to
        # run later.
        def async(method, *args)
          Resque.enqueue(Repository, id, method, *args)
        end
      end
      # Now we can call any method and have it execute later:
      @repo.async(:update_disk_usage)
      @repo.async(:update_network_source_id, 34)
  • 30. Resque: anatomy of a worker
      # a worker does this:
      start
      loop do
        if job = reserve
          job.process
        else
          sleep 5
        end
      end
      shutdown
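The reserve/process/sleep loop of slide 30 can be exercised with an in-memory queue. This sketch makes several assumptions: `QUEUES` replaces the Redis lists, `enqueue`, `reserve` and `work` are simplified hypothetical helpers rather than Resque's own methods, and the loop is bounded by `max_idle` so that it terminates. The job class follows the `@queue` / `self.perform` convention from slide 26.

```ruby
require "json"

# In-memory stand-in for the Redis lists (assumption for this sketch).
QUEUES = Hash.new { |hash, name| hash[name] = [] }

# Hypothetical job class following the @queue / self.perform convention.
class Export
  @queue = :export_jobs
  DONE = []

  class << self
    attr_reader :queue
  end

  def self.perform(dataset_id, kind = "full")
    DONE << [dataset_id, kind]
  end
end

# Store the job as JSON: class name plus plain arguments.
def enqueue(klass, *args)
  payload = { "class" => klass.name, "args" => args }
  QUEUES[klass.queue] << JSON.generate(payload)
end

# Take the next payload from the first non-empty queue, in the given order.
def reserve(queue_names)
  queue_names.each do |name|
    payload = QUEUES[name].shift
    return JSON.parse(payload) if payload
  end
  nil
end

# The worker loop, bounded so the sketch stops after one idle pass.
def work(queue_names, max_idle: 1, interval: 0.01)
  idle = 0
  while idle < max_idle
    if (job = reserve(queue_names))
      Object.const_get(job["class"]).perform(*job["args"])
    else
      idle += 1
      sleep interval
    end
  end
end

enqueue(Export, 100, "full")
enqueue(Export, 101)
work([:critical, :export_jobs])
```

Listing `:critical` before `:export_jobs` mirrors how the order of queue names acts as a priority when a worker looks for its next job.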
  • 31. Resque: working the queues
      $ QUEUES=critical,high,low rake resque:work
      $ QUEUES=* rake resque:work
      $ PIDFILE=./resque.pid QUEUE=export_jobs rake environment resque:work
      task "resque:setup" => :environment do
        AppConfig.a_parameter = ...
      end
  • 32. Resque: monit recipe
      # example monit monitoring recipe
      check process resque_worker_batch_01
        with pidfile /app/current/tmp/pids/worker_01.pid
        start program = "/bin/bash -c 'cd /app/current; RAILS_ENV=production QUEUE=batch_queue nohup rake environment resque:work & > log/worker_01.log && echo $! > tmp/pids/worker_01.pid'" as uid deploy and gid deploy
        stop program = "/bin/bash -c 'cd /app/current && kill -s QUIT `cat tmp/pids/worker_01.pid` && rm -f tmp/pids/worker_01.pid; exit 0;'"
        if totalmem is greater than 1000 MB for 10 cycles then restart # eating up memory?
        group resque_workers
  • 33. Resque: built-in monitoring
  • 34. Resque plugins: Resque-status https://github.com/quirkey/resque-status, Resque-scheduler https://github.com/bvandenbos/resque-scheduler/. More at: https://github.com/defunkt/resque/wiki/plugins
  • 35. Resque-status: simple trackable jobs for Resque; job instances have a UUID; jobs can report their status while running.
  • 36. Resque-status
      # inheriting from JobWithStatus
      class ExportJob < Resque::JobWithStatus
        # perform is an instance method
        def perform
          # apply the default before to_i (nil.to_i is 0, never nil)
          limit = (options['limit'] || 1000).to_i
          items = Item.limit(limit)
          total = items.count
          exported = []
          items.each_with_index do |item, num|
            at(num, total, "At #{num} of #{total}")
            exported << item.to_csv
          end
          File.open(local_filename, 'w') { |f| f.write(exported.join("\n")) }
          complete(:filename => local_filename)
        end
      end
  • 37. Resque-status
      job_id = SleepJob.create(:length => 100)
      status = Resque::Status.get(job_id)
      # the status object tells us:
      status.pct_complete #=> 0
      status.status #=> 'queued'
      status.queued? #=> true
      status.working? #=> false
      status.time #=> Time object
      status.message #=> "Created at ..."
      Resque::Status.kill(job_id)
  • 38. Resque-scheduler: queueing for future execution; scheduling jobs (like cron!).
  • 39. Resque-scheduler
      # run a job in 5 days
      Resque.enqueue_in(5.days, SendFollowupEmail)
      # run SomeJob at a specific time
      Resque.enqueue_at(5.days.from_now, SomeJob)
  • 40. Resque-scheduler
      namespace :resque do
        task :setup do
          require 'resque'
          require 'resque_scheduler'
          require 'resque/scheduler'
          Resque.redis = 'localhost:6379'
          # The schedule doesn't need to be stored in a YAML, it just needs to
          # be a hash. YAML is usually the easiest.
          Resque::Scheduler.schedule = YAML.load_file('your_resque_schedule.yml')
          # When dynamic is set to true, the scheduler process looks for
          # schedule changes and applies them on the fly.
          # Also if dynamic the Resque::Scheduler.set_schedule (and remove_schedule)
          # methods can be used to alter the schedule
          #Resque::Scheduler.dynamic = true
        end
      end
      $ rake resque:scheduler
  • 41. Resque-scheduler: the yaml configuration
      queue_documents_for_indexing:
        cron: "0 0 * * *"
        class: QueueDocuments
        queue: high
        args:
        description: "This job queues all content for indexing in solr"
      export_items:
        cron: "30 6 * * 1"
        class: Export
        queue: low
        args: full
        description: "This job does a weekly export"
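Since the schedule only needs to be a hash, it helps to see what the YAML from slide 41 turns into. The sketch below parses the same text with Ruby's standard `yaml` library. It does not involve resque-scheduler itself, and the heredoc replaces the schedule file purely for illustration. The cron strings read as "every day at 00:00" and "every Monday at 06:30".

```ruby
require "yaml"

schedule_text = <<~CONFIG
  queue_documents_for_indexing:
    cron: "0 0 * * *"
    class: QueueDocuments
    queue: high
    args:
    description: "This job queues all content for indexing in solr"

  export_items:
    cron: "30 6 * * 1"
    class: Export
    queue: low
    args: full
    description: "This job does a weekly export"
CONFIG

# The result is a plain Hash keyed by schedule name.
schedule = YAML.safe_load(schedule_text)

schedule.each do |name, config|
  puts "#{name}: #{config['class']} on queue #{config['queue']} at '#{config['cron']}'"
end
```

An empty `args:` entry parses to `nil`, while `args: full` parses to the string `"full"`.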
  • 42. Other (commercial): SimpleWorker http://simpleworker.com; SQS https://github.com/appoxy/aws/ http://rubygems.org/gems/right_aws http://sdruby.org/video/024_amazon_sqs.m4v
  • 43. Other (historical): Beanstalkd and Stalker http://asciicasts.com/episodes/243-beanstalkd-and-stalker http://kr.github.com/beanstalkd/ https://github.com/han/stalker; Backgroundjob (Bj) https://github.com/ahoward/bj; BackgroundRb http://backgroundrb.rubyforge.org/
  • 44. Other (different approaches): Nanite http://www.slideshare.net/jendavis100/background-processing-with-nanite; Cloud Crowd https://github.com/documentcloud/cloud-crowd/wiki/Getting-Started
  • 45. Ciao! me@spazidigitali.com
  • 46. http://www.flickr.com/photos/rkbcupcakes/3373909785/
      http://www.flickr.com/photos/anjin/23460398
      http://www.flickr.com/photos/vivacomopuder/3122401239
      http://www.flickr.com/photos/pacdog/4968422200
      http://www.flickr.com/photos/comedynose/3834416952
      http://www.flickr.com/photos/rhysasplundh/5177851910/
      http://www.flickr.com/photos/marypcb/104308457
      http://www.flickr.com/photos/shutterhacks/4474421855
      http://www.flickr.com/photos/kevinschoenmakersnl/5562839479
      http://www.flickr.com/photos/triplexpresso/496995086
      http://www.flickr.com/photos/saxonmoseley/24523450
      http://www.flickr.com/photos/gadl/89650415
      http://www.flickr.com/photos/matvey_andreyev/3656451273
      http://www.flickr.com/photos/bryankennedy/1992770068
      http://www.flickr.com/photos/27282406@N03/4134661728/