To Batch Or Not To Batch
A presentation with tips and tools on how to integrate batch and asynchronous operations in a generic Ruby on Rails application.

Did this at rubyday.it 2011

To Batch Or Not To Batch Presentation Transcript

  • 1. To Batch Or Not To Batch. Luca Mearelli, rubyday.it 2011
  • 2. "First and foremost, we believe that speed is more than a feature. Speed is the most important feature. If your application is slow, people won't use it." (Fred Wilson) @lmea #rubyday
  • 3. Not all the interesting features are fast: interacting with remote APIs, sending emails, media transcoding, large dataset handling.
  • 4. Anatomy of an asynchronous action: the app decides it needs to do a long operation; the app asks the async system to do the operation and quickly returns the response; the async system executes the operation out-of-band.
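The three steps of slide 4 can be tried in a single Ruby process. In this sketch a `Thread` stands in for the async system and Ruby's core `Queue` stands in for its job store. Both are assumptions made for illustration: the tools in this talk run the work in separate processes. `handle_request` is a hypothetical name.

```ruby
# In-process stand-in for an async system: a job queue plus one background worker.
jobs = Queue.new
results = Queue.new

worker = Thread.new do
  # The worker executes operations out-of-band, one at a time,
  # until it receives the :shutdown marker.
  while (job = jobs.pop) != :shutdown
    results << "done: #{job[:operation]}"
  end
end

# Hypothetical request handler: it decides it needs a long operation,
# hands it to the async system and returns a response right away.
def handle_request(jobs, operation)
  jobs << { operation: operation }
  "202 Accepted"
end

response = handle_request(jobs, "transcode video 42")
puts response

jobs << :shutdown # ask the worker to stop once the queue is drained
worker.join
outcome = results.pop
puts outcome
```

The request path only pushes a small description of the work and returns. The slow part happens on the worker's side.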
  • 5. Batch / Asynchronous jobs / Queues & workers
  • 6. Batch
  • 7. Cron: scheduled operations, unrelated to the requests; low frequency; longer run time.
  • 8. Anatomy of a cron batch: the rake task
      namespace :export do
        task :items_xml => :environment do
          # read the env variables
          # make the export
        end
      end
  • 9. Anatomy of a cron batch: the shell script
      #!/bin/sh
      # this goes in script/item_export_full.sh
      cd /usr/rails/MyApp/current
      export RAILS_ENV=production
      echo "Item Export Full started: `date`"
      rake export:items_xml XML_FOLDER=data/exports
      echo "Item Export Full completed: `date`"
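The `# read the env variables` step of the rake task usually amounts to an `ENV` lookup with a fallback. Rake copies `NAME=value` arguments such as `XML_FOLDER=data/exports` from the command line into `ENV`. This is a minimal sketch that assumes the task needs only that variable plus `RAILS_ENV`. The `export_settings` helper and its defaults are hypothetical.

```ruby
# Hypothetical helper called from the rake task body: collect the settings
# passed as environment variables (e.g. XML_FOLDER=data/exports).
def export_settings(env = ENV)
  {
    folder: env.fetch("XML_FOLDER", "data/exports"),  # where the XML files go
    rails_env: env.fetch("RAILS_ENV", "development")  # exported by the shell script
  }
end

p export_settings({ "XML_FOLDER" => "tmp/xml", "RAILS_ENV" => "production" })
p export_settings({})
```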
  • 10. Anatomy of a cron batch: the crontab entry
      0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh >> /usr/rails/MyApp/current/log/dump_item_export.log 2>&1
      30 13 * * * cd /usr/rails/MyApp/current; ruby /usr/rails/MyApp/current/script/runner -e production "Newsletter.deliver_daily" >> /usr/rails/MyApp/current/log/newsletter_daily.log 2>&1
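The five fields of a crontab entry are minute, hour, day of month, month and day of week. So `0 0 1 * *` runs at midnight on the first of each month, and `30 13 * * *` runs every day at 13:30. The sketch below checks a `Time` against an expression. It is a simplified, hypothetical matcher:

- It supports only `*`, numbers, comma lists and `a-b` ranges.
- It has no steps and no names.
- It ANDs the two day fields, whereas real cron matches either one when both are restricted.

```ruby
# Does one cron field (e.g. "*", "5", "1,15", "1-5") accept this value?
def field_matches?(spec, value)
  spec.split(",").any? do |part|
    if part == "*"
      true
    elsif part.include?("-")
      low, high = part.split("-").map(&:to_i)
      (low..high).cover?(value)
    else
      part.to_i == value
    end
  end
end

# Fields: minute, hour, day of month, month, day of week (0 = Sunday).
def cron_matches?(expression, time)
  minute, hour, mday, month, wday = expression.split
  field_matches?(minute, time.min) &&
    field_matches?(hour, time.hour) &&
    field_matches?(mday, time.day) &&
    field_matches?(month, time.month) &&
    field_matches?(wday, time.wday)
end

puts cron_matches?("0 0 1 * *", Time.new(2011, 6, 1, 0, 0))     # true
puts cron_matches?("0 0 1 * *", Time.new(2011, 6, 2, 0, 0))     # false
puts cron_matches?("30 13 * * *", Time.new(2011, 6, 2, 13, 30)) # true
```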
  • 11. Cron helpers
      Whenever: https://github.com/javan/whenever
      Craken: https://github.com/latimes/craken
  • 12. Whenever: schedule.rb
      # adds ">> /path/to/file.log 2>&1" to all commands
      set :output, '/path/to/file.log'
      every 3.hours do
        rake "my:rake:task"
      end
      every 1.day, :at => '4:30 am' do
        runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
      end
      every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot
        command "/usr/bin/my_great_command", :output => {:error => 'error.log', :standard => 'cron.log'}
      end
  • 13. Craken: raketab
      59 * * * * thing:to_do > /tmp/thing_to_do.log 2>&1
      @daily solr:reindex > /tmp/solr_daily.log 2>&1
      # also @yearly, @annually, @monthly, @weekly, @midnight, @hourly
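The `@` shortcuts accepted in the raketab are the standard cron nicknames. A lookup table of their usual five-field equivalents, as documented for Vixie cron and its descendants, helps when reading a generated crontab. The `expand_shortcut` helper is illustrative only.

```ruby
# Standard cron nicknames and the five-field expressions they stand for.
CRON_SHORTCUTS = {
  "@yearly"   => "0 0 1 1 *",
  "@annually" => "0 0 1 1 *",
  "@monthly"  => "0 0 1 * *",
  "@weekly"   => "0 0 * * 0",
  "@daily"    => "0 0 * * *",
  "@midnight" => "0 0 * * *",
  "@hourly"   => "0 * * * *"
}.freeze

# Hypothetical helper: expand a leading nickname, leave other lines untouched.
# (@reboot has no five-field equivalent, so it is not in the table.)
def expand_shortcut(line)
  nickname, rest = line.split(" ", 2)
  CRON_SHORTCUTS.key?(nickname) ? "#{CRON_SHORTCUTS[nickname]} #{rest}" : line
end

puts expand_shortcut("@daily solr:reindex > /tmp/solr_daily.log 2>&1")
# 0 0 * * * solr:reindex > /tmp/solr_daily.log 2>&1
```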
  • 14. Craken: raketab.rb
      Raketab.new do |cron|
        cron.schedule "thing:to_do > /tmp/thing_to_do.log 2>&1", :every => mon..fri
        cron.schedule "first:five:days > /tmp/thing_to_do.log 2>&1", :days => [1,2,3,4,5]
        cron.schedule "first:day:q1 > /tmp/thing_to_do.log 2>&1", :the => '1st', :in => [jan,feb,mar]
        cron.schedule "first:day:q4 > /tmp/thing_to_do.log 2>&1", :the => '1st', :months => 'October,November,December'
      end
  • 15. Queues & Workers: un-scheduled operations, responding to a request; mid to high frequency; mixed run time.
  • 16. Queues & Workers
      Delayed job: https://github.com/collectiveidea/delayed_job
      Resque: https://github.com/defunkt/resque
  • 17. Delayed job: any object method can be a job; DB-backed queue; integer-based priority; lifecycle hooks (enqueue, before, after, ...).
  • 18. Delayed job: simple jobs
      # without delayed_job
      @user.notify!(@event)
      # with delayed_job
      @user.delay.notify!(@event)
      # always asynchronous method
      class Newsletter
        def deliver
          # long running method
        end
        handle_asynchronously :deliver
      end
      newsletter = Newsletter.new
      newsletter.deliver
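`@user.delay.notify!(@event)` works because the method call is captured and stored as a job instead of being executed. The following is a toy, in-memory imitation of that idea. It is not delayed_job's implementation, which serializes the object and method into a database row. `DelayProxy`, `Delayable`, `JOBS` and `run_pending_jobs` are hypothetical names.

```ruby
# Toy job store: each entry is [receiver, method name, arguments].
JOBS = []

# Records any method call made on it instead of executing it.
class DelayProxy < BasicObject
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args)
    ::JOBS << [@target, name, args]
    nil
  end
end

# Hypothetical mixin giving objects a delay method.
module Delayable
  def delay
    DelayProxy.new(self)
  end
end

# What a worker would do out-of-band: run the recorded calls in order.
def run_pending_jobs
  until JOBS.empty?
    target, name, args = JOBS.shift
    target.public_send(name, *args)
  end
end

class User
  include Delayable
  attr_reader :notifications

  def initialize
    @notifications = []
  end

  def notify!(event)
    @notifications << event
  end
end

user = User.new
user.delay.notify!("rubyday") # recorded, not executed yet
puts user.notifications.size  # 0
run_pending_jobs
puts user.notifications.size  # 1
```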
  • 19. Delayed job: handle_asynchronously
      handle_asynchronously :sync_method, :priority => 20
      handle_asynchronously :in_the_future, :run_at => Proc.new { 5.minutes.from_now }
      handle_asynchronously :call_a_class_method, :run_at => Proc.new { when_to_run }
      handle_asynchronously :call_an_instance_method, :priority => Proc.new {|i| i.how_important }
  • 20. Delayed job
      class NewsletterJob < Struct.new(:text, :emails)
        def perform
          emails.each { |e| NewsMailer.deliver_text_to_email(text, e) }
        end
      end
      Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', User.find(:all).collect(&:email))
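Delayed job stores the job object in its table by serializing it to YAML. That is why a plain `Struct` with a `perform` method works as a custom job. A round trip in plain Ruby shows the idea, with two assumptions:

- The mailer is replaced by a hypothetical `SENT` array so the example runs without Rails.
- `YAML.unsafe_load` is used where available, because recent Psych versions refuse to load arbitrary classes through `YAML.load`.

```ruby
require "yaml"

SENT = [] # stand-in for NewsMailer deliveries

class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| SENT << [e, text] }
  end
end

job = NewsletterJob.new("lorem ipsum...", ["a@example.com", "b@example.com"])
handler = YAML.dump(job) # roughly what ends up in the jobs table
puts handler

# Newer Psych versions only revive arbitrary classes through unsafe_load.
restored = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(handler) : YAML.load(handler)
restored.perform # what a worker does after loading the row
p SENT
```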
  • 21. Delayed job
      RAILS_ENV=production script/delayed_job -n 2 --min-priority 10 start
      RAILS_ENV=production script/delayed_job stop
      rake jobs:work
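A Delayed job worker picks jobs whose `run_at` is due, lowest priority number first. `--min-priority 10` restricts it to jobs whose priority is 10 or higher. The selection rule can be sketched over an in-memory list. This only illustrates the ordering under those assumptions; it is not delayed_job's SQL, and `reserve` and the job hashes are hypothetical.

```ruby
# Each job: a name, an integer priority (lower runs first) and a run_at time.
def reserve(jobs, now:, min_priority: nil, max_priority: nil)
  candidates = jobs.select do |job|
    job[:run_at] <= now &&
      (min_priority.nil? || job[:priority] >= min_priority) &&
      (max_priority.nil? || job[:priority] <= max_priority)
  end
  # Lowest priority number first, then the oldest run_at.
  candidates.min_by { |job| [job[:priority], job[:run_at]] }
end

now = Time.now
jobs = [
  { name: "newsletter", priority: 20, run_at: now - 60 },
  { name: "sync",       priority: 0,  run_at: now - 10 },
  { name: "later",      priority: 0,  run_at: now + 300 } # like :run_at => 5.minutes.from_now
]

puts reserve(jobs, now: now)[:name]                   # sync
puts reserve(jobs, now: now, min_priority: 10)[:name] # newsletter
```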
  • 22. Delayed job: checking the job status. The queue only holds scheduled and running jobs, so the status has to be tracked outside the Delayed::Job object.
  • 23. Delayed job: checking the job status
      # Include this in your initializers somewhere
      # (JobQueue, not Queue: Ruby already defines a core Queue class)
      class JobQueue < Delayed::Job
        def self.status(id)
          job = find_by_id(id)
          return "success" if job.nil?
          job.last_error.nil? ? "queued" : "failure"
        end
      end
      # Use this method in your poll method like so:
      def poll
        status = JobQueue.status(params[:id])
        if status == "success"
          # Success, notify the user!
        elsif status == "failure"
          # Failure, notify the user!
        end
      end
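The status helper on slide 23 relies on Delayed job deleting a job's row once it succeeds, which is the default. So a missing row is read as success, and a row with `last_error` is read as failure. The same decision rule is sketched below over a Hash standing in for the jobs table; `jobs_table` and `job_status` are hypothetical.

Note the caveat: a job that exhausts its attempts is also deleted unless `Delayed::Worker.destroy_failed_jobs` is false. That is one reason slide 22 suggests tracking status elsewhere.

```ruby
# Stand-in for the delayed_jobs table: id => row (absent once deleted).
jobs_table = {
  1 => { last_error: nil },                                # waiting or running
  2 => { last_error: "Timeout::Error: execution expired" } # failed at least once
}

# Same decision rule as the status helper on the slide, written against a Hash.
def job_status(table, id)
  job = table[id]
  return "success" if job.nil? # row gone: assumed completed
  job[:last_error].nil? ? "queued" : "failure"
end

puts job_status(jobs_table, 1) # queued
puts job_status(jobs_table, 2) # failure
puts job_status(jobs_table, 3) # success
```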
  • 24. Delayed job: checking the job status
      class AJob < Struct.new(:options)
        def perform
          do_something(options)
        end
        def success(job)
          # record success of job.id
          Rails.cache.write("status:#{job.id}", "success")
        end
      end
      # a helper
      def job_completed_with_success(job_id)
        Rails.cache.read("status:#{job_id}") == "success"
      end
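The `success` method on slide 24 is a hook: the worker calls it, when the job object defines it, after `perform` returns. The toy runner below calls the hooks in this order:

1. `before`
2. `perform`
3. `success`, or `error` if `perform` raised
4. `after`, always

As far as we know this matches delayed_job's order. The runner, the Hash used as a cache and the `JobRecord` struct are assumptions so the example runs without Rails. Unlike a real worker, it swallows the error instead of re-raising it.

```ruby
CACHE = {} # stand-in for Rails.cache

JobRecord = Struct.new(:id) # stand-in for the Delayed::Job row passed to hooks

class AJob < Struct.new(:options)
  def perform
    raise ArgumentError, "missing options" if options.nil?
    # do_something(options) would go here
  end

  def success(job)
    CACHE["status:#{job.id}"] = "success"
  end

  def error(job, exception)
    CACHE["status:#{job.id}"] = "error: #{exception.message}"
  end
end

# Hypothetical runner: invoke each hook only if the job object defines it.
def run_with_hooks(payload, job)
  hook = ->(name, *args) { payload.public_send(name, job, *args) if payload.respond_to?(name) }
  begin
    hook.call(:before)
    payload.perform
    hook.call(:success)
  rescue StandardError => e
    hook.call(:error, e)
  ensure
    hook.call(:after)
  end
end

def job_completed_with_success(job_id)
  CACHE["status:#{job_id}"] == "success"
end

run_with_hooks(AJob.new({ format: "csv" }), JobRecord.new(1))
run_with_hooks(AJob.new(nil), JobRecord.new(2))

puts job_completed_with_success(1) # true
puts job_completed_with_success(2) # false
puts CACHE["status:2"]             # error: missing options
```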
  • 25. Resque: Redis-backed queues; queue/dequeue speed independent of list size; forking behaviour; built-in front-end; multiple queues, no priorities.
  • 26. Resque: the job
      class Export
        @queue = :export_jobs
        def self.perform(dataset_id, kind = 'full')
          ds = Dataset.find(dataset_id)
          ds.create_export(kind)
        end
      end
  • 27. Resque: enqueuing the job
      class Dataset
        def async_create_export(kind)
          Resque.enqueue(Export, self.id, kind)
        end
      end
      ds = Dataset.find(100)
      ds.async_create_export('full')
  • 28. Resque: persisting the job
      # jobs are persisted as JSON,
      # so jobs should only take arguments that can be expressed as JSON
      { "class": "Export", "args": [ 100, "full" ] }
      # don't do this:
      Resque.enqueue(Export, self, kind)
      # do this:
      Resque.enqueue(Export, self.id, kind)
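Because Resque stores each job as JSON, whatever is passed to `Resque.enqueue` goes through a JSON round trip before `perform` sees it. The standard `json` library shows the effect: an id and a string come back unchanged, while a symbol or an arbitrary object comes back as a plain string. The `Dataset` struct is a stand-in for the ActiveRecord model.

```ruby
require "json"

Dataset = Struct.new(:id, :name) # stand-in for the ActiveRecord model

# Roughly the payload stored for Resque.enqueue(Export, 100, "full").
good = JSON.generate({ "class" => "Export", "args" => [100, "full"] })
good_args = JSON.parse(good)["args"]
p good_args # [100, "full"]

# Passing the object itself (and a symbol): both are turned into strings.
bad = JSON.generate({ "class" => "Export", "args" => [Dataset.new(100, "items"), :full] })
bad_args = JSON.parse(bad)["args"]
p bad_args.map(&:class) # [String, String]
```

So `perform` would receive a string describing the object, not a record it can use. Passing `self.id` and reloading the record inside the job avoids that.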
  • 29. Resque: generic async methods
      # A simple async helper
      class Repository < ActiveRecord::Base
        @queue = :repositories # Resque.enqueue needs a queue; this name is illustrative
        # This will be called by a worker when a job needs to be processed
        def self.perform(id, method, *args)
          find(id).send(method, *args)
        end
        # We can pass this any Repository instance method that we want to
        # run later.
        def async(method, *args)
          Resque.enqueue(Repository, id, method, *args)
        end
      end
      # Now we can call any method and have it execute later:
      @repo.async(:update_disk_usage)
      @repo.async(:update_network_source_id, 34)
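The generic `async` helper works because the method name travels in the job arguments. After the JSON round trip, the symbol `:update_disk_usage` arrives as the string `"update_disk_usage"`, which `send` accepts just as well. The sketch makes these substitutions:

- An in-memory array replaces Redis.
- A Hash lookup replaces ActiveRecord's `find`.
- `FakeResque` and `REPOS` are hypothetical names.
- The value 42 is an arbitrary placeholder.

```ruby
require "json"

# In-memory stand-in for Resque: jobs are stored as JSON strings.
module FakeResque
  QUEUE = []

  def self.enqueue(klass, *args)
    QUEUE << JSON.generate({ "class" => klass.name, "args" => args })
  end

  # What a worker does: decode the payload and call Klass.perform(*args).
  def self.work_one
    payload = JSON.parse(QUEUE.shift)
    Object.const_get(payload["class"]).perform(*payload["args"])
  end
end

class Repository
  REPOS = {}
  attr_reader :id, :disk_usage, :network_source_id

  def initialize(id)
    @id = id
    REPOS[id] = self
  end

  def self.find(id)
    REPOS.fetch(id)
  end

  def self.perform(id, method, *args)
    find(id).send(method, *args) # method arrives as a String
  end

  def async(method, *args)
    FakeResque.enqueue(Repository, id, method, *args)
  end

  def update_disk_usage
    @disk_usage = 42 # placeholder value
  end

  def update_network_source_id(source_id)
    @network_source_id = source_id
  end
end

repo = Repository.new(7)
repo.async(:update_disk_usage)
repo.async(:update_network_source_id, 34)
2.times { FakeResque.work_one }
p [repo.disk_usage, repo.network_source_id] # [42, 34]
```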
  • 30. Resque: anatomy of a worker
      # a worker does this:
      start
      loop do
        if job = reserve
          job.process
        else
          sleep 5
        end
      end
      shutdown
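The worker loop on slide 30 can be made concrete in plain Ruby:

1. Reserve a job.
2. Process it, or sleep and poll again if there is none.
3. Leave the loop when asked to shut down.

This is a simplified single-process sketch. The real Resque worker also forks a child per job and handles signals. The `ToyWorker` class, its `interval` and the lambda jobs are assumptions.

```ruby
class ToyWorker
  attr_reader :processed

  def initialize(queue, interval: 0.01)
    @queue = queue       # an Array of callables
    @interval = interval # seconds to sleep when the queue is empty
    @shutdown = false
    @processed = 0
  end

  def shutdown!
    @shutdown = true
  end

  def work
    until @shutdown
      if (job = reserve)
        job.call
        @processed += 1
      else
        sleep @interval
      end
    end
  end

  private

  def reserve
    @queue.shift
  end
end

jobs = [-> { puts "export" }, -> { puts "reindex" }]
worker = ToyWorker.new(jobs)
jobs << -> { worker.shutdown! } # the last job asks the worker to stop
worker.work
puts worker.processed # 3
```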
  • 31. Resque: working the queues
      $ QUEUES=critical,high,low rake resque:work
      $ QUEUES=* rake resque:work
      $ PIDFILE=./resque.pid QUEUE=export_jobs rake environment resque:work
      task "resque:setup" => :environment do
        AppConfig.a_parameter = ...
      end
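With `QUEUES=critical,high,low`, a Resque worker checks the queues in the order given each time it looks for work. So `low` is only served when the other two are empty, which is how ordering stands in for priorities. `*` means every queue. The sketch below shows that reservation order, with queues as arrays in a Hash. Expanding `*` in alphabetical order is an assumption of this sketch.

```ruby
# Queues as plain arrays keyed by name (stand-ins for Redis lists).
QUEUES = {
  "low"      => ["cleanup"],
  "high"     => ["thumbnail"],
  "critical" => ["charge_card"]
}

# Expand the QUEUES setting; "*" stands for all queues, here in sorted order.
def queue_order(setting, all_queues)
  setting.split(",").flat_map { |q| q == "*" ? all_queues.keys.sort : [q] }.uniq
end

# Take the first job from the first non-empty queue, in order.
def reserve(setting, queues)
  queue_order(setting, queues).each do |name|
    job = queues.fetch(name, []).shift
    return [name, job] if job
  end
  nil
end

order = []
while (reserved = reserve("critical,high,low", QUEUES))
  order << reserved
end
p order # [["critical", "charge_card"], ["high", "thumbnail"], ["low", "cleanup"]]
```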
  • 32. Resque: monit recipe
      # example monit monitoring recipe
      check process resque_worker_batch_01
        with pidfile /app/current/tmp/pids/worker_01.pid
        start program = "/bin/bash -c 'cd /app/current; RAILS_ENV=production QUEUE=batch_queue nohup rake environment resque:work >> log/worker_01.log 2>&1 & echo $! > tmp/pids/worker_01.pid'"
          as uid deploy and gid deploy
        stop program = "/bin/bash -c 'cd /app/current && kill -s QUIT `cat tmp/pids/worker_01.pid` && rm -f tmp/pids/worker_01.pid; exit 0;'"
        if totalmem is greater than 1000 MB for 10 cycles then restart # eating up memory?
        group resque_workers
  • 33. Resque: built-in monitoring
  • 34. Resque plugins
      Resque-status: https://github.com/quirkey/resque-status
      Resque-scheduler: https://github.com/bvandenbos/resque-scheduler/
      More at: https://github.com/defunkt/resque/wiki/plugins
  • 35. Resque-status: simple trackable jobs for Resque; job instances have a UUID; jobs can report their status while running.
  • 36. Resque-status
      # inheriting from JobWithStatus
      class ExportJob < Resque::JobWithStatus
        # perform is an instance method
        def perform
          limit = (options['limit'] || 1000).to_i
          items = Item.limit(limit)
          total = items.count
          exported = []
          items.each_with_index do |item, num|
            at(num, total, "At #{num} of #{total}")
            exported << item.to_csv
          end
          File.open(local_filename, 'w') { |f| f.write(exported.join("\n")) }
          complete(:filename => local_filename)
        end
      end
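Resque-status keeps a status hash per job UUID, and calls like `at(num, total, message)` update it while `perform` runs. Below is a rough in-memory imitation. It assumes a Hash instead of Redis and a percentage computed as `num * 100 / total`. `TrackedJob`, `CsvExport` and the stored fields are hypothetical, not resque-status's own classes or exact fields.

```ruby
require "securerandom"

STATUSES = {} # uuid => status hash (resque-status keeps these in Redis)

# Hypothetical base class imitating the at/complete calls.
class TrackedJob
  attr_reader :uuid

  def initialize
    @uuid = SecureRandom.uuid
    STATUSES[@uuid] = { "status" => "queued", "pct_complete" => 0 }
  end

  # Report progress from inside perform.
  def at(num, total, message)
    STATUSES[@uuid] = {
      "status" => "working",
      "pct_complete" => total.zero? ? 0 : num * 100 / total,
      "message" => message
    }
  end

  def complete(extra = {})
    STATUSES[@uuid] = { "status" => "completed", "pct_complete" => 100 }.merge(extra)
  end
end

class CsvExport < TrackedJob
  attr_reader :progress

  def perform(rows)
    @progress = []
    rows.each_index do |num|
      at(num, rows.size, "At #{num} of #{rows.size}")
      @progress << STATUSES[uuid]["pct_complete"]
    end
    complete({ "filename" => "export.csv" })
  end
end

job = CsvExport.new
puts STATUSES[job.uuid]["status"] # queued
job.perform(%w[a b c d])
p job.progress                    # [0, 25, 50, 75]
p STATUSES[job.uuid]
```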
  • 37. Resque-status
      job_id = SleepJob.create(:length => 100)
      status = Resque::Status.get(job_id)
      # the status object tells us:
      status.pct_complete #=> 0
      status.status #=> 'queued'
      status.queued? #=> true
      status.working? #=> false
      status.time #=> Time object
      status.message #=> "Created at ..."
      Resque::Status.kill(job_id)
  • 38. Resque-scheduler: queueing for future execution; scheduling jobs (like cron!).
  • 39. Resque-scheduler
      # run a job in 5 days
      Resque.enqueue_in(5.days, SendFollowupEmail)
      # run SomeJob at a specific time
      Resque.enqueue_at(5.days.from_now, SomeJob)
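Resque-scheduler keeps delayed jobs ordered by their timestamp. Its scheduler process moves them onto the normal queue once that time has passed. A minimal in-memory version of the idea follows. The `DelayedQueue` class is hypothetical: resque-scheduler keeps this schedule in Redis, and here the timestamps are plain `Time` values with an explicit `now` so the example is deterministic.

```ruby
class DelayedQueue
  def initialize
    @entries = [] # [time, job] pairs kept sorted by time
  end

  def enqueue_at(time, job)
    @entries << [time, job]
    @entries.sort_by!(&:first)
  end

  def enqueue_in(seconds, job, now: Time.now)
    enqueue_at(now + seconds, job)
  end

  # Remove and return every job whose time has come; later ones stay scheduled.
  def due_jobs(now = Time.now)
    due = @entries.take_while { |time, _| time <= now }
    @entries.shift(due.size)
    due.map(&:last)
  end
end

start = Time.now
dq = DelayedQueue.new
dq.enqueue_in(5 * 24 * 3600, "SendFollowupEmail", now: start) # in 5 days
dq.enqueue_in(60, "SomeJob", now: start)                      # in a minute

first_batch = dq.due_jobs(start + 120)
second_batch = dq.due_jobs(start + 6 * 24 * 3600)
p first_batch  # ["SomeJob"]
p second_batch # ["SendFollowupEmail"]
```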
  • 40. Resque-scheduler
      namespace :resque do
        task :setup do
          require 'resque'
          require 'resque_scheduler'
          require 'resque/scheduler'
          Resque.redis = 'localhost:6379'
          # The schedule doesn't need to be stored in a YAML, it just needs to
          # be a hash. YAML is usually the easiest.
          Resque::Scheduler.schedule = YAML.load_file('your_resque_schedule.yml')
          # When dynamic is set to true, the scheduler process looks for
          # schedule changes and applies them on the fly.
          # Also if dynamic the Resque::Scheduler.set_schedule (and remove_schedule)
          # methods can be used to alter the schedule
          #Resque::Scheduler.dynamic = true
        end
      end
      $ rake resque:scheduler
  • 41. Resque-scheduler: the yaml configuration
      queue_documents_for_indexing:
        cron: "0 0 * * *"
        class: QueueDocuments
        queue: high
        args:
        description: "This job queues all content for indexing in solr"
      export_items:
        cron: "30 6 * * 1"
        class: Export
        queue: low
        args: full
        description: "This job does a weekly export"
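As the comment on slide 40 notes, the schedule only needs to be a Hash. Loading the configuration of slide 41 with Ruby's standard YAML library shows the shape the scheduler receives. `30 6 * * 1` means 06:30 every Monday, and the empty `args:` entry loads as `nil`. The `SCHEDULE` constant is just a local copy of that configuration.

```ruby
require "yaml"

SCHEDULE = <<~SCHEDULE
  queue_documents_for_indexing:
    cron: "0 0 * * *"
    class: QueueDocuments
    queue: high
    args:
    description: "This job queues all content for indexing in solr"
  export_items:
    cron: "30 6 * * 1"
    class: Export
    queue: low
    args: full
    description: "This job does a weekly export"
SCHEDULE

schedule = YAML.safe_load(SCHEDULE) # plain Hash with String keys

schedule.each do |name, config|
  puts "#{name}: #{config['class']} on #{config['queue']} at '#{config['cron']}' args=#{config['args'].inspect}"
end
```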
  • 42. Other (commercial)
      SimpleWorker: http://simpleworker.com
      SQS: https://github.com/appoxy/aws/, http://rubygems.org/gems/right_aws, http://sdruby.org/video/024_amazon_sqs.m4v
  • 43. Other (historical)
      Beanstalkd and Stalker: http://asciicasts.com/episodes/243-beanstalkd-and-stalker, http://kr.github.com/beanstalkd/, https://github.com/han/stalker
      Backgroundjob (Bj): https://github.com/ahoward/bj
      BackgroundRb: http://backgroundrb.rubyforge.org/
  • 44. Other (different approaches)
      Nanite: http://www.slideshare.net/jendavis100/background-processing-with-nanite
      Cloud Crowd: https://github.com/documentcloud/cloud-crowd/wiki/Getting-Started
  • 45. Ciao! me@spazidigitali.com
  • 46. Image sources
      http://www.flickr.com/photos/rkbcupcakes/3373909785/
      http://www.flickr.com/photos/anjin/23460398
      http://www.flickr.com/photos/vivacomopuder/3122401239
      http://www.flickr.com/photos/pacdog/4968422200
      http://www.flickr.com/photos/comedynose/3834416952
      http://www.flickr.com/photos/rhysasplundh/5177851910/
      http://www.flickr.com/photos/marypcb/104308457
      http://www.flickr.com/photos/shutterhacks/4474421855
      http://www.flickr.com/photos/kevinschoenmakersnl/5562839479
      http://www.flickr.com/photos/triplexpresso/496995086
      http://www.flickr.com/photos/saxonmoseley/24523450
      http://www.flickr.com/photos/gadl/89650415
      http://www.flickr.com/photos/matvey_andreyev/3656451273
      http://www.flickr.com/photos/bryankennedy/1992770068
      http://www.flickr.com/photos/27282406@N03/4134661728/