Successfully reported this slideshow.
Your SlideShare is downloading. ×

To Batch Or Not To Batch

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 46 Ad

More Related Content

Slideshows for you (20)

Similar to To Batch Or Not To Batch (20)

Advertisement

Recently uploaded (20)

Advertisement

To Batch Or Not To Batch

  1. 1. To Batch Or Not To Batch Luca Mearelli rubyday.it 2011
  2. 2. First and foremost, we believe that speed is more than a feature. Speed is the most important feature. If your application is application is slow, people won’t use it. people won’t use it. Fred Wilson @lmea #rubyday
  3. 3. Not all the interesting features are fast Interacting with remote API Sending emails Media transcoding Large dataset handling @lmea #rubyday
  4. 4. Anatomy of an asynchronous action The app decides it needs to do a long operation The app asks the async system to do the operation and quickly returns the response The async system executes the operation out- of-band @lmea #rubyday
  5. 5. Batch Asynchronous jobs Queues & workers @lmea #rubyday
  6. 6. Batch @lmea #rubyday
  7. 7. Cron scheduled operations unrelated to the requests low frequency longer run time @lmea #rubyday
  8. 8. Anatomy of a cron batch: the rake task namespace :export do task :items_xml => :environment do # read the env variables # make the export end end @lmea #rubyday
  9. 9. Anatomy of a cron batch: the shell script #!/bin/sh # this goes in script/item_export_full.sh cd /usr/rails/MyApp/current export RAILS_ENV=production echo "Item Export Full started: `date`" rake export:items_xml XML_FOLDER='data/exports' echo "Item Export Full completed: `date`" @lmea #rubyday
  10. 10. Anatomy of a cron batch: the crontab entry 0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh >> /usr/rails/ MyApp/current/log/dump_item_export.log 2>&1 30 13 * * * cd /usr/rails/MyApp/current; ruby /usr/rails/MyApp/current/script/ runner -e production "Newsletter.deliver_daily" >> /usr/rails/MyApp/current/log/ newsletter_daily.log 2>&1 @lmea #rubyday
  11. 11. Cron helpers Whenever https://github.com/javan/whenever Craken https://github.com/latimes/craken @lmea #rubyday
  12. 12. Whenever: schedule.rb # adds ">> /path/to/file.log 2>&1" to all commands set :output, '/path/to/file.log' every 3.hours do rake "my:rake:task" end every 1.day, :at => '4:30 am' do runner "MyModel.task_to_run_at_four_thirty_in_the_morning" end every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot command "/usr/bin/my_great_command", :output => {:error => 'error.log', :standard => 'cron.log'} end @lmea #rubyday
  13. 13. Cracken: raketab 59 * * * * thing:to_do > /tmp/thing_to_do.log 2>&1 @daily solr:reindex > /tmp/solr_daily.log 2>&1 # also @yearly, @annually, @monthly, @weekly, @midnight, @hourly @lmea #rubyday
  14. 14. Cracken: raketab.rb Raketab.new do |cron| cron.schedule 'thing:to_do > /tmp/thing_to_do.log 2>&1', :every => mon..fri cron.schedule 'first:five:days > /tmp/thing_to_do.log 2>&1', :days => [1,2,3,4,5] cron.schedule 'first:day:q1 > /tmp/thing_to_do.log 2>&1', :the => '1st', :in => [jan,feb,mar] cron.schedule 'first:day:q4 > /tmp/thing_to_do.log 2>&1', :the => '1st', :months => 'October,November,December' end @lmea #rubyday
  15. 15. Queues & Workers un-scheduled operations responding to a request mid to high frequency mixed run time @lmea #rubyday
  16. 16. Queues & Workers Delayed job https://github.com/collectiveidea/delayed_job Resque https://github.com/defunkt/resque @lmea #rubyday
  17. 17. Delayed job Any object method can be a job Db backed queue Integer-based priority Lifecycle hooks (enqueue, before, after, ... ) @lmea #rubyday
  18. 18. Delayed job: simple jobs # without delayed_job @user.notify!(@event) # with delayed_job @user.delay.notify!(@event) # always asyncronous method class Newsletter def deliver # long running method end handle_asynchronously :deliver end newsletter = Newsletter.new newsletter.deliver @lmea #rubyday
  19. 19. Delayed job: handle_asyncronously handle_asynchronously :sync_method, :priority => 20 handle_asynchronously :in_the_future, :run_at => Proc.new { 5.minutes.from_now } handle_asynchronously :call_a_class_method, :run_at => Proc.new { when_to_run } handle_asynchronously :call_an_instance_method, :priority => Proc.new {|i| i.how_important } @lmea #rubyday
  20. 20. Delayed job class NewsletterJob < Struct.new(:text, :emails) def perform emails.each { |e| NewsMailer.deliver_text_to_email(text, e) } end end Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', User.find(:all).collect(&:email)) @lmea #rubyday
  21. 21. Delayed job RAILS_ENV=production script/delayed_job -n 2 --min-priority 10 start RAILS_ENV=production script/delayed_job stop rake jobs:work @lmea #rubyday
  22. 22. Delayed job: checking the job status The queue is for scheduled and running jobs Handle the status outside Delayed::Job object @lmea #rubyday
  23. 23. Delayed job: checking the job status # Include this in your initializers somewhere class Queue < Delayed::Job def self.status(id) self.find_by_id(id).nil? ? "success" : (job.last_error.nil? ? "queued" : "failure") end end # Use this method in your poll method like so: def poll status = Queue.status(params[:id]) if status == "success" # Success, notify the user! elsif status == "failure" # Failure, notify the user! end end @lmea #rubyday
  24. 24. Delayed job: checking the job status class AJob < Struct.new(:options) def perform do_something(options) end def success(job) # record success of job.id Rails.cache.write("status:#{job.id}", "success") end end # a helper def job_completed_with_success(job_id) Rails.cache.read("status:#{job_id}")=="success" end @lmea #rubyday
  25. 25. Resque Redis-backed queues Queue/dequeue speed independent of list size Forking behaviour Built in front-end Multiple queues / no priorities @lmea #rubyday
  26. 26. Resque: the job class Export @queue = :export_jobs def self.perform(dataset_id, kind = 'full') ds = Dataset.find(dataset_id) ds.create_export(kind) end end @lmea #rubyday
  27. 27. Resque: enqueuing the job class Dataset def async_create_export(kind) Resque.enqueue(Export, self.id, kind) end end ds = Dataset.find(100) ds.async_create_export('full') @lmea #rubyday
  28. 28. Resque: persisting the job # jobs are persisted as JSON, # so jobs should only take arguments that can be expressed as JSON { 'class': 'Export', 'args': [ 100, 'full' ] } # don't do this: Resque.enqueue(Export, self, kind) # do this: Resque.enqueue(Export, self.id, kind) @lmea #rubyday
  29. 29. Resque: generic async methods # A simple async helper class Repository < ActiveRecord::Base # This will be called by a worker when a job needs to be processed def self.perform(id, method, *args) find(id).send(method, *args) end # We can pass this any Repository instance method that we want to # run later. def async(method, *args) Resque.enqueue(Repository, id, method, *args) end end # Now we can call any method and have it execute later: @repo.async(:update_disk_usage) @repo.async(:update_network_source_id, 34) @lmea #rubyday
  30. 30. Resque: anatomy of a worker # a worker does this: start loop do if job = reserve job.process else sleep 5 end end shutdown @lmea #rubyday
  31. 31. Resque: working the queues $ QUEUES=critical,high,low rake resque:work $ QUEUES=* rake resque:work $ PIDFILE=./resque.pid QUEUE=export_jobs rake environment resque:work task "resque:setup" => :environment do AppConfig.a_parameter = ... end @lmea #rubyday
  32. 32. Resque: monit recipe # example monit monitoring recipe check process resque_worker_batch_01 with pidfile /app/current/tmp/pids/worker_01.pid start program = "/bin/bash -c 'cd /app/current; RAILS_ENV=production QUEUE=batch_queue nohup rake environment resque:work & > log/worker_01.log && echo $! > tmp/pids/worker_01.pid'" as uid deploy and gid deploy stop program = "/bin/bash -c 'cd /app/current && kill -s QUIT `cat tmp/pids/worker_01.pid` && rm -f tmp/pids/worker_01.pid; exit 0;'" if totalmem is greater than 1000 MB for 10 cycles then restart # eating up memory? group resque_workers @lmea #rubyday
  33. 33. Resque: built-in monitoring @lmea #rubyday
  34. 34. Resque plugins Resque-status https://github.com/quirkey/resque-status Resque-scheduler https://github.com/bvandenbos/resque-scheduler/ More at: https://github.com/defunkt/resque/wiki/plugins @lmea #rubyday
  35. 35. Resque-status Simple trackable jobs for resque Job instances have a UUID Jobs can report their status while running @lmea #rubyday
  36. 36. Resque-status # inheriting from JobWithStatus class ExportJob < Resque::JobWithStatus # perform is an instance method def perform limit = options['limit'].to_i || 1000 items = Item.limit(limit) total = items.count exported = [] items.each_with_index do |item, num| at(num, total, "At #{num} of #{total}") exported << item.to_csv end File.open(local_filename, 'w') { |f| f.write(exported.join("n")) } complete(:filename=>local_filename) end end @lmea #rubyday
  37. 37. Resque-status job_id = SleepJob.create(:length => 100) status = Resque::Status.get(job_id) # the status object tell us: status.pct_complete #=> 0 status.status #=> 'queued' status.queued? #=> true status.working? #=> false status.time #=> Time object status.message #=> "Created at ..." Resque::Status.kill(job_id) @lmea #rubyday
  38. 38. Resque-scheduler Queueing for future execution Scheduling jobs (like cron!) @lmea #rubyday
  39. 39. Resque-scheduler # run a job in 5 days Resque.enqueue_in(5.days, SendFollowupEmail) # run SomeJob at a specific time Resque.enqueue_at(5.days.from_now, SomeJob) @lmea #rubyday
  40. 40. Resque-scheduler namespace :resque do task :setup do require 'resque' require 'resque_scheduler' require 'resque/scheduler' Resque.redis = 'localhost:6379' # The schedule doesn't need to be stored in a YAML, it just needs to # be a hash. YAML is usually the easiest. Resque::Scheduler.schedule = YAML.load_file('your_resque_schedule.yml') # When dynamic is set to true, the scheduler process looks for # schedule changes and applies them on the fly. # Also if dynamic the Resque::Scheduler.set_schedule (and remove_schedule) # methods can be used to alter the schedule #Resque::Scheduler.dynamic = true end end $ rake resque:scheduler @lmea #rubyday
  41. 41. Resque-scheduler: the yaml configuration queue_documents_for_indexing: cron: "0 0 * * *" class: QueueDocuments queue: high args: description: "This job queues all content for indexing in solr" export_items: cron: "30 6 * * 1" class: Export queue: low args: full description: "This job does a weekly export" @lmea #rubyday
  42. 42. Other (commercial) SimpleWorker http://simpleworker.com SQS https://github.com/appoxy/aws/ http://rubygems.org/gems/right_aws http://sdruby.org/video/024_amazon_sqs.m4v @lmea #rubyday
  43. 43. Other (historical) Beanstalkd and Stalker http://asciicasts.com/episodes/243-beanstalkd-and-stalker http://kr.github.com/beanstalkd/ https://github.com/han/stalker Backgroundjob (Bj) https://github.com/ahoward/bj BackgroundRb http://backgroundrb.rubyforge.org/ @lmea #rubyday
  44. 44. Other (different approaches) Nanite http://www.slideshare.net/jendavis100/background-processing-with-nanite Cloud Crowd https://github.com/documentcloud/cloud-crowd/wiki/Getting-Started @lmea #rubyday
  45. 45. Ciao! me@spazidigitali.com @lmea #rubyday
  46. 46. http://www.flickr.com/photos/rkbcupcakes/3373909785/ http://www.flickr.com/photos/anjin/23460398 http://www.flickr.com/photos/vivacomopuder/3122401239 http://www.flickr.com/photos/pacdog/4968422200 http://www.flickr.com/photos/comedynose/3834416952 http://www.flickr.com/photos/rhysasplundh/5177851910/ http://www.flickr.com/photos/marypcb/104308457 http://www.flickr.com/photos/shutterhacks/4474421855 http://www.flickr.com/photos/kevinschoenmakersnl/5562839479 http://www.flickr.com/photos/triplexpresso/496995086 http://www.flickr.com/photos/saxonmoseley/24523450 http://www.flickr.com/photos/gadl/89650415 http://www.flickr.com/photos/matvey_andreyev/3656451273 http://www.flickr.com/photos/bryankennedy/1992770068 http://www.flickr.com/photos/27282406@N03/4134661728/ @lmea #rubyday

×