Distributed
Ruby and Rails
      @ihower
   http://ihower.tw
        2010/1
About Me
•           a.k.a. ihower
    • http://ihower.tw
    • http://twitter.com/ihower
    • http://github.com/ihower
•...
Agenda
• Distributed Ruby
• Distributed Message Queues
• Background-processing in Rails
• Message Queues for Rails
• SOA f...
1.Distributed Ruby

• DRb
• Rinda
• Starfish
• MapReduce
• MagLev VM
DRb

• Ruby's RMI (remote method invocation) system


• an object in one Ruby process can inv...
DRb (cont.)

• no defined interface, faster development time
• tightly couples applications, because there is no
  defined API, but ra...
server example 1
require 'drb'

class HelloWorldServer

      def say_hello
          'Hello, world!'
      end

end

DRb....
client example 1
require 'drb'

server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

puts server.say_hello
puts ser...
example 2
# user.rb
class User

  attr_accessor :username

end
server example 2
require 'drb'
require 'user'

class UserServer

  attr_accessor :users

  def find(id)
    self.users[id-...
client example 2
require 'drb'

user_server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

user = user_server.find(2...
Err...

# <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="004bo:
tUser006:016@usernameia">
# client2.rb:8: undefined meth...
Why? DRbUndumped
•   Default DRb operation

    • Pass by value
    • Must share code
• With DRbUndumped
 • Pass by refere...
Example 2 Fixed
# user.rb
class User

  include DRbUndumped

  attr_accessor :username

end

# <DRb::DRbObject:0x1003b84f8...
Why use DRbUndumped?

 • Big objects
 • Singleton objects
 • Lightweight clients
 • Rapidly changing software
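
A small sketch of the mechanism behind the two modes, using only the standard drb library and no network connection. By default DRb marshals an object and sends a copy; including DRbUndumped makes marshalling fail, which is what causes DRb to send a reference instead. The class names PlainUser and RemoteUser are made up for this illustration.

```ruby
require 'drb'

# Plain class: DRb marshals a copy and sends it (pass by value).
class PlainUser
  attr_accessor :username
end

# With DRbUndumped: marshalling is refused, so DRb sends a reference instead.
class RemoteUser
  include DRb::DRbUndumped
  attr_accessor :username
end

plain = PlainUser.new
plain.username = 'ihower'

# A by-value transfer amounts to a dump on one side and a load on the other.
copy = Marshal.load(Marshal.dump(plain))
copy.username = 'changed'
puts plain.username # the original is untouched by changes to the copy

# An undumped object cannot be copied this way at all.
begin
  Marshal.dump(RemoteUser.new)
  undumped_refuses_marshal = false
rescue TypeError
  undumped_refuses_marshal = true
end
puts undumped_refuses_marshal
```

This is also why the by-value client needs the User class loaded: it has to rebuild the copy locally, while a reference only forwards method calls back to the server.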
ID conversion
• Converts reference into DRb object on server
 • DRbIdConv (Default)
 • TimerIdConv
 • NamedIdConv
 • GWIdC...
Beware of garbage collection
•   referenced objects may be collected on
    server (usually doesn't matter)
•   B...
DRb security
require 'drb'

ro = DRbObject.new_with_uri("druby://127.0.0.1:61676")
class << ro
    undef :instance_eval
en...
$SAFE=1

instance_eval':
Insecure operation - instance_eval (SecurityError)
DRb security (cont.)

• Access Control Lists (ACLs)
 • via IP address array
 • still can run denial-of-service attack
• DR...
Rinda

• Rinda is a Ruby port of Linda distributed
    computing paradigm.
•   Linda is a model of coordination and commun...
Rinda (cont.)

• Rinda consists of:
 • a TupleSpace implementation
 • a RingServer that allows DRb services to
    automat...
RingServer

• Hardcoding IP addresses in a DRb
  program couples applications tightly
  and makes fault tolerance d...
1. Where is Service X?

                                                          RingServer
                                ...
ring server example
require 'rinda/ring'
require 'rinda/tuplespace'

DRb.start_service
Rinda::RingServer.new(Rinda::TupleS...
service example
require 'rinda/ring'

class HelloWorldServer
    include DRbUndumped # Need for RingServer

      def say_...
client example
require 'rinda/ring'

DRb.start_service
ring_server = Rinda::RingFinger.primary

service = ring_server.read...
TupleSpaces

• Shared object space
• Atomic access
• Just like a bulletin board
• Tuple template is
  [:name, :Class, object...
5 Basic Operations

• write
• read
• take (Atomic Read+Delete)
• read_all
• notify (Callback for write/take/delete)
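
The five operations can be tried against a local Rinda::TupleSpace without starting any DRb service. This is only a sketch: the [:job, id, description] tuple layout and the job descriptions are invented for the example, and nil in a template acts as a wildcard.

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

# write: post tuples to the shared space
ts.write([:job, 1, 'resize photo'])
ts.write([:job, 2, 'send mail'])

# read: match a template without removing the tuple
peeked = ts.read([:job, 1, nil])

# read_all: every matching tuple, still without removing anything
all_jobs = ts.read_all([:job, nil, nil])

# take: read and delete in one atomic step
taken = ts.take([:job, 1, nil])
remaining = ts.read_all([:job, nil, nil])

# notify: register interest in future 'write' events, then wait for one
observer = ts.notify('write', [:job, nil, nil])
ts.write([:job, 3, 'encode video'])
event, tuple = observer.pop

puts peeked.inspect
puts all_jobs.size
puts remaining.inspect
puts "#{event}: #{tuple.inspect}"
```

Because take removes the tuple atomically, two workers calling take with the same template never receive the same job.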
Starfish

• Starfish is a utility to make distributed
  programming ridiculously easy
• It runs both the server and the clie...
starfish foo.rb
# foo.rb

class Foo
  attr_reader :i

  def initialize
    @i = 0
  end

  def inc
    logger.info "YAY it ...
starfish server example
   ARGV.unshift('server.rb')

   require 'rubygems'
   require 'starfish'

   class HelloWorld
    ...
starfish client example
   ARGV.unshift('client.rb')

   require 'rubygems'
   require 'starfish'

   Starfish.client = lam...
starfish client example                 (another way)


       ARGV.unshift('server.rb')

       require 'rubygems'
       ...
MapReduce

• introduced by Google to support
  distributed computing on large data sets on
  clusters of computers.
• insp...
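
The idea can be shown on one machine with Ruby's own map and reduce. This is a sketch of the programming model only, not of any distributed framework; the log lines are made up.

```ruby
# Hypothetical access-log lines; only the status code (last field) matters here.
LOG_LINES = [
  'GET /index.html 200',
  'GET /missing 404',
  'POST /login 200',
  'GET /about 200'
].freeze

# Map step: turn each line into a [key, value] pair.
pairs = LOG_LINES.map { |line| [line.split.last, 1] }

# Shuffle step: group the pairs by key.
grouped = pairs.group_by { |status, _count| status }

# Reduce step: fold each group's values into one result per key.
counts = grouped.transform_values do |group|
  group.map { |_status, count| count }.reduce(0, :+)
end

puts counts.inspect
```

In a distributed setting the map step runs on many clients over slices of the input, and only the small per-key results travel back to be reduced.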
starfish server example
ARGV.unshift('server.rb')

require 'rubygems'
require 'starfish'

Starfish.server = lambda{ |map_re...
starfish client example
   ARGV.unshift('client.rb')

   require 'rubygems'
   require 'starfish'

   Starfish.client = lam...
Other implementations
• Skynet
 • Use TupleSpace or MySQL as message queue
 • Include an extension for ActiveRecord
 • htt...
MagLev VM

• a fast, stable, Ruby implementation with
  integrated object persistence and
  distributed shared cache.
• ht...
2.Distributed Message Queues

• Starling
• AMQP/RabbitMQ
• Stomp/ActiveMQ
• beanstalkd
What’s a message queue?
          Message X
 Client                Queue



                      Check and processing




 ...
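
The diagram can be imitated inside one process. In this sketch Ruby's built-in Queue stands in for the queue server, the main thread is the client and a worker thread is the processor; the :done sentinel is an assumption of the example, not part of any queue protocol.

```ruby
# In-process stand-in for a queue server.
queue = Queue.new
processed = []

processor = Thread.new do
  # pop blocks until a message arrives; :done tells the worker to stop
  while (message = queue.pop) != :done
    processed << "handled #{message}"
  end
end

# The client only enqueues and moves on; it never waits for the work itself.
3.times { |i| queue << "message #{i}" }
queue << :done

processor.join
puts processed
```

A real queue server adds what this sketch lacks: the queue lives in a separate process, so client and processor can run on different machines and survive each other's restarts.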
Why not DRb?

• DRb has security risks and poorly designed APIs
• a distributed message queue is a great way to do
  distribu...
Starling
• a light-weight persistent queue server that
  speaks the Memcache protocol (mimics its
  API)
• Fast, effective...
Starling command

• sudo gem install starling-starling
 • http://github.com/starling/starling
• sudo starling -h 192.168.1...
Starling set example
require 'rubygems'
require 'starling'

starling = Starling.new('192.168.1.4:22122')

100.times do |i|...
Starling get example
require 'rubygems'
require 'starling'

starling = Starling.new('192.168.2.4:22122')

loop do
  puts s...
get method
• FIFO
• After get, the object is no longer in the
  queue. You will lose the message if a processing
  error happene...
Handle processing error exception
 require 'rubygems'
 require 'starling'

 starling = Starling.new('192.168.2.4:22122')
...
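
The same put-it-back idea, sketched with an in-memory Queue so it runs without a Starling server. The retry-once policy and the sample messages are assumptions of this example.

```ruby
queue = Queue.new
%w[10 oops 30].each { |item| queue << item }

results = []
failed_once = []

until queue.empty?
  message = queue.pop # like get: the message has now left the queue
  begin
    results << Integer(message) * 2
  rescue ArgumentError => e
    # Without this branch the message would simply be lost.
    if failed_once.include?(message)
      warn "giving up on #{message.inspect}: #{e.message}"
    else
      failed_once << message
      queue << message # put it back for one more attempt
    end
  end
end

puts results.inspect
```

Re-enqueueing without a limit turns one bad message into an endless loop, which is why the sketch gives up after the second failure.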
Starling cons

• Poll queue constantly
• With RabbitMQ you can subscribe to a queue that
  notifies you when a message is available f...
AMQP/RabbitMQ
• a complete and highly reliable enterprise
  messaging system based on the emerging
  AMQP standard.
  • Er...
Stomp/ActiveMQ

• Apache ActiveMQ is the most popular and
  powerful open source messaging and
  Integration Patterns prov...
beanstalkd
• Beanstalk is a simple, fast workqueue
  service. Its interface is generic, but was
  originally designed for ...
Why we need asynchronous/background-processing in Rails?

• cron-like processing (compute daily statistics data, create reports,
  full-text search index update, etc.)
       ...
3.Background-processing for Rails
• script/runner
• rake
• cron
• daemon
• run_later plugin
• spawn plugin
script/runner


• In Your Rails App root:
• script/runner "Worker.process"
rake

• In RAILS_ROOT/lib/tasks/dev.rake
• rake dev:process
  namespace :dev do
    task :process do
          #...
    en...
cron

• Cron is a time-based job scheduler in Unix-
  like computer operating systems.
• crontab -e
Whenever
          http://github.com/javan/whenever

•   A Ruby DSL for Defining Cron Jobs

• http://asciicasts.com/episode...
Daemon

• http://daemons.rubyforge.org/
• http://github.com/dougal/daemon_generator/
rufus-scheduler
   http://github.com/jmettraux/rufus-scheduler


• scheduling pieces of code (jobs)
• Not replacement for ...
Daemon Kit
   http://github.com/kennethkalmer/daemon-kit



• Creating Ruby daemons by providing a
  sound application ske...
Monitor your daemon

• http://mmonit.com/monit/
• http://github.com/arya/bluepill
• http://god.rubyforge.org/
daemon_controller
http://github.com/FooBarWidget/daemon_controller




• A library for robust daemon management
• Make dae...
off-load task via system command
# mailings_controller.rb
def deliver
  call_rake :send_mailing, :mailing_id => par...
Simple Thread

after_filter do
    Thread.new do
        AccountMailer.deliver_signup(@user)
    end
end
run_later plugin
      http://github.com/mattmatt/run_later


• Borrowed from Merb
• Uses worker thread and a queue
• Simp...
spawn plugin
http://github.com/tra/spawn


  spawn do
    logger.info("I feel sleepy...")
    sleep 11
    logger.info("Ti...
spawn (cont.)
• By default, spawn uses fork to create
  child processes. You can configure it to use
  threads instead.
• Wo...
threading vs. forking
•   Forking advantages:
    •   more reliable? - the ActiveRecord code is not thread-safe.
    •   k...
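
One difference behind this comparison is memory: a thread shares the parent's memory, while a forked child works on its own copy. A plain-Ruby sketch, assuming a Unix-like system where fork is available:

```ruby
counter = 0

# Thread: runs inside the same process and shares its memory.
Thread.new { counter += 1 }.join
after_thread = counter

# Fork: the child gets its own copy of memory, so its change stays in the child.
reader, writer = IO.pipe
pid = fork do
  reader.close
  counter += 100
  writer.puts counter # report the child's view back through a pipe
  writer.close
end
writer.close
child_view = reader.read.to_i
reader.close
Process.wait(pid)

puts "after thread: #{after_thread}"
puts "child saw: #{child_view}, parent still has: #{counter}"
```

The copy is what makes forking more isolated and also heavier: a forked Rails process starts with a copy of the whole loaded application.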
Okay, we need a reliable messaging system:
•   Persistent
•   Scheduling: not necessarily all at the same time
•   Scala...
4.Message Queues
     (for Rails only)
• ar_mailer
• BackgrounDRb
• workling
• delayed_job
• resque
Rails only?

• Easy to use/write code
• Jobs are Ruby classes or objects
• But need to load Rails environment
ar_mailer
       http://seattlerb.rubyforge.org/ar_mailer/



• a two-phase delivery agent for ActionMailer.
 • Store mess...
BackgrounDRb
            http://backgroundrb.rubyforge.org/

• BackgrounDRb is a Ruby job server and
  scheduler.
• Have ...
workling
     http://github.com/purzelrakete/workling




• Gives your Rails App a simple API that you
  can use to make c...
Workling/Starling setup
• script/plugin install git://github.com/purzelrakete/workling.git
• sudo starling -p 151...
Workling example
 class EmailWorker < Workling::Base
   def deliver(options)
     user = User.find(options[:id])
     user...
delayed_job
• Database backed asynchronous priority
  queue
• Extracted from Shopify
• you can place any Ruby object on it...
delayed_job setup
                (use fork version)




• script/plugin install git://github.com/
  collectiveidea/delaye...
delayed_job example
     send_later
def deliver
  mailing = Mailing.find(params[:id])
  mailing.send_later(:deliver)
  fla...
delayed_job example
  custom workers
class MailingJob < Struct.new(:mailing_id)

  def perform
    mailing = Mailing.find(...
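
Why a Struct works well as a job can be seen without Rails or a database. In this sketch the Mailing class is a hypothetical stand-in for the model on the slide, and an array of serialized strings stands in for the jobs table. delayed_job itself stores jobs as YAML; Marshal is used here only for brevity.

```ruby
# Stand-in for the Mailing model from the slides (hypothetical, no database).
class Mailing
  DELIVERED = []

  def self.find(id)
    new(id)
  end

  def initialize(id)
    @id = id
  end

  def deliver
    DELIVERED << @id
  end
end

# Same shape as the slide's job: a Struct carrying only the data it needs.
class MailingJob < Struct.new(:mailing_id)
  def perform
    Mailing.find(mailing_id).deliver
  end
end

# "Enqueue": serialize the job, as a database-backed queue would store it.
stored_jobs = [Marshal.dump(MailingJob.new(42))]

# "Worker": later, possibly in another process, load each job and run it.
stored_jobs.each { |blob| Marshal.load(blob).perform }

puts Mailing::DELIVERED.inspect
```

The job carries only the id, not the record, so the worker loads fresh data when it finally runs.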
delayed_job example
       always asynchronously


   class Device
     def deliver
       # long running method
     end
...
Running jobs

• rake jobs:work
  (Don’t use in production, it will exit if the database has any network connectivity
  pr...
Priority
                  just Integer, default is 0

• you can run multiple workers to handle different
  priority jobs
...
Scheduled
        no guarantees at precise time, just run_after_at



Delayed::Job.enqueue(MailingJob.new(params[:id]), 3,...
Configuring Delayed Job
# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Del...
Automatic retry on failure
 • If a method throws an exception it will be
   caught and the method rerun later.
 • The meth...
Capistrano Recipes
• Remember to restart delayed_job after
  deployment
• Check out lib/delayed_job/recipes.rb
   after "d...
Resque
             http://github.com/defunkt/resque

•   a Redis-backed library for creating background jobs,
    placing...
My recommendations:

• General purpose: delayed_job
  (GitHub highly recommends DelayedJob to anyone whose site is not 50% ...
5. SOA for Rails

• What’s SOA
• Why SOA
• Considerations
• The tool set
What’s SOA
           Service oriented architectures



• “monolithic” approach is not enough
• SOA is a way to design com...
a monolithic web app example
                 request




             Load
            Balancer




            WebApps

...
a SOA example
                                     request




                                 Load
       request
      ...
Why SOA? Isolation
• Shared Resources
• Encapsulation
• Scalability
• Interoperability
• Reuse
• Testability
• Reduce Loca...
Shared Resources
• Different front-end websites use the same
  resources.
• SOA helps you avoid duplicating databases
  an...
Encapsulation

• you can change the underlying implementation in
  services without affecting other parts of the system
 • upgrade libra...
Scalability 1: Partitioned Data Provides
•   Database is the first bottleneck; a single DB
    server cannot scale. SO...
Scalability 2: Caching

• SOA helps you design a caching system more easily
 • Cache data at the right times and expire
    at the...
Scalability 3: Efficient
• Different components carry different loads;
  SOA lets you scale each service separately.

                 ...
Security

• Different services can be inside different
  firewalls
  • You can only open public web and
    services, others...
Interoperability
• HTTP is the common interface; SOA helps
  you integrate:
 • Multiple languages
 • Internal system e...
Reuse

• Reuse across multiple applications
• Reuse for public APIs
• Example: Amazon Web Services (AWS)
Testability

• Isolate problem
• Mocking API calls
 • Reduce the time to run test suite
Reduce Local Complexity
• Team modularity along the same module
  splits as your software
• Understandability: Th...
Considerations

• Partition into Separate Services
• API Design
• Which Protocol
How to partition into Separate Services
• Partitioning on Logical Function
• Partitioning on Read/Write Frequencies
• Par...
API Design

• Send Everything you need
• Parallel HTTP requests
• Send as Little as Possible
• Use Logical Models
Physical Models & Logical Models
• Physical models are mapped to database
  tables through ORM. (It’s 3NF)
• Logical ...
Logical Models
• Not relational or normalized
• Maintainability
  • can change with no change to data store
  • can stay t...
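
A rough sketch of the distinction in plain Ruby. The three Structs play the normalized physical models, and customer_profile builds one denormalized logical model for a service response. All names and data here are hypothetical.

```ruby
require 'json'

# Physical models: normalized, one Struct per (hypothetical) table.
User    = Struct.new(:id, :name)
Address = Struct.new(:user_id, :city)
Order   = Struct.new(:user_id, :total)

users     = [User.new(1, 'ihower')]
addresses = [Address.new(1, 'Taipei')]
orders    = [Order.new(1, 120), Order.new(1, 80)]

# Logical model: the denormalized shape a service client actually wants.
def customer_profile(user, addresses, orders)
  own_orders = orders.select { |order| order.user_id == user.id }
  {
    name: user.name,
    city: addresses.find { |address| address.user_id == user.id }&.city,
    order_count: own_orders.size,
    order_total: own_orders.sum(&:total)
  }
end

profile = customer_profile(users.first, addresses, orders)
puts JSON.generate(profile)
```

Clients depend only on the hash's keys, so the tables behind it can be split or merged without changing the service's response.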
Which Protocol?

• SOAP
• XML-RPC
• REST
RESTful Web services

• Rails way
• REST is about resources
 • URL
 • Verbs: GET/PUT/POST/DELETE
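
The resource-plus-verb idea, sketched with request objects from Ruby's standard net/http. The /users paths are hypothetical and nothing is sent over the network; the objects are only built and printed.

```ruby
require 'net/http'

# One resource (users), addressed by URL; the verb says what to do with it.
requests = {
  list:    Net::HTTP::Get.new('/users'),
  show:    Net::HTTP::Get.new('/users/1'),
  create:  Net::HTTP::Post.new('/users'),
  update:  Net::HTTP::Put.new('/users/1'),
  destroy: Net::HTTP::Delete.new('/users/1')
}

requests.each do |action, request|
  puts format('%-8s %-6s %s', action, request.method, request.path)
end
```

Five actions need only two URLs, because the verb carries the rest of the meaning.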
The tool set

• Web framework
• XML Parser
• JSON Parser
• HTTP Client
Web framework

• We do not need much controller or view code
• Rails is more than we need; how about Sinatra?
• Rails metal
ActiveResource

• Mapping RESTful resources as models in a
  Rails application.
• But not useful in practice, why?
XML parser

• http://nokogiri.org/
• Nokogiri (鋸) is an HTML, XML, SAX, and
  Reader parser. Among Nokogiri’s many
  featu...
JSON Parser

• http://github.com/brianmario/yajl-ruby/
• An extremely efficient streaming JSON
  parsing and encoding libra...
HTTP Client


• http://github.com/pauldix/typhoeus/
• Typhoeus runs HTTP requests in parallel
  while cleanly encapsulatin...
Tips

• Define your logical model (i.e. your service
  request result) first.

• model.to_json and model.to_xml are easy to
 ...
6.Distributed File System
 •   NFS does not scale
     •   we can use rsync to duplicate
 •   MogileFS
     •   http://www.dang...
7.Distributed Database

• NoSQL
• CAP theorem
 • Eventually consistent
• HBase/Cassandra/Voldemort
The End
References
•   Books&Articles:
    •    Distributed Programming with Ruby, Mark Bates (Addison Wesley)
    •    Enterprise...
References
•   Links:
    •   http://segment7.net/projects/ruby/drb/
    •   http://www.slideshare.net/luccastera/concurre...
Todo (maybe next time)
•   AMQP/RabbitMQ example code
    •   How about Nanite?
•   XMPP
•   MagLev VM
•   More MapReduce ...
Distributed Ruby and Rails

  1. 1. Distributed Ruby and Rails @ihower http://ihower.tw 2010/1
  2. 2. About Me • a.k.a. ihower • http://ihower.tw • http://twitter.com/ihower • http://github.com/ihower • Ruby on Rails Developer since 2006 • Ruby Taiwan Community • http://ruby.tw
  3. 3. Agenda • Distributed Ruby • Distributed Message Queues • Background-processing in Rails • Message Queues for Rails • SOA for Rails • Distributed Filesystem • Distributed database
  4. 4. 1.Distributed Ruby • DRb • Rinda • Starfish • MapReduce • MagLev VM
  5. 5. DRb • Ruby's RMI system (remote method invocation) • an object in one Ruby process can invoke methods on an object in another Ruby process on the same or a different machine
  6. 6. DRb (cont.) • no defined interface, faster development time • tightly couples applications, because there is no defined API, but rather methods on objects • unreliable in large-scale, heavy-load production environments
  7. 7. server example 1 require 'drb' class HelloWorldServer def say_hello 'Hello, world!' end end DRb.start_service("druby://127.0.0.1:61676", HelloWorldServer.new) DRb.thread.join
  8. 8. client example 1 require 'drb' server = DRbObject.new_with_uri("druby://127.0.0.1:61676") puts server.say_hello puts server.inspect # Hello, world! # <DRb::DRbObject:0x1003c04c8 @ref=nil, @uri="druby:// 127.0.0.1:61676">
  9. 9. example 2 # user.rb class User attr_accessor :username end
  10. 10. server example 2 require 'drb' require 'user' class UserServer attr_accessor :users def find(id) self.users[id-1] end end user_server = UserServer.new user_server.users = [] 5.times do |i| user = User.new user.username = i + 1 user_server.users << user end DRb.start_service("druby://127.0.0.1:61676", user_server) DRb.thread.join
  11. 11. client example 2 require 'drb' user_server = DRbObject.new_with_uri("druby://127.0.0.1:61676") user = user_server.find(2) puts user.inspect puts "Username: #{user.username}" user.username = "ihower" puts "Username: #{user.username}"
  12. 12. Err... # <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="004bo: tUser006:016@usernameia"> # client2.rb:8: undefined method `username' for #<DRb::DRbUnknown:0x1003b8318> (NoMethodError)
  13. 13. Why? DRbUndumped • Default DRb operation • Pass by value • Must share code • With DRbUndumped • Pass by reference • No need to share code
  14. 14. Example 2 Fixed # user.rb class User include DRbUndumped attr_accessor :username end # <DRb::DRbObject:0x1003b84f8 @ref=2149433940, @uri="druby://127.0.0.1:61676"> # Username: 2 # Username: ihower
  15. 15. Why use DRbUndumped? • Big objects • Singleton objects • Lightweight clients • Rapidly changing software
  16. 16. ID conversion • Converts reference into DRb object on server • DRbIdConv (Default) • TimerIdConv • NamedIdConv • GWIdConv
  17. 17. Beware of garbage collection • referenced objects may be collected on server (usually doesn't matter) • Building Your own ID Converter if you want to control persistent state.
  18. 18. DRb security require 'drb' ro = DRbObject.new_with_uri("druby://127.0.0.1:61676") class << ro undef :instance_eval end # !!!!!!!! WARNING !!!!!!!!! DO NOT RUN ro.instance_eval("`rm -rf *`")
  19. 19. $SAFE=1 instance_eval': Insecure operation - instance_eval (SecurityError)
  20. 20. DRb security (cont.) • Access Control Lists (ACLs) • via IP address array • still can run denial-of-service attack • DRb over SSL
  21. 21. Rinda • Rinda is a Ruby port of Linda distributed computing paradigm. • Linda is a model of coordination and communication among several parallel processes operating upon objects stored in and retrieved from shared, virtual, associative memory. This model is implemented as a "coordination language" in which several primitives operating on ordered sequence of typed data objects, "tuples," are added to a sequential language, such as C, and a logically global associative memory, called a tuplespace, in which processes store and retrieve tuples. (WikiPedia)
  22. 22. Rinda (cont.) • Rinda consists of: • a TupleSpace implementation • a RingServer that allows DRb services to automatically discover each other.
  23. 23. RingServer • Hardcoding IP addresses in a DRb program couples applications tightly and makes fault tolerance difficult. • RingServer can detect and interact with other services on the network without knowing IP addresses.
  24. 24. 1. Where is Service X? RingServer via broadcast UDP address 2. Service X: 192.168.1.12 Client @ 192.168.1.100 3. Hi, Service X @ 192.168.1.12 Service X @ 192.168.1.12 4. Hi There 192.168.1.100
  25. 25. ring server example require 'rinda/ring' require 'rinda/tuplespace' DRb.start_service Rinda::RingServer.new(Rinda::TupleSpace.new) DRb.thread.join
  26. 26. service example require 'rinda/ring' class HelloWorldServer include DRbUndumped # Need for RingServer def say_hello 'Hello, world!' end end DRb.start_service ring_server = Rinda::RingFinger.primary ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new, 'I like to say hi!'], Rinda::SimpleRenewer.new) DRb.thread.join
  27. 27. client example require 'rinda/ring' DRb.start_service ring_server = Rinda::RingFinger.primary service = ring_server.read([:hello_world_service, nil,nil,nil]) server = service[2] puts server.say_hello puts service.inspect # Hello, world! # [:hello_world_service, :HelloWorldServer, #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>, "I like to say hi!"]
  28. 28. TupleSpaces • Shared object space • Atomic access • Just like bulletin board • Tuple template is [:name, :Class, object, ‘description’ ]
  29. 29. 5 Basic Operations • write • read • take (Atomic Read+Delete) • read_all • notify (Callback for write/take/delete)
  30. 30. Starfish • Starfish is a utility to make distributed programming ridiculously easy • It runs both the server and the client in infinite loops • MapReduce with ActiveRecord or Files
  31. 31. starfish foo.rb # foo.rb class Foo attr_reader :i def initialize @i = 0 end def inc logger.info "YAY it incremented by 1 up to #{@i}" @i += 1 end end server :log => "foo.log" do |object| object = Foo.new end client do |object| object.inc end
  32. 32. starfish server example ARGV.unshift('server.rb') require 'rubygems' require 'starfish' class HelloWorld def say_hi 'Hi There' end end Starfish.server = lambda do |object| object = HelloWorld.new end Starfish.new('hello_world').server
  33. 33. starfish client example ARGV.unshift('client.rb') require 'rubygems' require 'starfish' Starfish.client = lambda do |object| puts object.say_hi exit(0) # exit program immediately end Starfish.new('hello_world').client
  34. 34. starfish client example (another way) ARGV.unshift('server.rb') require 'rubygems' require 'starfish' catch(:halt) do Starfish.client = lambda do |object| puts object.say_hi throw :halt end Starfish.new ('hello_world').client end puts "bye bye"
  35. 35. MapReduce • introduced by Google to support distributed computing on large data sets on clusters of computers. • inspired by map and reduce functions commonly used in functional programming.
  36. 36. starfish server example ARGV.unshift('server.rb') require 'rubygems' require 'starfish' Starfish.server = lambda{ |map_reduce| map_reduce.type = File map_reduce.input = "/var/log/apache2/access.log" map_reduce.queue_size = 10 map_reduce.lines_per_client = 5 map_reduce.rescan_when_complete = false } Starfish.new('log_server').server
  37. 37. starfish client example ARGV.unshift('client.rb') require 'rubygems' require 'starfish' Starfish.client = lambda { |logs| logs.each do |log| puts "Processing #{log}" sleep(1) end } Starfish.new("log_server").client
  38. 38. Other implementations • Skynet • Use TupleSpace or MySQL as message queue • Include an extension for ActiveRecord • http://skynet.rubyforge.org/ • MRToolkit based on Hadoop • http://code.google.com/p/mrtoolkit/
  39. 39. MagLev VM • a fast, stable, Ruby implementation with integrated object persistence and distributed shared cache. • http://maglev.gemstone.com/ • public Alpha currently
  40. 40. 2.Distributed Message Queues • Starling • AMQP/RabbitMQ • Stomp/ActiveMQ • beanstalkd
  41. 41. what’s message queue? Message X Client Queue Check and processing Processor
  42. 42. Why not DRb? • DRb has security risk and poorly designed APIs • distributed message queue is a great way to do distributed programming: reliable and scalable.
  43. 43. Starling • a light-weight persistent queue server that speaks the Memcache protocol (mimics its API) • Fast, effective, quick setup and ease of use • Powered by EventMachine http://eventmachine.rubyforge.org/EventMachine.html • Twitter’s open source project, they use it before 2009. (now switch to Kestrel, a port of Starling from Ruby to Scala)
  44. 44. Starling command • sudo gem install starling-starling • http://github.com/starling/starling • sudo starling -h 192.168.1.100 • sudo starling_top -h 192.168.1.100
  45. 45. Starling set example require 'rubygems' require 'starling' starling = Starling.new('192.168.1.4:22122') 100.times do |i| starling.set('my_queue', i) end append to the queue, not overwrite in Memcached
  46. 46. Starling get example require 'rubygems' require 'starling' starling = Starling.new('192.168.2.4:22122') loop do puts starling.get("my_queue") end
  47. 47. get method • FIFO • After get, the object is no longer in the queue. You will lose the message if a processing error happens. • The get method blocks until something is returned. It is an infinite loop.
  48. 48. Handle processing error exception require 'rubygems' require 'starling' starling = Starling.new('192.168.2.4:22122') results = starling.get("my_queue") begin puts results.flatten rescue NoMethodError => e puts e.message starling.set("my_queue", [results]) rescue Exception => e starling.set("my_queue", results) raise e end
  49. 49. Starling cons • Poll queue constantly • With RabbitMQ you can subscribe to a queue that notifies you when a message is available for processing.
  50. 50. AMQP/RabbitMQ • a complete and highly reliable enterprise messaging system based on the emerging AMQP standard. • Erlang • http://github.com/tmm1/amqp • Powered by EventMachine
  51. 51. Stomp/ActiveMQ • Apache ActiveMQ is the most popular and powerful open source messaging and Integration Patterns provider. • sudo gem install stomp • ActiveMessaging plugin for Rails
  52. 52. beanstalkd • Beanstalk is a simple, fast workqueue service. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously. • http://kr.github.com/beanstalkd/ • http://beanstalk.rubyforge.org/ • Facebook’s open source project
  53. 53. Why we need asynchronous/background-processing in Rails? • cron-like processing (compute daily statistics data, create reports, full-text search index update, etc.) • long-running tasks (sending mail, resizing photos, encoding videos, generating PDFs, image upload to S3, posting something to Twitter, etc.) • Server traffic jam: expensive requests will block server resources (i.e. your Rails app) • Bad user experience: they may try to reload and reload again! (responsiveness matters)
  54. 54. 3.Background- processing for Rails • script/runner • rake • cron • daemon • run_later plugin • spawn plugin
  55. 55. script/runner • In Your Rails App root: • script/runner "Worker.process"
  56. 56. rake • In RAILS_ROOT/lib/tasks/dev.rake • rake dev:process namespace :dev do task :process do #... end end
  57. 57. cron • Cron is a time-based job scheduler in Unix- like computer operating systems. • crontab -e
  58. 58. Whenever http://github.com/javan/whenever • A Ruby DSL for Defining Cron Jobs • http://asciicasts.com/episodes/164-cron-in-ruby • or http://cronedit.rubyforge.org/ every 3.hours do runner "MyModel.some_process" rake "my:rake:task" command "/usr/bin/my_great_command" end
  59. 59. Daemon • http://daemons.rubyforge.org/ • http://github.com/dougal/daemon_generator/
  60. 60. rufus-scheduler http://github.com/jmettraux/rufus-scheduler • scheduling pieces of code (jobs) • Not replacement for cron/at since it runs inside of Ruby. require 'rubygems' require 'rufus/scheduler' scheduler = Rufus::Scheduler.start_new scheduler.every '5s' do puts 'check blood pressure' end scheduler.join
  61. 61. Daemon Kit http://github.com/kennethkalmer/daemon-kit • Creating Ruby daemons by providing a sound application skeleton (through a generator), task specific generators (jabber bot, etc) and robust environment management code.
  62. 62. Monitor your daemon • http://mmonit.com/monit/ • http://github.com/arya/bluepill • http://god.rubyforge.org/
  63. 63. daemon_controller http://github.com/FooBarWidget/daemon_controller • A library for robust daemon management • Make daemon-dependent applications Just Work without having to start the daemons manually.
  64. 64. off-load task via system command # mailings_controller.rb def deliver call_rake :send_mailing, :mailing_id => params[:id].to_i flash[:notice] = "Delivering mailing" redirect_to mailings_url end # controllers/application.rb def call_rake(task, options = {}) options[:rails_env] ||= Rails.env args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" } system "/usr/bin/rake #{task} #{args.join(' ')} --trace 2>&1 >> #{Rails.root}/log/rake.log &" end # lib/tasks/mailer.rake desc "Send mailing" task :send_mailing => :environment do mailing = Mailing.find(ENV["MAILING_ID"]) mailing.deliver end # models/mailing.rb def deliver sleep 10 # placeholder for sending email update_attribute(:delivered_at, Time.now) end
  65. 65. Simple Thread after_filter do Thread.new do AccountMailer.deliver_signup(@user) end end
  66. 66. run_later plugin http://github.com/mattmatt/run_later • Borrowed from Merb • Uses worker thread and a queue • Simple solution for simple tasks run_later do AccountMailer.deliver_signup(@user) end
  67. 67. spawn plugin http://github.com/tra/spawn spawn do logger.info("I feel sleepy...") sleep 11 logger.info("Time to wake up!") end
  68. 68. spawn (cont.) • By default, spawn uses fork to create child processes. You can configure it to use threads instead. • Works by creating new database connections in ActiveRecord::Base for the spawned block. • Fork needs to copy Rails every time
  69. 69. threading vs. forking • Forking advantages: • more reliable? - the ActiveRecord code is not thread-safe. • keep running - subprocess can live longer than its parent. • easier - just works with Rails default settings. Threading requires you set allow_concurrency=true and. Also, beware of automatic reloading of classes in development mode (config.cache_classes = false). • Threading advantages: • less filling - threads take less resources... how much less? it depends. • debugging - you can set breakpoints in your threads
  70. 70. Okay, we need reliable messaging system: • Persistent • Scheduling: not necessarily all at the same time • Scalability: just throw in more instances of your program to speed up processing • Loosely coupled components that merely ‘talk’ to each other • Ability to easily replace Ruby with something else for specific tasks • Easy to debug and monitor
  71. 71. 4.Message Queues (for Rails only) • ar_mailer • BackgroundDRb • workling • delayed_job • resque
  72. 72. Rails only? • Easy to use/write code • Jobs are Ruby classes or objects • But need to load Rails environment
  73. 73. ar_mailer http://seattlerb.rubyforge.org/ar_mailer/ • a two-phase delivery agent for ActionMailer. • Store messages into the database • Delivery by a separate process, ar_sendmail later.
  74. 74. BackgrounDRb http://backgroundrb.rubyforge.org/ • BackgrounDRb is a Ruby job server and scheduler. • Has scalability problems (~20 servers for Mark Bates) • Hard to know if processing error • Use database to persist tasks • Use memcached to know processing result
  75. 75. workling http://github.com/purzelrakete/workling • Gives your Rails App a simple API that you can use to make code run in the background, outside of the your request. • Supports Starling(default), BackgroundJob, Spawn and AMQP/RabbitMQ Runners.
  76. 76. Workling/Starling setup • script/plugin install git://github.com/purzelrakete/workling.git • sudo starling -p 15151 • RAILS_ENV=production script/workling_client start
  77. 77. Workling example class EmailWorker < Workling::Base def deliver(options) user = User.find(options[:id]) user.deliver_activation_email end end # in your controller def create EmailWorker.asynch_deliver( :id => 1) end
  78. 78. delayed_job • Database backed asynchronous priority queue • Extracted from Shopify • you can place any Ruby object on its queue as arguments • Loads the Rails environment only once
  79. 79. delayed_job setup (use fork version) • script/plugin install git://github.com/ collectiveidea/delayed_job.git • script/generate delayed_job • rake db:migrate
  80. 80. delayed_job example: send_later
def deliver
  mailing = Mailing.find(params[:id])
  mailing.send_later(:deliver)
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
  81. 81. delayed_job example: custom workers
class MailingJob < Struct.new(:mailing_id)
  def perform
    mailing = Mailing.find(mailing_id)
    mailing.deliver
  end
end

# in your controller
def deliver
  Delayed::Job.enqueue(MailingJob.new(params[:id]))
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
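The custom-worker contract is small: a job is any object that responds to `perform`. The sketch below illustrates that contract with a toy in-memory queue. `TinyQueue` is a hypothetical stand-in written for this example, not part of delayed_job, and the mail delivery is replaced by a string so the code runs without Rails.

```ruby
# A tiny in-memory stand-in for a job queue (NOT delayed_job itself).
class TinyQueue
  def initialize
    @jobs = []
  end

  # Accept any object that responds to #perform.
  def enqueue(job)
    raise ArgumentError, 'job must respond to #perform' unless job.respond_to?(:perform)
    @jobs << job
    job
  end

  # Run every queued job in FIFO order and return their results.
  def work_off
    results = []
    results << @jobs.shift.perform until @jobs.empty?
    results
  end

  def size
    @jobs.size
  end
end

# Same shape as the MailingJob above; delivery is replaced by a string.
MailingJob = Struct.new(:mailing_id) do
  def perform
    "delivered mailing #{mailing_id}"
  end
end

queue = TinyQueue.new
queue.enqueue(MailingJob.new(1))
queue.enqueue(MailingJob.new(2))
results = queue.work_off
puts results
```

Using a Struct keeps the job a plain value object: only the id travels through the queue, and the record is looked up again when the job runs.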
  82. 82. delayed_job example: always asynchronously
class Device
  def deliver
    # long running method
  end
  handle_asynchronously :deliver
end

device = Device.new
device.deliver
  83. 83. Running jobs • rake jobs:work (Don’t use in production; it will exit if the database has any network connectivity problems.) • RAILS_ENV=production script/delayed_job start • RAILS_ENV=production script/delayed_job stop
  84. 84. Priority • Just an Integer, default is 0 • You can run multiple workers to handle jobs of different priorities • RAILS_ENV=production script/delayed_job --min-priority 3 start
Delayed::Job.enqueue(MailingJob.new(params[:id]), 3)
Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
  85. 85. Scheduled • No guarantee of running at a precise time; the job only runs at some point after run_at
Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)
Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 1.month.from_now.beginning_of_month)
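Why the start time is not exact can be shown with a little arithmetic. This is a simplified sketch, assuming an idle worker that wakes up at fixed intervals (compare `sleep_delay` in the configuration slide) and starts a job at its first wake-up at or after the scheduled time. The helper name and the fixed-interval model are assumptions made for illustration, not delayed_job internals.

```ruby
# Times are plain seconds. Returns the moment of the first poll that
# happens at or after run_at, given polls every sleep_delay seconds
# starting at first_poll.
def first_poll_at_or_after(run_at, sleep_delay, first_poll = 0)
  return first_poll if run_at <= first_poll

  polls = ((run_at - first_poll) / sleep_delay.to_f).ceil
  first_poll + polls * sleep_delay
end

scheduled = 12
started = first_poll_at_or_after(scheduled, 5)
puts "scheduled at t=#{scheduled}s, picked up at t=#{started}s"
```

In this model the delay is bounded by one polling interval when the worker is idle; a busy worker or a long queue adds more.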
  86. 86. Configuring Delayed Job
# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 5 # sleep if empty queue
Delayed::Worker.max_attempts = 25
Delayed::Worker.max_run_time = 4.hours # set to the longest time a task will take
  87. 87. Automatic retry on failure • If a method raises an exception, it is caught and the method is rerun later. • The method is retried up to 25 times (the default) at increasingly longer intervals until it passes. • The longest interval is about 108 hours (25 ** 4 seconds) • Next run time: Job.db_time_now + (job.attempts ** 4) + 5
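To see how quickly the retry interval grows, the formula from the slide can be tabulated in plain Ruby. `retry_delay` is a helper name made up for this sketch; only the expression `(attempts ** 4) + 5` comes from the slide.

```ruby
# Delay in seconds before the next attempt, per the formula on the slide:
# (attempts ** 4) + 5
def retry_delay(attempts)
  (attempts**4) + 5
end

(1..25).each do |attempts|
  seconds = retry_delay(attempts)
  puts format('attempts=%2d: wait %7d s (%.1f h)', attempts, seconds, seconds / 3600.0)
end
```

The first few retries come within seconds or minutes, while the last ones are days apart, so a job that fails because of a short outage recovers quickly without hammering a service that stays down.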
  88. 88. Capistrano Recipes • Remember to restart delayed_job after deployment • Check out lib/delayed_job/recipes.rb
after "deploy:stop", "delayed_job:stop"
after "deploy:start", "delayed_job:start"
after "deploy:restart", "delayed_job:restart"
  89. 89. Resque http://github.com/defunkt/resque • A Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. • GitHub’s open source project • You can only place JSONable Ruby objects on its queues • Includes a Sinatra app for monitoring what's going on • Supports multiple queues • A good fit when you expect a lot of failure/chaos
  90. 90. My recommendations: • General purpose: delayed_job (GitHub highly recommends DelayedJob to anyone whose site is not 50% background work.) • Time-scheduled: cron + rake
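The "JSONable only" restriction has a practical consequence that a short round trip makes visible. The sketch below uses only Ruby's standard `json` library, not Resque itself; the payload shape (a class name plus an array of arguments) mirrors how Resque stores jobs, and the job name and arguments are made up for illustration.

```ruby
require 'json'

# Shape of a queued job: a class name plus an array of arguments.
payload = { 'class' => 'MailingJob', 'args' => [42, { :notify => true }] }

encoded = JSON.generate(payload)   # what would be stored in the queue
decoded = JSON.parse(encoded)      # what the worker would read back

puts encoded
puts decoded['args'].inspect
```

After the round trip the symbol key `:notify` comes back as the string `"notify"`, and arbitrary objects would not survive at all. Passing ids and other simple values, then reloading records inside the job, avoids surprises.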
  91. 91. 5. SOA for Rails • What’s SOA • Why SOA • Considerations • The tool set
  92. 92. What’s SOA • Service-oriented architecture • The “monolithic” approach is not enough • SOA is a way to design complex applications by splitting out major components into individual services and communicating via APIs. • A service is a vertical slice of functionality: database, application code and caching layer
  93. 93. A monolithic web app example [Diagram: request → Load Balancer → WebApps → Database]
  94. 94. An SOA example [Diagram: requests → Load Balancer → WebApp for Administration, WebApps for User → Services A, Services B → Database, Database]
  95. 95. Why SOA? Isolation • Shared Resources • Encapsulation • Scalability • Interoperability • Reuse • Testability • Reduce Local Complexity
  96. 96. Shared Resources • Different front-end websites use the same resources. • SOA helps you avoid duplicating databases and code. • Why not just share the database? • Code is not DRY • Caching will be problematic [Diagram: WebApp for Administration and WebApps for User sharing one Database]
  97. 97. Encapsulation • You can change the underlying implementation of a service without affecting other parts of the system • Upgrade a library • Upgrade to Ruby 1.9 • You can provide API versioning
  98. 98. Scalability 1: Partitioned Data Provides • The database is the first bottleneck; a single DB server cannot scale. SOA helps you reduce database load • Anti-pattern: only splitting the database [Diagram: WebApps → Database A, Database B] • Model relationships are broken • Referential integrity is lost • The replication myth: database replication cannot give you both speed and consistency
  99. 99. Scalability 2: Caching • SOA makes it easier to design a caching system • Cache data at the right times and expire it at the right times • Cache the logical model, not the physical one • You do not need to cache views everywhere
  100. 100. Scalability 3: Efficient • Different components carry different loads; SOA lets you scale service by service. [Diagram: WebApps → Load Balancers → two instances of Services A, four instances of Services B]
  101. 101. Security • Different services can sit behind different firewalls • You can expose only the public web apps and services; the others stay inside the firewall.
  102. 102. Interoperability • HTTP is the common interface; SOA helps you integrate: • Multiple languages • Internal systems, e.g. a full-text search engine • Legacy databases and systems • External vendors
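"Cache at the right times and expire at the right times" can be sketched with a small time-based cache inside a service. This is an illustration under simple assumptions: an in-process Hash stands in for memcached, and `TtlCache`, its method names and the cache key are invented for the example.

```ruby
# Minimal time-based cache (illustrative only).
class TtlCache
  def initialize(ttl_seconds, clock = lambda { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
  end

  # Return the cached value for key, or compute and store it via the block.
  def fetch(key)
    entry = @store[key]
    return entry[:value] if entry && @clock.call < entry[:expires_at]

    value = yield
    @store[key] = { :value => value, :expires_at => @clock.call + @ttl }
    value
  end

  # Explicit expiry, e.g. right after the service writes to its database.
  def expire(key)
    @store.delete(key)
  end
end

lookups = 0
cache = TtlCache.new(60)
2.times do
  cache.fetch('user/1') do
    lookups += 1               # stands in for an expensive database query
    { 'username' => 'ihower' }
  end
end
puts "expensive lookups: #{lookups}"
```

Because the service owns both the data and the cache, it is the one place that knows when a write happened and can call `expire` at that moment.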
  103. 103. Reuse • Reuse across multiple applications • Reuse for public APIs • Example: Amazon Web Services (AWS)
  104. 104. Testability • Isolate problem • Mocking API calls • Reduce the time to run test suite
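Mocking API calls is easiest when the service client is passed in rather than hard-coded. The sketch below shows that shape with plain Ruby objects; `ProfilePage`, `FakeUserClient` and the `find` interface are hypothetical names chosen for the example.

```ruby
# Hypothetical front-end class that talks to a user service through an
# injected client object.
class ProfilePage
  def initialize(user_client)
    @user_client = user_client
  end

  def title_for(user_id)
    user = @user_client.find(user_id)
    "Profile of #{user['username']}"
  end
end

# Test double: same interface as a real client, but no HTTP involved.
class FakeUserClient
  def find(user_id)
    { 'id' => user_id, 'username' => "user#{user_id}" }
  end
end

page = ProfilePage.new(FakeUserClient.new)
puts page.title_for(2)
```

The front-end test suite then runs without starting the user service or its database, which is where the shorter test runs come from.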
  105. 105. Reduce Local Complexity • Team modularity along the same module splits as your software • Understandability: The amount of code is minimized to a quantity understandable by a small team • Source code control
  106. 106. Considerations • Partition into Separate Services • API Design • Which Protocol
  107. 107. How to partition into Separate Services • Partitioning on Logical Function • Partitioning on Read/Write Frequencies • Partitioning by Minimizing Joins • Partitioning by Iteration Speed
  108. 108. API Design • Send Everything you need • Parallel HTTP requests • Send as Little as Possible • Use Logical Models
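The point of parallel HTTP requests is that a page needing data from several services waits for the slowest call rather than the sum of all calls. A minimal sketch with Ruby threads follows; `fetch` is a stand-in that sleeps instead of issuing a real HTTP request, and the service names are made up.

```ruby
# Stand-in for an HTTP call to a service; sleeps to simulate latency.
# (Hypothetical: a real client would issue an HTTP request here.)
def fetch(service)
  sleep 0.1
  "#{service}: ok"
end

services = %w[users comments ratings]

started = Time.now
# One thread per request; Thread#value joins and returns the block's result.
responses = services.map { |name| Thread.new { fetch(name) } }.map(&:value)
elapsed = Time.now - started

puts responses
puts format('elapsed: %.2f s', elapsed)
```

Results come back in the same order as the request list, because `value` is called on the threads in that order.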
  109. 109. Physical Models & Logical Models • Physical models are mapped to database tables through ORM. (It’s 3NF) • Logical models are mapped to your business problem. (External API use it) • Logical models are mapped to physical models by you.
  110. 110. Logical Models • Not relational or normalized • Maintainability • can change with no change to data store • can stay the same while the data store changes • Better fit for REST interfaces • Better caching
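The mapping from physical to logical models can be sketched with plain hashes. The rows below imitate normalized tables as an ORM would load them; the table layout, field names and data are invented for the example.

```ruby
require 'json'

# Physical models: normalized rows (hypothetical data).
user_row   = { 'id' => 1, 'username' => 'ihower', 'city_id' => 10 }
city_row   = { 'id' => 10, 'name' => 'Taipei' }
email_rows = [
  { 'user_id' => 1, 'address' => 'a@example.com' },
  { 'user_id' => 1, 'address' => 'b@example.com' }
]

# Logical model: one denormalized document shaped for API consumers.
def logical_user(user, city, emails)
  {
    'id'       => user['id'],
    'username' => user['username'],
    'city'     => city['name'],
    'emails'   => emails.map { |email| email['address'] }
  }
end

doc = logical_user(user_row, city_row, email_rows)
puts JSON.generate(doc)
```

Foreign keys such as `city_id` never leave the service, so the tables can be reorganized later without changing what clients see.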
  111. 111. Which Protocol? • SOAP • XML-RPC • REST
  112. 112. RESTful Web services • Rails way • REST is about resources • URL • Verbs: GET/PUT/POST/DELETE
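How verbs and URLs combine into actions can be written down as a small lookup. The table follows the Rails resource conventions (index, show, create, update, destroy); the `restful_action` helper and its path patterns are a simplified sketch written for this example, not Rails routing code, and the `new` and `edit` form actions are left out.

```ruby
# Rails-style mapping of HTTP verb + URL shape to a controller action.
ROUTES = {
  ['GET', :collection]  => :index,
  ['POST', :collection] => :create,
  ['GET', :member]      => :show,
  ['PUT', :member]      => :update,
  ['DELETE', :member]   => :destroy
}.freeze

def restful_action(verb, path)
  kind = case path
         when %r{\A/\w+/\d+\z} then :member      # e.g. /users/1
         when %r{\A/\w+\z}     then :collection  # e.g. /users
         end
  ROUTES[[verb, kind]]
end

puts restful_action('GET', '/users').inspect
puts restful_action('PUT', '/users/1').inspect
```

The URL names the resource and the verb names the operation, so one URL can serve several actions.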
  113. 113. The tool set • Web framework • XML Parser • JSON Parser • HTTP Client
  114. 114. Web framework • We do not need much of the controller and view layers • Rails is more than we need; how about Sinatra? • Rails Metal
  115. 115. ActiveResource • Mapping RESTful resources as models in a Rails application. • But not useful in practice, why?
  116. 116. XML parser • http://nokogiri.org/ • Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.
  117. 117. JSON Parser • http://github.com/brianmario/yajl-ruby/ • An extremely efficient streaming JSON parsing and encoding library. Ruby C bindings to Yajl
  118. 118. HTTP Client • http://github.com/pauldix/typhoeus/ • Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic
  119. 119. Tips • Define your logical model (i.e. your service request result) first. • model.to_json and model.to_xml are easy to use, but not useful in practice.
  120. 120. 6.Distributed File System • NFS does not scale • We can use rsync to duplicate • MogileFS • http://www.danga.com/mogilefs/ • http://seattlerb.rubyforge.org/mogilefs-client/ • Amazon S3 • HDFS (Hadoop Distributed File System) • GlusterFS
  121. 121. 7.Distributed Database • NoSQL • CAP theorem • Eventually consistent • HBase/Cassandra/Voldemort
  122. 122. The End
  123. 123. References • Books & Articles: • Distributed Programming with Ruby, Mark Bates (Addison Wesley) • Enterprise Rails, Dan Chak (O’Reilly) • Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley) • RESTful Web Services, Richardson & Ruby (O’Reilly) • RESTful Web Services Cookbook, Allamaraju & Amundsen (O’Reilly) • Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers) • Ruby in Practice, McAnally & Arkin (Manning) • Building Scalable Web Sites, Cal Henderson (O’Reilly) • Background Processing in Rails, Erik Andrejko (Rails Magazine) • Background Processing with Delayed_Job, James Harrison (Rails Magazine) • Web 点 ( ) • Slides: • Background Processing (Rob Mack) Austin on Rails - April 2009 • The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH) • Asynchronous Processing (Jonathan Dahl) • Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008 • Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008 • Physical Models & Logical Models in Rails, Dan Chak
  124. 124. References • Links: • http://segment7.net/projects/ruby/drb/ • http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces • http://github.com/blog/542-introducing-resque • http://www.engineyard.com/blog/2009/5-tips-for-deploying-background-jobs/ • http://www.opensourcery.co.za/2008/07/07/messaging-and-ruby-part-1-the-big-picture/ • http://leemoonsoo.blogspot.com/2009/04/simple-comparison-open-source.html • http://blog.gslin.org/archives/2009/07/25/2065/ • http://www.javaeye.com/topic/524977 • http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
  125. 125. Todo (maybe next time) • AMQP/RabbitMQ example code • How about Nanite? • XMPP • MagLev VM • More MapReduce example code • How about Amazon Elastic MapReduce? • Resque example code • More SOA example and code • MogileFS example code