Getting Distributed (With Ruby On Rails)
Implementing distributed processing at Working With Rails
Slideshow Transcript
- Slide 1: Getting Distributed
With Ruby (On Rails)
by Martin Sadler
Implementing distributed processing at Working With Rails
- Slide 2: dsc.net
- Slide 3: DSC
• Hosting, Web application development, and
consultancy
• Hosts the crew email system and carried out
the intranet integration for Virgin Atlantic.
• Runs several large forums.
• e.g. pprune.org - 150k members, 2600 at one
time
• Also AskDirect, Mumsnet
- Slide 4: http://www.workingwithrails.com
- Slide 5: Working With Rails
• Largest index of Ruby on Rails in the world
• Over 7000 people listed
• From 104 countries
• Find out who’s who?
• Connect with others
• Find a developer for a project / employment
• Also lists groups, companies, and sites
- Slide 6: Why Distributed?
- Slide 7: Working With Rails
- Slide 9: Distributed Ruby?
- Slide 10: Some background
- Slide 11: Some background
- Slide 12: Current
• Uses FeedTools (nice lib!)
• Great at parsing feed formats
• Good for small sites
- Slide 17: Issues
• No longer supported by author
• Feeds fetched at the user's expense (and
therefore the Mongrels')
• Feeds are cached locally but are parsed on
request
• Known problems when scaling (search on
Google)
- Slide 22: The Result
• Occasional slow-loading pages that include
third-party feeds
• Stale feed items
• Inconsistent feed items
• No good!
- Slide 23: The Challenge
• To keep content fresh, push traffic to WWR
and out to the blog owners
• Different feeds and sources to consider:
Flickr, Twitter, Blog, Delicious
• Each needs to be displayed in multiple places
and in many ways
• But also want to do some funkier stuff (as
you’ll see a bit later)
- Slide 24: Ruby on Rails distributed processing choices
• RingyDingy
• AP4R
• Rinda
• DRB
• Starfish
• BackgroundRB
• Reliable-Message
- Slide 25: DRB
Basic building block of all other Ruby
distributed libs.
“DRb literally stands for "Distributed Ruby". It is a library that allows you
to send and receive messages from remote Ruby objects via TCP/IP. Sound
kind of like RPC, CORBA or Java's RMI? Probably so. This is Ruby's simple
as dirt answer to all of the above.”
http://chadfowler.com/ruby/drb.html
- Slide 28: Quick DRB Example
Server (server.rb)
  require 'drb'
  class TestServer
    def doit
      "Hello, Distributed World"
    end
  end
  aServerObject = TestServer.new
  DRb.start_service('druby://localhost:9000', aServerObject)
  DRb.thread.join # Don't exit just yet!
Client (client.rb)
  require 'drb'
  DRb.start_service()
  obj = DRbObject.new(nil, 'druby://localhost:9000')
  # Now use obj
  p obj.doit
> ruby server.rb
> ruby client.rb
"Hello, Distributed World"
- Slide 29: Basics
• Server
• Clients / Workers
• Communicate via messages
http://en.wikipedia.org/wiki/Distributed_computing
- Slide 30: BackgroundRB
• Ruby job server and scheduler.
• Integrates with Rails
• Quite complex
• Some issues between versions but many
favor it above the other libs
• Best known
http://backgroundrb.rubyforge.org/
- Slide 31: Starfish
• Inspired by Google’s MapReduce
• Easy to understand code
• Stability?
• No longer supported by author?
http://rufy.com/starfish/doc/
- Slide 32: reliable-message
• Solid library
• Easy to understand API
• A bit more involved to set up
• Can be integrated with Rails
• Ongoing development
http://trac.labnotes.org/cgi-bin/trac.cgi/wiki/Ruby/ReliableMessaging
- Slide 33: AP4R
• Asynchronous Processing for Ruby
• Lesser known lib from Japan (new kid on the
block)
• Integrates with Rails
• Built on top of reliable-message
- Slide 34: AP4R
• AP4R, Asynchronous Processing for Ruby, is
an implementation of reliable asynchronous
message processing. It provides message
queuing and message dispatching.
• Using asynchronous processing, we can cut
down the turnaround time of web applications
by queuing, or make use of more machine
power by load balancing.
- Slide 35: AP4R Features
• Business logic can be implemented as a simple web application or as Ruby code, whether it is called
asynchronously or synchronously.
• Asynchronous messaging is made reliable by RDBMS persistence (currently MySQL only) or file
persistence, courtesy of reliable-msg.
• Load balancing over multiple AP4R processes on single or multiple servers is supported.
• Asynchronous logic is called via various protocols, such as XML-RPC, SOAP, HTTP PUT, and
more.
• Using the store-and-forward function, an at-least-once QoS level is provided.
- Slide 36: AP4R Process Flow
• A client (e.g. a web browser) makes a request to a web server (Apache, Lighttpd, etc.).
• A Rails application (the synchronous logic) is executed on Mongrel via mod_proxy or similar.
• At the end of the synchronous logic, one or more messages are put to AP4R (AP4R provides a helper).
• Once the synchronous logic is done, the client receives a response immediately.
• AP4R queues the message and sends it to the web server as an asynchronous request.
• The asynchronous logic, implemented as an ordinary Rails action, is executed.
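The flow above comes down to: enqueue a message at the end of the synchronous action, respond straight away, and let a separate consumer do the slow work. Below is a minimal in-process sketch of that pattern using Ruby's built-in Queue and a thread. It is not AP4R's API: `handle_request`, `slow_work`, and the message format are hypothetical names chosen for illustration, and unlike AP4R nothing here is persisted or retried.

```ruby
# Sketch of the "queue now, process later" pattern (NOT AP4R's API).
# An in-memory Queue stands in for AP4R's persistent message queue.

MESSAGES = Queue.new   # thread-safe FIFO from Ruby's core library
RESULTS  = Queue.new   # collects what the asynchronous side produced

# Hypothetical slow job, e.g. fetching and parsing a feed.
def slow_work(message)
  "processed #{message[:feed_url]}"
end

# Synchronous part: do the quick work, enqueue a message, return at once.
def handle_request(feed_url)
  MESSAGES << { feed_url: feed_url }
  "202 Accepted"
end

# Asynchronous part: a consumer thread drains the queue independently.
def start_consumer
  Thread.new do
    while (message = MESSAGES.pop) # pop blocks; nil is our stop signal
      RESULTS << slow_work(message)
    end
  end
end

consumer = start_consumer
response = handle_request("http://example.com/feed.xml")
MESSAGES << nil # tell the consumer to finish
consumer.join

puts response
puts RESULTS.pop
```

The caller gets its response as soon as the message is queued; how long `slow_work` takes no longer affects the request.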
- Slide 37: AP4R example
A Hello World app comes with AP4R to get
you started.
There is also a nice guide here:
http://rubyforge.org/frs/download.php/13312/AP4R_Users_Guide_EN.pdf
- Slide 38: Complementary
Services
- Slide 39: Rinda
• Rinda::Ring allows DRb services and clients
to automatically find each other without
knowing where they live.
• DRb servers register themselves with a
RingServer which allows clients to find the
servers they need. Many servers may
register themselves with the RingServer. The
DRb servers don't need to run on the same
machine.
http://segment7.net/projects/ruby/drb/rinda/ringserver.html
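As a rough illustration of the register-then-look-up idea, the sketch below uses a Rinda::TupleSpace directly, in a single process, so it runs without any network setup. It assumes the `[:name, service, object, description]` tuple layout that Rinda's ring classes use for registrations. `FeedFetcher` is a hypothetical service. A real deployment would go through Rinda::RingServer and Rinda::RingFinger over DRb instead.

```ruby
require 'rinda/tuplespace'

# Registrations are stored as tuples of the form
# [:name, service_name, front_object, description].
# The tuple space lives in-process here, so no RingServer is started.
ring = Rinda::TupleSpace.new

# Hypothetical service object; in real use this would be a DRb front object.
class FeedFetcher
  def fetch(url)
    "fetched #{url}"
  end
end

# "Server" side: register the service under a name.
ring.write([:name, :FeedFetcher, FeedFetcher.new, 'Fetches stale feeds'])

# "Client" side: look the service up by name; nil matches any value.
_, _, fetcher, description = ring.read([:name, :FeedFetcher, nil, nil])

puts description
puts fetcher.fetch('http://example.com/feed.xml')
```

The client only needs the service name, not the address of the object that provides it.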
- Slide 40: RingyDingy
• RingyDingy automatically registers a service
with a RingServer. If communication between
the RingServer and the RingyDingy is lost,
RingyDingy will re-register its service with
the RingServer when it reappears.
http://seattlerb.rubyforge.org/RingyDingy/
- Slide 41: Feeds in WWR
[Diagram: a Feed Queue and an AP4R Server holding @queue, with Feed Fetchers 1, 2, ... N]
- Slide 42: Running the code
ruby script/ap4r_start -c config/queues_mysql.cfg
rake background:feed_queue
rake background:feed_retrieve
- Slide 43: Key points
• The Feed Queue fetches the URLs of stale
feeds
• Each worker (client) has the Rails
environment loaded
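A small sketch of the two roles described above: a feed queue that selects stale feed URLs, and N workers that pull from it. Everything here is an assumption for illustration (the `Feed` struct, the 15-minute freshness window, the in-memory Queue). The WWR setup used AP4R and Rails-loaded worker processes, not threads.

```ruby
# Sketch of a feed queue feeding N workers. All names are hypothetical.
Feed = Struct.new(:url, :fetched_at)

STALE_AFTER = 15 * 60 # seconds; an assumed freshness window

# Feed Queue side: pick the URLs of feeds not fetched recently.
def stale_feed_urls(feeds, now: Time.now)
  feeds.select { |f| f.fetched_at.nil? || now - f.fetched_at > STALE_AFTER }
       .map(&:url)
end

# Worker side: each worker pops URLs until the queue is closed and empty.
def run_workers(urls, worker_count: 3)
  queue   = Queue.new
  fetched = Queue.new
  urls.each { |url| queue << url }
  queue.close # pop returns nil once the closed queue is drained

  workers = Array.new(worker_count) do |i|
    Thread.new do
      while (url = queue.pop)
        fetched << [i, url] # a real worker would fetch and parse here
      end
    end
  end
  workers.each(&:join)
  Array.new(fetched.size) { fetched.pop }
end

now = Time.now
feeds = [
  Feed.new('http://example.com/a.xml', now - 3600), # stale
  Feed.new('http://example.com/b.xml', now - 60),   # fresh
  Feed.new('http://example.com/c.xml', nil)         # never fetched
]
urls = stale_feed_urls(feeds, now: now)
puts urls
puts run_workers(urls).length
```

Adding capacity means raising `worker_count`; the selection logic does not change.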
- Slide 44: With this solution
• Can scale as demand grows
• Flexible for any type of feed data
• Still - room for improvement
- Slide 45: Possible Improvements
• Automatic spawning and killing of workers
as queue size grows or decreases
• Better handling of feed errors
• Dynamic polling intervals based on user-defined
preferences or some intelligent logic.
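One possible form of "intelligent logic" for polling intervals is a simple back-off: poll a feed more often when it changed on the last fetch and less often when it did not. The sketch below is only an assumption about how this could look. The bounds, the halve/double rule, and the name `next_poll_interval` are all hypothetical.

```ruby
MIN_INTERVAL = 5 * 60        # never poll more often than every 5 minutes
MAX_INTERVAL = 24 * 60 * 60  # never wait longer than a day

# Halve the interval after a change, double it otherwise, within bounds.
def next_poll_interval(current, changed)
  proposed = changed ? current / 2 : current * 2
  proposed.clamp(MIN_INTERVAL, MAX_INTERVAL)
end

interval = 60 * 60 # start at one hour
[true, true, false].each do |changed|
  interval = next_poll_interval(interval, changed)
  puts interval
end
```

Busy feeds converge towards the minimum interval, while dormant ones drift towards one poll per day.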
- Slide 46: When to go distributed?
• Long running process or task
• Fetching external data
• Complex computations
• ... that can be broken into chunks of work
• You care about the user experience
- Slide 47: Pitfalls
• Dependencies
• 'connection closed' errors on Mac (IPv6) -
change all references to localhost to 127.0.0.1
to avoid them (had to patch reliable-message)
• Terminology to understand
• Memory requirements
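The localhost workaround can be sketched with plain DRb. The example reuses the TestServer class from the Quick DRB Example and binds to 127.0.0.1 explicitly. Port 0 (let the OS pick a free port) is used only so the sketch runs anywhere. It is not part of the original setup, and the patch to reliable-message is not shown.

```ruby
require 'drb/drb'

# Front object, mirroring the TestServer from the Quick DRB Example.
class TestServer
  def doit
    'Hello, Distributed World'
  end
end

# Bind to the IPv4 loopback address explicitly instead of 'localhost',
# which may resolve to the IPv6 address ::1 on some systems.
# Port 0 asks the OS for any free port; DRb.uri reports the one chosen.
DRb.start_service('druby://127.0.0.1:0', TestServer.new)
uri = DRb.uri

# The client must use the same explicit address.
obj = DRbObject.new_with_uri(uri)
puts obj.doit

DRb.stop_service
```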
- Slide 48: Do you need
distributed?
• Maybe you would be better off with scheduling
instead?
• http://www.igvita.com/blog/2007/03/29/scheduling-tasks-in-ruby-rails/
- Slide 49: So where is all this
leading us to?
- Slide 50: Contextual Feed
Aggregation
- Slide 52: Group feed aggregation
- Slide 56: Group blog posts
Twitters
and so on...
- Slide 57: Thanks!
http://www.dsc.net
http://www.workingwithrails.com
Blog: http://beyondthetype.com
Enjoyed the talk? Recommend me on WWR
http://workingwithrails.com/person/5152-martin-sadler