Synchronous Reads Asynchronous Writes RubyConf 2009

Synchronous Reads,
Asynchronous Writes
Paul Dix
(note to make sure these aren’t showing up on the slides)

I work at Know More, a pre-launch search startup.

Data Reads Through
Services
Well, it means creating systems that perform data reads through services.
Data reads typically have to be synchronous because a user is waiting on
the operation. So they have to occur inside the request/response life-cycle.

data writes through a
messaging or queuing
system
Often, a user doesn’t have to wait for data to be written to receive a
response. So writes can be done asynchronously, outside of the
request/response life-cycle, which means you can put them straight into a queue.
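
A minimal sketch of that idea, using Ruby's in-process Queue as a stand-in for a real messaging system. The names handle_request and store are hypothetical and exist only for illustration; a real setup would publish to a broker instead.

```ruby
require 'thread'

write_queue = Queue.new   # stand-in for a real messaging system
store = {}                # hypothetical data store

# Background worker: drains the queue outside the request/response cycle.
worker = Thread.new do
  while (job = write_queue.pop) != :stop
    store[job[:id]] = job[:body]
  end
end

# Hypothetical request handler: enqueue the write and respond immediately.
handle_request = lambda do |id, body|
  write_queue << { :id => id, :body => body }
  "202 Accepted"
end

status = handle_request.call(1, "first post")
write_queue << :stop   # shut the worker down for this demo
worker.join
```

The handler never touches the store itself, so the user's response does not wait on the write.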

Why?
Now the question is why in the hell you’d want to do this.

Rails doesn’t Scale

Your Database
doesn’t Scale

Monolithic Applications
Don’t Scale
Also, having your entire application in one code base and system doesn’t
scale. This leads to test suites that take more than 30 minutes to run and
deploys that push your entire application just for a simple update.

Lots of Traffic

This could be on the
front end from users or
on the back end from
data processing

Multiple
Applications
multiple applications that have to read from the same data store or share
business logic

Multiple
Background
Processes
Multiple back end
processes that need to run
based on changes in data.
or if you need data
replicated and munged

Complex Business
Logic
complex business logic that may have to span multiple systems.

if you have one or
more of those
situations...

Java developers who work in...

No Talent Hacks

the enterprise
commonly refer
to this as....

Service Oriented Architecture
service oriented
architecture, but that’s
commonly associated with
things like SOAP, WSDL, and
a bunch of other heinous
things.

is an approach based on

RESTful Services
RESTful services

which means

Descriptive URLs
things like
descriptive URLs

Taking advantage of
HTTP Verbs

Sinatra
and for that I’d recommend Sinatra, a web framework built on top of Rack.
Really, I’d call it a services framework.
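
To make descriptive URLs and HTTP verbs concrete, here is a framework-free sketch of how the URL can name the resource while the verb names the action. The ROUTES table and the route helper are hypothetical and are not Sinatra's API.

```ruby
# Hypothetical route table: the URL names the resource, the verb names the action.
ROUTES = [
  [:get,    %r{\A/entries/(\d+)\z}, :show],
  [:put,    %r{\A/entries/(\d+)\z}, :update],
  [:delete, %r{\A/entries/(\d+)\z}, :destroy],
  [:post,   %r{\A/entries\z},       :create]
]

# Returns [action, id] for a verb and path, or nil when nothing matches.
def route(verb, path)
  ROUTES.each do |route_verb, pattern, action|
    match = pattern.match(path)
    return [action, match[1]] if route_verb == verb && match
  end
  nil
end

show_route   = route(:get, "/entries/42")
create_route = route(:post, "/entries")
```

The same URL, /entries/42, maps to three different actions depending only on the verb.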

For your message format you should use
JSON. I know you can use XML, but...

JSON

XML Makes Children Cry
XML is too bloated and complex and besides, it makes children cry

Asynchronous
Writes
I previously glossed over this picture. It’s something called an
asynchronous electric motor, which is the only image I could conjure up to
go with the term “asynchronous”

requires a

Messaging System
messaging
system to write
data through

RabbitMQ
For that I suggest RabbitMQ, which is a powerful messaging system in
addition to having stuff as mundane as a queue

Data Store
And you’ll of course need a data store. I don’t care which you use, but it
should probably be designed to solve the problem for a particular piece of
your application.

now let’s get into
specifics

This isn’t about new
applications.

This isn’t about green
field projects.

It’s about solving
existing problems.

Look, shiny!
Ruby programmers tend to jump on new things because, hey look, shiny!

Don’t Go Overboard,
Don’t Over-think

Joel Spolsky calls people
that exhibit this behavior
“Architecture Astronauts”
“Sometimes smart thinkers just don't know when to stop, and they create
these absurd, all-encompassing, high-level pictures of the universe that
are all good and fine, but don't actually mean anything at all.

These are the people I call Architecture Astronauts. It's very hard to get
them to write code or design programs, because they won't stop thinking
about Architecture.”

Build Something
remember, your goal is first to build something.

with that out of the
way

let’s see what this looks
like

Standard Rails
Application
well, here’s your standard
rails application. so you have
rails and your trusty
database

and then you add
some background
processing...

and then you realize that you can’t do
everything inside the request/response
cycle so you add in a background process.
For now we’ll assume you’re using a
database backed queue like dj, bj, or some
other kind of “j”.

and then
you add
memcache...
but wait, then you realize
you need additional
performance so you add
memcache

server, duh!

and let’s not
forget that it’s
all fronted by
nginx or apache

and then you need more app
processing power so you add
two more servers and front
all that with HAProxy

Where to from here?
so once you’ve done all that, where do you go?

maybe you add redis because
you heard Ezra or Chris or
somebody say it’s awesome
and scales to infinity

and then you add
a read database
to eke out a
little more
performance

and the whole time your Rails
application code base is
growing with more logic and
additional background
processing

makes you cry
it’s enough to make a grown man cry

Monolithic Applications
Do Not Scale
this is why monolithic
applications do not scale. to
make simple changes ...

to this mess, you
end up running
the test suite
and redeploying
the whole thing.

instead, you can break
into multiple
applications

applications, called
“services”

real world example
to go any further into the
architecture it’ll help to
look at a specific real world
example

Let’s take something from my
work

millions of RSS and
Atom feeds
Since we’re pre-launch we definitely don’t have the too many users problem.
The traffic and complexity come from having to update millions of RSS and
Atom feeds

data from external
sources
Pulling in real time
engagement from
multiple external
sources

complex business logic
and complex business logic. every
time something enters our system
we have to perform many different
tasks that are interdependent.
Here’s just a taste of it: our feed
fetcher pulls in a new blog post from
somewhere

classify the content as
spam, adult, etc.

index the content for
search

run some crazy voodoo
machine learning magic

store it in Hadoop for
analysis later

run in parallel
now some of
these processes
can be run in
parallel

different libraries and
languages

originally we set up
a services based
design that looked
kind of like this. as
you can see there
are a bunch of
interconnections
and it’s hard to
comprehend.
troubleshooting
failures was hard.

HTTP + JSON
Each service had to implement an HTTP interface with JSON formatted
messages. This was the only method for service-to-service communication.

engagement and post
traffic is bursty

queues behind every
service
to manage the peaks in traffic everyone put queues behind each of their
services.

Data owners had to
notify everyone
Data owners had to notify other services when an update occurred.
Services were tightly coupled.

make
otters
cry
and tightly coupled services make otters cry

keep the HTTP
Services for data reads
HTTP services for data reads, which can be cached and optimized

push writes through a
messaging system
data writes through a messaging
system with built in routing. It
also helps if it’s optimized for
processing thousands of messages
per second and supports the
pubsub style

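require 'rubygems'
require 'sinatra'
require 'json'

# Entry is assumed to be a model defined elsewhere (for example, ActiveRecord).
get '/entries/:id' do
  content_type :json
  Entry.find(params[:id]).to_json
end

now sinatra is awesome because it makes creating a service this easy.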

do it in parallel

multi-threaded and
asynchronous
parallelism

require 'typhoeus'

hydra = Typhoeus::Hydra.new

first_request = Typhoeus::Request.new(
  "http://localhost:3000/posts/1.json")
second_request = Typhoeus::Request.new(
  "http://localhost:3000/posts/2.json")

# Queue both requests, then run them together; run returns once all have finished.
hydra.queue(first_request)
hydra.queue(second_request)
hydra.run

response = first_request.response
response.code
response.body
response.time
response.headers

# Post is assumed to be a class defined elsewhere that exposes a links accessor.
first_request.on_complete do |response|
  post = Post.new(JSON.parse(response.body))
  # get the first url in the post
  third_request =
    Typhoeus::Request.new(post.links.first)
  third_request.on_complete do |third_response|
    # do something with that
  end
  hydra.queue third_request

  post
end

[Figure: timeline from Start to Finish with five request bars: 50 ms, 40 ms, 55 ms, 25 ms, 30 ms]
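
If those five durations are service calls issued at the same time (an assumption, since the figure itself did not survive), the total wait is roughly the slowest call rather than the sum of all five. A rough sketch with plain threads, where sleeps stand in for HTTP calls:

```ruby
# Durations from the slide, in milliseconds; each sleep stands in for an HTTP call.
durations_ms = [50, 40, 55, 25, 30]

started = Time.now
threads = durations_ms.map do |ms|
  Thread.new do
    sleep(ms / 1000.0)
    ms
  end
end
results = threads.map { |t| t.value }   # value waits for the thread and returns the block's result
elapsed_ms = (Time.now - started) * 1000

# What the same calls would cost if made one after another.
serial_ms = durations_ms.inject(0) { |sum, ms| sum + ms }
```

Typhoeus gets the same effect without threads by multiplexing the requests inside hydra.run.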

20.times do
  r = Typhoeus::Request.new(
    "http://localhost:3000/users/1")
  hydra.queue r
end
hydra.run

hydra.cache_setter do |request|
  if request.cache_timeout
    @cache.set(
      request.cache_key,
      request.response,
      request.cache_timeout)
  end
end

hydra.cache_getter do |request|
  @cache.get(request.cache_key)
end
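
The handlers above only need a cache object that answers get(key) and set(key, value, timeout). A hypothetical in-memory stand-in, useful for trying the handlers without a memcached server, might look like this. MemoryCache and its injectable clock are assumptions for illustration, not part of Typhoeus or any memcached client.

```ruby
# Hypothetical in-memory stand-in for a memcached client.
class MemoryCache
  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(clock = lambda { Time.now.to_f })
    @clock = clock
    @entries = {}
  end

  # Stores value under key for ttl seconds.
  def set(key, value, ttl)
    @entries[key] = [value, @clock.call + ttl]
    value
  end

  # Returns the cached value, or nil if it is missing or has expired.
  def get(key)
    value, expires_at = @entries[key]
    return nil if expires_at.nil?
    if @clock.call >= expires_at
      @entries.delete(key)
      return nil
    end
    value
  end
end

now = 100.0
cache = MemoryCache.new(lambda { now })
cache.set("users/1", '{"name": "paul"}', 30)
fresh = cache.get("users/1")   # still within the 30 second timeout
now += 31
stale = cache.get("users/1")   # past the timeout
```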

# Build a canned response; the body has to be valid JSON (double-quoted strings).
response = Typhoeus::Response.new(
  :code => 200,
  :headers => "",
  :body => '{"name": "paul"}',
  :time => 0.3)
hydra.stub(:get,
  "http://localhost:3000/users/1"
).and_return(response)

request = Typhoeus::Request.new(
  "http://localhost:3000/users/1")
request.on_complete do |response|
  JSON.parse(response.body)
end
hydra.queue request
hydra.run

hydra.stub(:get,
  %r{http://localhost:3000/users/.*}
).and_return(response)

run multiple versions in
parallel

what about Beanstalk,
Resque, Kestrel, or
whatever?
so why use RabbitMQ
instead of beanstalk,
resque, kestrel or any
other option?

Event Based System
these features enable you to build an event based system, which is exactly
what we needed. when certain updates happen, it should kick off
calculations elsewhere in the system. I’ll get into that in a bit, but first
some rabbit specifics

AMQP
rabbit is an implementation of an open protocol called Advanced Message
Queuing Protocol, or AMQP

it has Exchanges and
Routing Keys too
it has a bunch of features, but for the purposes of Asynchronous Writes,
exchanges and routing keys are what we care about most

Rabbit has three
exchange types.

Exchange Types

Message Router
An exchange basically acts as a message
router. Messages get published to it and
it routes the messages to the
appropriate queues.

Example: Processing
New Feed Entries

So we have a fanout exchange called
entry.write. Every queue bound to this
exchange will get messages published to it.
Here we have the three things we want to do.
First, index it for searching. Second, store it in
our key-value store. Third, index it in a
completely separate index used for data
research. So the search is Solr/Lucene and the
research is Hadoop. Completely decoupled
systems.
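
A toy model of that fanout behavior, in plain Ruby. FanoutExchange is a hypothetical class for illustration and is not RabbitMQ's or Bunny's API; the three arrays stand in for the search, key-value, and research queues.

```ruby
# Hypothetical in-memory model of a fanout exchange.
class FanoutExchange
  def initialize(name)
    @name = name
    @queues = []
  end

  def bind(queue)
    @queues << queue
  end

  # A fanout exchange ignores routing keys: every bound queue gets a copy.
  def publish(message)
    @queues.each { |queue| queue << message }
  end
end

search_index   = []
key_value      = []
research_index = []

entry_write = FanoutExchange.new("entry.write")
[search_index, key_value, research_index].each { |queue| entry_write.bind(queue) }
entry_write.publish('{"id": 1, "title": "hello"}')
```

The publisher knows only the exchange name, so adding a fourth consumer means binding one more queue and changing nothing else.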

That’s how we write entries. Here’s
how we do event based processing on
those writes. so here’s an example
where we have a topic exchange
named ‘entry.notify’. queues can be
bound to exchanges. so we have these
three queues

so take the example where
you have a message published
to the exchange with a
routing key of ‘insert’.

the message would get
routed to the queue
bound with insert and
to the queue bound with
hash

now let’s look at a message
with a routing key of
‘update.clicks.rank’

based on the bindings, the
message gets dropped into the
update and hash queue (ones
on the right err left?)
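
The routing described above can be sketched in plain Ruby. In AMQP topic bindings, "*" matches exactly one dot-separated word and "#" matches zero or more. The three binding patterns and queue names below are assumptions: the slides only say the queues were bound with insert, update, and hash, so "update.#" is a guess at the update binding.

```ruby
# Returns true when an AMQP-style topic binding pattern matches a routing key.
def topic_match?(pattern, routing_key)
  match_words(pattern.split("."), routing_key.split("."))
end

def match_words(pattern, words)
  return words.empty? if pattern.empty?
  head, *rest = pattern
  case head
  when "#"
    # "#" may absorb zero words, or absorb one word and stay in play.
    match_words(rest, words) || (!words.empty? && match_words(pattern, words[1..-1]))
  when "*"
    !words.empty? && match_words(rest, words[1..-1])
  else
    words.first == head && match_words(rest, words[1..-1])
  end
end

# Assumed bindings for the three queues on the entry.notify exchange.
BINDINGS = {
  "insert_queue" => "insert",
  "update_queue" => "update.#",
  "hash_queue"   => "#"
}

# Names of the queues a message with this routing key would be routed to.
def queues_for(routing_key)
  BINDINGS.select { |_queue, pattern| topic_match?(pattern, routing_key) }.keys
end

insert_targets = queues_for("insert")
update_targets = queues_for("update.clicks.rank")
```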

routing key:
domU-12-31-39-07.feed_fetcher

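require 'bunny'

client = Bunny.new(:host =>
  "mysweetrabbbitserver.pauldix.net")
client.start

# Publisher side: a durable topic exchange for exceptions.
exchange = client.exchange(
  "exceptions",
  :type => :topic,
  :durable => true)

# The routing key identifies the host and the process that raised.
exchange.publish(
  "oh noes, an exception!",
  :key => "domU-12-31-39-07.feed_fetcher")

# Consumer side: binding with "#" receives every message on the exchange.
queue = Bunny::Queue.new(
  client, "exceptions.logger")
queue.bind("exceptions", :key => "#")

# log is assumed to be a logger set up elsewhere.
queue.subscribe do |msg|
  log.error(msg[:payload])
end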

uniqueness

value uniqueness is hard to enforce.

http://localhost:3000/locks/names/pauldix
one way is to have the service responsible expose a uniqueness getter. so
once you GET a lock, you write through the queue.
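
A rough sketch of the lock-then-write flow, all in one process. NameLocks and register are hypothetical names; in the real design the lock would live behind the service URL above and the queue would be the messaging system.

```ruby
require 'thread'

# Hypothetical lock registry behind a URL such as /locks/names/pauldix.
class NameLocks
  def initialize
    @mutex = Mutex.new
    @taken = {}
  end

  # Returns true for the first caller to claim a name, false afterwards.
  def acquire(name)
    @mutex.synchronize do
      return false if @taken.key?(name)
      @taken[name] = true
    end
  end
end

# Only the caller holding the lock pushes the write through the queue.
def register(locks, write_queue, name)
  return :conflict unless locks.acquire(name)
  write_queue << { :name => name }
  :accepted
end

locks = NameLocks.new
write_queue = Queue.new   # stand-in for the messaging system

first  = register(locks, write_queue, "pauldix")
second = register(locks, write_queue, "pauldix")
```

The uniqueness check stays synchronous because the user needs the answer right away, while the write itself still goes through the queue.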

Eric Brewer’s CAP
theorem
in brewer’s CAP theorem he talked
about the relationship between three
requirements when building
distributed systems. consistency,
availability, and partition tolerance.

consistency
consistency means that an
operation either works
completely or fails. this is also
referred to as atomic

availability
availability is pretty self
explanatory. a service is
available to serve requests.
so you can shoot for high
availability

partition tolerance
when you replicate data across multiple
systems, you create the possibility of forming
a partition. this happens when one or more
systems lose connectivity to other systems.
partition tolerance is defined formally as “no
set of failures less than total network failure
is allowed to cause the system to respond
incorrectly”

Werner Vogels’
eventual consistency
“is a special form of weak
consistency. if no new updates are
made to an object, eventually all
accesses will return the last
updated value.”
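
A toy illustration of that definition, assuming a primary store that is written immediately and a replica that catches up through a queue. All names here are hypothetical.

```ruby
primary = {}
replica = {}
replication_queue = []   # stand-in for the messaging system

# Writes land on the primary at once and reach the replica later.
write = lambda do |key, value|
  primary[key] = value
  replication_queue << [key, value]
end

# Applies pending updates to the replica in the order they were written.
drain = lambda do
  until replication_queue.empty?
    key, value = replication_queue.shift
    replica[key] = value
  end
end

write.call("clicks", 1)
write.call("clicks", 2)
stale_read = replica["clicks"]   # the replica has not caught up yet
drain.call                       # no new updates arrive while the queue drains
final_read = replica["clicks"]
```

Between the write and the drain, a reader of the replica sees old data; once updates stop and the queue empties, every read returns the last written value.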

Services and Ruby can
be friends

possible for
services and
ruby to be
friends

Advertising
finally, a little advertising

http://pauldix.net
My web site is
pauldix.net

http://github.com/pauldix
my github is
pauldix

@pauldix
my twitter is @pauldix

I’m also writing a book
for Addison Wesley. It’s
called Service Oriented
Design with Ruby and
Rails.
