
Synchronous Reads Asynchronous Writes RubyConf 2009

Published in: Technology

  1. Synchronous Reads, Asynchronous Writes. Paul Dix. Note to make sure these aren’t showing up on the slides.
  2. Where I work: at Know More, a prelaunch search startup
  3. hack sweet, sweet code
  4. what does
  5. Synchronous Reads
  6. Asynchronous Writes
  7. mean?
  8. Data Reads Through Services: Well, it means creating systems that perform data reads through services. Data reads typically have to be synchronous because a user is waiting on the operation, so they have to occur inside the request/response life-cycle.
  9. data writes through a messaging or queuing system: Often, a user doesn’t have to wait for data to be written to receive a response. So writes can be done asynchronously, outside of the request/response life-cycle, which means you can put them straight into a queue.
  10. Loosely Coupled
  11. Why? Now the question is why in the hell you’d want to do this.
  12. Rails doesn’t Scale
  13. Rails doesn’t Scale Your Database doesn’t Scale
  14. Monolithic Applications Don’t Scale: Also, having your entire application in one code base and system doesn’t scale. This leads to test suites that take more than 30 minutes to run, and deploys that push your entire application just for a simple update.
  15. if you have
  16. Lots of Traffic: This could be on the front end from users or on the back end from data processing
  17. Multiple Applications: multiple applications that have to read from the same data store or share business logic
  18. Multiple Background Processes: Multiple back end processes that need to run based on changes in data, or if you need data replicated and munged
  19. Complex Business Logic: complex business logic that may have to span multiple systems.
  20. if you have one or more of those situations...
  21. Services based Approach
  22. No Talent Hacks: Java developers who work in...
  23. the enterprise commonly refer to this as...
  24. Service Oriented Architecture: service oriented architecture, but that’s commonly associated with things like SOAP, WSDL, and a bunch of other heinous things.
  25. Scary!
  26. the tools
  27. Synchronous Reads
  28. RESTful Services: is an approach based on RESTful services
  29. Descriptive URLs: which means things like descriptive URLs
  30. Taking advantage of HTTP Verbs
  31. GET
  32. PUT
  33. POST
  34. DELETE
  35. Sinatra: and for that I’d recommend Sinatra, a web framework built on top of Rack. Really, I’d call it a services framework.
  36. serialization format
  37. JSON: For your message format you should use JSON. I know you can use XML, but...
  38. XML Makes Children Cry: XML is too bloated and complex and besides, it makes children cry
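
A minimal sketch of the read side described in the slides above: a GET on a descriptive URL that answers with JSON. To stay dependency-free it is written as a bare Rack-style callable (an object that responds to `call(env)` and returns status, headers, and body) rather than with Sinatra itself. The `ENTRIES` data, `JSON_HEADERS`, and the `ENTRY_SERVICE` name are assumptions made up for illustration, not anything from the talk.

```ruby
require "json"

# Hypothetical in-memory data standing in for a real data store.
ENTRIES = {
  "1" => { "id" => 1, "title" => "Synchronous Reads" }
}.freeze

JSON_HEADERS = { "content-type" => "application/json" }.freeze

# A Rack-style callable: takes the request environment hash and returns
# [status, headers, body]. Only GET /entries/:id is handled.
ENTRY_SERVICE = lambda do |env|
  match = env["PATH_INFO"].to_s.match(%r{\A/entries/(\d+)\z})
  entry = match && ENTRIES[match[1]]

  if env["REQUEST_METHOD"] == "GET" && entry
    [200, JSON_HEADERS, [JSON.generate(entry)]]
  else
    [404, JSON_HEADERS, [JSON.generate("error" => "not found")]]
  end
end

# Calling it directly, the way a Rack server would for GET /entries/1.
status, _headers, body = ENTRY_SERVICE.call(
  { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/entries/1" }
)
puts status
puts body.first
```

The URL names the resource, the HTTP verb names the operation, and the body is plain JSON, which is all a calling service needs to know.
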
  39. Asynchronous Writes: I previously glossed over this picture. It’s something called an asynchronous electric motor, which is the only image I could conjure up to go with the term “asynchronous”
  40. Messaging System: requires a messaging system to write data through
  41. RabbitMQ: For that I suggest RabbitMQ, which is a powerful messaging system in addition to having stuff as mundane as a queue
  42. Data Store: And you’ll of course need a data store. I don’t care which you use, but it should probably be designed to solve the problem for a particular piece of your application.
  43. now let’s get into specifics
  44. but first,
  45. a word of warning...
  46. This isn’t about new applications.
  47. This isn’t about green field projects.
  48. It’s about solving existing problems.
  49. Look, shiny! Ruby programmers tend to jump on new things because, hey look, shiny!
  50. Don’t Go Overboard, Don’t Over-think
  51. Joel Spolsky calls people that exhibit this behavior “Architecture Astronauts”: “Sometimes smart thinkers just don't know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don't actually mean anything at all. These are the people I call Architecture Astronauts. It's very hard to get them to write code or design programs, because they won't stop thinking about Architecture.”
  52. Build Something: remember, your goal is first to build something.
  53. so...
  54. Don’t be a space man
  55. with that out of the way
  56. let’s see what this looks like
  57. Standard Rails Application: well, here’s your standard Rails application. So you have Rails and your trusty database
  58. and then you add some background processing... and then you realize that you can’t do everything inside the request/response cycle so you add in a background process. For now we’ll assume you’re using a database backed queue like dj, bj, or some other kind of “j”.
  59. and then you add memcache... but wait, then you realize you need additional performance so you add memcache
  60. server, duh! and let’s not forget that it’s all fronted by nginx or apache
  61. and then you need more app processing power so you add two more servers and front all that by HAProxy
  62. Where to from here? So once you’ve done all that, where do you go?
  63. maybe you add redis because you heard Ezra or Chris or somebody say it’s awesome and scales to infinity
  64. and then you add a read database to eke out a little more performance
  65. and the whole time your Rails application code base is growing with more logic and additional background processing
  66. makes you cry: it’s enough to make a grown man cry
  67. Monolithic Applications Do Not Scale: this is why monolithic applications do not scale. to make simple changes ...
  68. to this mess, you end up running the test suite and redeploying the whole thing.
  69. what else can you do?
  70. instead, you can break into multiple applications
  71. applications, called “services”
  72. real world example: to go any farther into the architecture it’ll help to look at a specific real world example
  73. Let’s take something from my work
  74. millions of RSS and Atom feeds: Since we’re pre-launch we definitely don’t have the too many users problem. The traffic and complexity come from having to update millions of RSS and Atom feeds
  75. data from external sources: Pulling in real time engagement from multiple external sources
  76. complex business logic: and complex business logic. Every time something enters our system we have to perform many different tasks that are interdependent. Here’s just a taste of it: our feed fetcher pulls in a new blog post from somewhere
  77. store the raw content
  78. scrape a summary
  79. check for duplicates
  80. language identification
  81. named entity extraction
  82. classify the content as spam, adult, etc.
  83. index the content for search
  84. run some crazy voodoo machine learning magic
  85. store it in Hadoop for analysis later
  86. run in parallel: now some of these processes can be run in parallel
  87. run serially
  88. dependent on previous outputs
  89. different libraries and languages
  90. originally we set up a services based design that looked kind of like this. As you can see there are a bunch of interconnections and it’s hard to comprehend. Troubleshooting failures was hard.
  91. HTTP + JSON: Each service had to implement an HTTP interface with JSON formatted messages. This was the only method for service-to-service communication.
  92. Two Problems
  93. engagement and post traffic is bursty
  94. queues behind every service: to manage the peaks in traffic everyone put queues behind each of their services.
  95. Data owners had to notify everyone: Data owners had to notify other services when an update occurred. Services were tightly coupled.
  96. Tightly Coupled
  97. make otters cry: and tightly coupled services make otters cry
  98. thus, the idea was born
  99. keep the HTTP Services for data reads: HTTP services for data reads, which can be cached and optimized
  100. push writes through a messaging system: data writes through a messaging system with built in routing. It also helps if it’s optimized for processing thousands of messages per second and supports the pubsub style
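
A minimal sketch of the idea in the last two slides, assuming an in-process `Queue` and a thread as stand-ins for the messaging system and its consumer. `EntryStore`, `accept_write`, `start_consumer`, and the `:stop` marker are hypothetical names for illustration; a real system would use a message broker and a separate process here.

```ruby
require "json"

# Hypothetical in-memory store owned by a single service.
class EntryStore
  def initialize
    @entries = {}
    @mutex = Mutex.new
  end

  # Synchronous read: the caller waits for the answer.
  def read(id)
    @mutex.synchronize { @entries[id] }
  end

  # Applied by the background consumer, never by the request itself.
  def apply(message)
    entry = JSON.parse(message)
    @mutex.synchronize { @entries[entry["id"]] = entry }
  end
end

# Asynchronous write: serialize, enqueue, and return right away.
def accept_write(queue, entry)
  queue << JSON.generate(entry)
  :accepted
end

# Consumer loop: applies queued writes until it sees the :stop marker.
def start_consumer(queue, store)
  Thread.new do
    while (message = queue.pop) != :stop
      store.apply(message)
    end
  end
end

store = EntryStore.new
queue = Queue.new

accept_write(queue, { "id" => 1, "title" => "first post" })
p store.read(1) # the write is queued but not applied yet

consumer = start_consumer(queue, store)
queue << :stop
consumer.join
p store.read(1) # the consumer has applied the write
```

The gap between the two reads is the trade-off: the writer gets its response immediately, and readers see the new value only once the consumer catches up.
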
  101. Synchronous Reads
  102. Sinatra by Blake Mizerany
  103. require 'rubygems'
       require 'sinatra'

       get '/entries/:id' do
         Entry.find(params[:id]).to_json
       end

       Now Sinatra is awesome because it makes creating a service this easy.
  104. call services
  105. do it in parallel
  106. Amazon - 100 services
  107. Google - 1000 servers
  108. multi-threaded and asynchronous parallelism
  109. Typhoeus
  110. hydra = Typhoeus::Hydra.new
       first_request = Typhoeus::Request.new(
         "http://localhost:3000/posts/1.json")
       second_request = Typhoeus::Request.new(
         "http://localhost:3000/posts/2.json")

       hydra.queue(first_request)
       hydra.queue(second_request)
       hydra.run
  111. response = first_request.response
       response.code
       response.body
       response.time
       response.headers
  112. first_request.on_complete do |response|
         post = Post.new(JSON.parse(response.body))
         # get the first url in the post
         third_request = Typhoeus::Request.new(post.links.first)
         third_request.on_complete do |response|
           # do something with that
         end
         hydra.queue third_request
         post
       end
  113. [Timing diagram, Start to Finish: five requests taking 50 ms, 40 ms, 55 ms, 25 ms, and 30 ms]
  114. response.handled_response
  115. 20.times do
         r = Typhoeus::Request.new(
           "http://localhost:3000/users/1")
         hydra.queue r
       end
       hydra.run
  116. hydra.cache_setter do |request|
         @cache.set(
           request.cache_key,
           request.response,
           request.cache_timeout) if request.cache_timeout
       end

       hydra.cache_getter do |request|
         @cache.get(request.cache_key)
       end
  117. response = Typhoeus::Response.new(
         :code => 200,
         :headers => "",
         :body => '{"name" : "paul"}',
         :time => 0.3)
       hydra.stub(:get,
         "http://localhost:3000/users/1"
       ).and_return(response)
  118. request = Typhoeus::Request.new(
         "http://localhost:3000/users/1")
       request.on_complete do |response|
         JSON.parse(response.body)
       end
       hydra.queue request
       hydra.run
  119. hydra.stub(:get,
         /http:\/\/localhost:3000\/users\/.*/
       ).and_return(response)
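
A small sketch of why calling services in parallel matters, using the five latencies from the timing slide (50, 40, 55, 25, and 30 ms). `call_service` is a hypothetical stand-in that just sleeps, and plain Ruby threads are used only to keep the example dependency-free; this is not meant to describe how Typhoeus works internally.

```ruby
# Latencies in seconds, taken from the timing slide (50, 40, 55, 25, 30 ms).
LATENCIES = [0.050, 0.040, 0.055, 0.025, 0.030].freeze

# Hypothetical stand-in for one HTTP call to a service.
def call_service(latency)
  sleep(latency)
  latency
end

# One call after another: total time is roughly the sum of the latencies.
def call_serially(latencies)
  latencies.map { |latency| call_service(latency) }
end

# One thread per call; Thread#value waits for and returns each result,
# so results come back in the same order as the input.
def call_in_parallel(latencies)
  latencies.map { |latency| Thread.new { call_service(latency) } }.map(&:value)
end

# Returns the wall-clock seconds the block took.
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

serial = elapsed { call_serially(LATENCIES) }
parallel = elapsed { call_in_parallel(LATENCIES) }
puts format("serial: %.0f ms, parallel: %.0f ms", serial * 1000, parallel * 1000)
```

Run serially, the five calls cost about the sum of their latencies (at least 200 ms); run in parallel, the cost is close to the slowest single call plus some thread overhead.
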
  120. package as gems
  121. versioning
  122. run multiple versions in parallel
  123. Asynchronous Writes
  124. RabbitMQ
  125. what about Beanstalk, Resque, Kestrel, or whatever? so why use RabbitMQ instead of beanstalk, resque, kestrel or any other option?
  126. Pubsub Semantics
  127. Flexible message routing
  128. Event Based System: these features enable you to build an event based system, which is exactly what we needed. When certain updates happen, it should kick off calculations elsewhere in the system. I’ll get into that in a bit, but first some Rabbit specifics
  129. AMQP: Rabbit is an implementation of an open protocol called Advanced Message Queuing Protocol or AMQP
  130. it’s not just a queue
  131. it has Exchanges and Routing Keys too: it has a bunch of features, but for the purposes of Asynchronous Writes, exchanges and routing keys are what we care about most
  132. Exchange Types: Rabbit has three exchange types.
  133. Direct
  134. Fanout
  135. Topic
  136. Message Router: An exchange basically acts as a message router. Messages get published to it and it routes the messages to the appropriate queues.
  137. Example: Processing New Feed Entries
  138. So we have a fanout exchange called entry.write. Every queue bound to this exchange will get messages published to it. Here we have the three things we want to do. First, index it for searching. Second, store it in our key value store. Third, index in a completely separate index used for data research. So the search is Solr/Lucene and the research is Hadoop. Completely decoupled systems.
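
A toy, in-process model of the fanout behavior just described, assuming nothing about RabbitMQ's client API: every queue bound to the exchange receives its own copy of each published message. `FanoutExchange` and the three queue names are hypothetical labels for the search, key value, and research consumers.

```ruby
# Toy in-process model of a fanout exchange; not a RabbitMQ client.
class FanoutExchange
  attr_reader :name

  def initialize(name)
    @name = name
    @queues = {}
  end

  # Binds a named queue and returns its (initially empty) message list.
  def bind(queue_name)
    @queues[queue_name] ||= []
  end

  # Fanout ignores routing keys: every bound queue gets its own copy.
  def publish(message)
    @queues.each_value { |messages| messages << message.dup }
  end
end

exchange = FanoutExchange.new("entry.write")
# Queue names below are hypothetical labels for the three consumers.
search   = exchange.bind("search_index")
store    = exchange.bind("key_value_store")
research = exchange.bind("research_index")

exchange.publish('{"id": 1, "title": "first post"}')
puts [search, store, research].map(&:size).inspect
```

The publisher knows only the exchange name. Adding a fourth consumer is a new binding, with no change to the code that writes entries.
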
  139. That’s how we write entries. Here’s how we do event based processing on those writes. so here’s an example where we have a topic exchange named ‘entry.notify’. queues can be bound to exchanges. so we have these three queues
  140. so take the example where you have a message published to the exchange with a routing key of ‘insert’.
  141. the message would get routed to the queue bound with insert and to the queue bound with hash (#)
  142. now let’s look at a message with a routing key of ‘update.clicks.rank’
  143. based on the bindings, the message gets dropped into the update and hash (#) queues (ones on the right, err, left?)
  144. error logging
  145. routing key: domU-12-31-39-07.feed_fetcher
  146. binding: *.feed_fetcher
  147. binding: #
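
A sketch of how topic exchange bindings match routing keys, using the error logging key and bindings from the slides above. In AMQP topic matching, keys and patterns are dot-separated words, `*` stands for exactly one word, and `#` stands for zero or more words. The `topic_match?` and `match_words` helpers are hypothetical names written for illustration; the broker does this matching itself.

```ruby
# Returns true if a topic binding pattern matches a routing key.
def topic_match?(binding_pattern, routing_key)
  match_words(binding_pattern.split("."), routing_key.split("."))
end

# Recursive word-by-word comparison of pattern and key.
def match_words(pattern, words)
  return words.empty? if pattern.empty?

  head, *rest = pattern
  case head
  when "#"
    # '#' may absorb zero or more words, so try every possible split.
    (0..words.length).any? { |count| match_words(rest, words.drop(count)) }
  when "*"
    # '*' must consume exactly one word.
    !words.empty? && match_words(rest, words.drop(1))
  else
    # A literal word must match exactly.
    words.first == head && match_words(rest, words.drop(1))
  end
end

key = "domU-12-31-39-07.feed_fetcher"
puts topic_match?("*.feed_fetcher", key)                  # any host, feed_fetcher only
puts topic_match?("#", key)                               # catch-all binding
puts topic_match?("*.feed_fetcher", "update.clicks.rank") # different shape of key
```

With these two bindings, a queue bound with `*.feed_fetcher` collects feed fetcher errors from every host, while a queue bound with `#` sees every exception published to the exchange.
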
  148. RabbitMQ client libraries
  149. AMQP by Aman Gupta
  150. Bunny by Chris Duncan
  151. client = Bunny.new(:host => "mysweetrabbbitserver.pauldix.net")
       client.start
  152. exchange = client.exchange(
         "exceptions",
         :type => :topic,
         :durable => true)
       exchange.publish(
         "oh noes, an exception!",
         :key => "domU-12-31-39-07.feed_fetcher")
  153. queue = Bunny::Queue.new(
         client, "exceptions.logger")
       queue.bind("exceptions", :key => "#")
       queue.subscribe do |msg|
         log.error(msg[:payload])
       end
  154. async write considerations
  155. uniqueness: value uniqueness is hard to enforce.
  156. http://localhost:3000/locks/names/pauldix One way is to have the service responsible expose a uniqueness getter. So once you GET a lock, you write through the queue.
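
One possible reading of the lock-then-write approach above, sketched in-process. `NameLocks` plays the part of the service that owns name uniqueness (the thing behind a URL like `/locks/names/pauldix`), and `register` is the caller that asks for the lock synchronously and only then publishes the write. Both names, the message shape, and the `:conflict` and `:accepted` return values are assumptions for illustration.

```ruby
# Hypothetical stand-in for the service that owns name uniqueness.
class NameLocks
  def initialize
    @taken = {}
    @mutex = Mutex.new
  end

  # Returns true only for the first caller to ask for a given name.
  def acquire(name)
    @mutex.synchronize do
      next false if @taken.key?(name)

      @taken[name] = true
    end
  end
end

# Synchronous check first, asynchronous write second.
def register(name, locks, write_queue)
  return :conflict unless locks.acquire(name)

  write_queue << { "type" => "user.create", "name" => name }
  :accepted
end

locks = NameLocks.new
write_queue = Queue.new

puts register("pauldix", locks, write_queue)
puts register("pauldix", locks, write_queue)
puts write_queue.size
```

The check is the only synchronous step. The write itself still goes through the queue, so there is no transaction tying the two together, which is the next consideration on the list.
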
  157. no transactions
  158. eventual consistency
  159. Eric Brewer’s CAP theorem: in Brewer’s CAP theorem he talked about the relationship between three requirements when building distributed systems: consistency, availability, and partition tolerance.
  160. consistency: consistency means that an operation either works completely or fails. This is also referred to as atomic
  161. availability: availability is pretty self explanatory. A service is available to serve requests, so you can shoot for high availability
  162. partition tolerance: when you replicate data across multiple systems, you create the possibility of forming a partition. This happens when one or more systems lose connectivity to other systems. Partition tolerance is defined formally as “no set of failures less than total network failure is allowed to cause the system to respond incorrectly”
  163. pick two
  164. Werner Vogels’ eventual consistency “is a special form of weak consistency. if no new updates are made to an object, eventually all accesses will return the last updated value.”
  165. Synchronous Reads
  166. Asynchronous Writes
  167. trade-offs
  168. strong consistency
  169. iteration speed
  170. scalability
  171. loose coupling
  172. single purpose services
  173. Services and Ruby can be friends: it’s possible for services and Ruby to be friends
  174. Advertising: finally, a little advertising
  175. http://pauldix.net My web site is pauldix.net
  176. http://github.com/pauldix My GitHub is pauldix
  177. @pauldix My Twitter is @pauldix
  178. I’m also writing a book for Addison Wesley. It’s called Service Oriented Design with Ruby and Rails.