Synchronous Reads Asynchronous Writes RubyConf 2009

Published in: Technology
  • Synchronous Reads Asynchronous Writes RubyConf 2009

    1. Synchronous Reads, Asynchronous Writes. Paul Dix. (Note to make sure these aren’t showing up on the slides.)
    2. Where I work: at Know More, a pre-launch search startup
    3. hack sweet, sweet code
    4. what does
    5. Synchronous Reads
    6. Asynchronous Writes
    7. mean?
    8. Data Reads Through Services. Well, it means creating systems that perform data reads through services. Data reads typically have to be synchronous because a user is waiting on the operation. So they have to occur inside the request/response life-cycle.
    9. Data writes through a messaging or queuing system. Often, a user doesn’t have to wait for data to be written to receive a response. So writes can be done asynchronously outside of the request/response life-cycle, which means you can put them straight into a queue.
    10. Loosely Coupled
    11. Why? Now the question is why in the hell you’d want to do this.
    12. Rails doesn’t Scale
    13. Rails doesn’t Scale Your Database doesn’t Scale
    14. Monolithic Applications Don’t Scale. Also, having your entire application in one code base and system doesn’t scale. This leads to test suites that take more than 30 minutes to run and deploys that push your entire application just for a simple update.
    15. if you have
    16. Lots of Traffic. This could be on the front end from users or on the back end from data processing.
    17. Multiple Applications. Multiple applications that have to read from the same data store or share business logic.
    18. Multiple Background Processes. Multiple back end processes that need to run based on changes in data, or if you need data replicated and munged.
    19. Complex Business Logic. Complex business logic that may have to span multiple systems.
    20. if you have one or more of those situations...
    21. Services based Approach
    22. No Talent Hacks. Java developers who work in...
    23. the enterprise commonly refer to this as....
    24. Service Oriented Architecture. Service oriented architecture, but that’s commonly associated with things like SOAP, WSDL, and a bunch of other heinous things.
    25. Scary!
    26. the tools
    27. Synchronous Reads
    28. RESTful Services. Is an approach based on RESTful services
    29. Descriptive URLs. Which means things like descriptive URLs
    30. Taking advantage of HTTP Verbs
    31. GET
    32. PUT
    33. POST
    34. DELETE
    35. Sinatra. And for that I’d recommend Sinatra, a web framework built on top of Rack. Really, I’d call it a services framework.
    36. serialization format
    37. JSON. For your message format you should use JSON. I know you can use XML, but...
    38. XML Makes Children Cry. XML is too bloated and complex and besides, it makes children cry.
    39. Asynchronous Writes. I previously glossed over this picture. It’s something called an asynchronous electric motor, which is the only image I could conjure up to go with the term “asynchronous”.
    40. Messaging System. Requires a messaging system to write data through.
    41. RabbitMQ. For that I suggest RabbitMQ, which is a powerful messaging system in addition to having stuff as mundane as a queue.
    42. Data Store. And you’ll of course need a data store. I don’t care which you use, but it should probably be designed to solve the problem for a particular piece of your application.
    43. now let’s get into specifics
    44. but first,
    45. a word of warning...
    46. This isn’t about new applications.
    47. This isn’t about green field projects.
    48. It’s about solving existing problems.
    49. Look, shiny! Ruby programmers tend to jump on new things because, hey look, shiny!
    50. Don’t Go Overboard, Don’t Over-think
    51. Joel Spolsky calls people that exhibit this behavior “Architecture Astronauts”: “Sometimes smart thinkers just don't know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don't actually mean anything at all. These are the people I call Architecture Astronauts. It's very hard to get them to write code or design programs, because they won't stop thinking about Architecture.”
    52. Build Something. Remember, your goal is first to build something.
    53. so...
    54. Don’t be a space man
    55. with that out of the way
    56. let’s see what this looks like
    57. Standard Rails Application. Well, here’s your standard Rails application. So you have Rails and your trusty database.
    58. and then you add some background processing... and then you realize that you can’t do everything inside the request/response cycle so you add in a background process. For now we’ll assume you’re using a database backed queue like dj, bj, or some other kind of “j”.
    59. and then you add memcache... but wait, then you realize you need additional performance so you add memcache
    60. server, duh! and let’s not forget that it’s all fronted by nginx or apache
    61. and then you need more app processing power so you add two more servers and front all that by HAProxy
    62. Where to from here? So once you’ve done all that, where do you go?
    63. maybe you add redis because you heard Ezra or Chris or somebody say it’s awesome and scales to infinity
    64. and then you add a read database to eke out a little more performance
    65. and the whole time your Rails application code base is growing with more logic and additional background processing
    66. makes you cry. It’s enough to make a grown man cry.
    67. Monolithic Applications Do Not Scale. This is why monolithic applications do not scale. To make simple changes ...
    68. to this mess, you end up running the test suite and redeploying the whole thing.
    69. what else can you do?
    70. instead, you can break into multiple applications
    71. applications, called “services”
    72. real world example. To go any farther into the architecture it’ll help to look at a specific real world example.
    73. Let’s take something from my work
    74. millions of RSS and Atom feeds. Since we’re pre-launch we definitely don’t have the too many users problem. The traffic and complexity come from having to update millions of RSS and Atom feeds.
    75. data from external sources. Pulling in real time engagement from multiple external sources.
    76. complex business logic. And complex business logic. Every time something enters our system we have to perform many different tasks that are interdependent. Here’s just a taste of it: our feed fetcher pulls in a new blog post from somewhere
    77. store the raw content
    78. scrape a summary
    79. check for duplicates
    80. language identification
    81. named entity extraction
    82. classify the content as spam, adult, etc.
    83. index the content for search
    84. run some crazy voodoo machine learning magic
    85. store it in Hadoop for analysis later
    86. run in parallel. Now some of these processes can be run in parallel.
    87. run serially
    88. dependent on previous outputs
    89. different libraries and languages
    90. Originally we set up a services based design that looked kind of like this. As you can see there are a bunch of interconnections and it’s hard to comprehend. Troubleshooting failures was hard.
    91. HTTP + JSON. Each service had to implement an HTTP interface with JSON formatted messages. This was the only method for service-to-service communication.
    92. Two Problems
    93. engagement and post traffic is bursty
    94. queues behind every service. To manage the peaks in traffic everyone put queues behind each of their services.
    95. Data owners had to notify everyone. Data owners had to notify other services when an update occurred. Services were tightly coupled.
    96. Tightly Coupled
    97. make otters cry. And tightly coupled services make otters cry.
    98. thus, the idea was born
    99. keep the HTTP Services for data reads. HTTP services for data reads, which can be cached and optimized.
    100. push writes through a messaging system. Data writes through a messaging system with built-in routing. It also helps if it’s optimized for processing thousands of messages per second and supports the pubsub style.
    101. Synchronous Reads
    102. Sinatra by Blake Mizerany
    103. require 'rubygems'
         require 'sinatra'

         get '/entries/:id' do
           Entry.find(params[:id]).to_json
         end

         Now Sinatra is awesome because it makes creating a service this easy.
    104. call services
    105. do it in parallel
    106. Amazon - 100 services
    107. Google - 1000 servers
    108. multi-threaded and asynchronous parallelism
    109. Typhoeus
    110. hydra = Typhoeus::Hydra.new
         first_request = Typhoeus::Request.new(
           "http://localhost:3000/posts/1.json")
         second_request = Typhoeus::Request.new(
           "http://localhost:3000/posts/2.json")
         hydra.queue(first_request)
         hydra.queue(second_request)
         hydra.run
    111. response = first_request.response
         response.code
         response.body
         response.time
         response.headers
    112. first_request.on_complete do |response|
           post = Post.new(JSON.parse(response.body))
           # get the first url in the post
           third_request = Typhoeus::Request.new(post.links.first)
           third_request.on_complete do |response|
             # do something with that
           end
           hydra.queue third_request
           post
         end
    113. [Timeline diagram from Start to Finish showing request timings of 50 ms, 40 ms, 55 ms, 25 ms, and 30 ms]
    114. response.handled_response
    115. 20.times do
           r = Typhoeus::Request.new(
             "http://localhost:3000/users/1")
           hydra.queue r
         end
         hydra.run
    116. hydra.cache_setter do |request|
           @cache.set(
             request.cache_key,
             request.response,
             request.cache_timeout)
         end

         hydra.cache_getter do |request|
           @cache.get(request.cache_key)
         end
    117. # the stubbed body uses double-quoted keys so that JSON.parse accepts it
         response = Typhoeus::Response.new(
           :code => 200,
           :headers => "",
           :body => '{"name" : "paul"}',
           :time => 0.3)
         hydra.stub(:get,
           "http://localhost:3000/users/1"
         ).and_return(response)
    118. request = Typhoeus::Request.new(
           "http://localhost:3000/users/1")
         request.on_complete do |response|
           JSON.parse(response.body)
         end
         hydra.queue request
         hydra.run
    119. # %r{} avoids having to escape the slashes in the URL pattern
         hydra.stub(:get,
           %r{http://localhost:3000/users/.*}
         ).and_return(response)
    120. package as gems
    121. versioning
    122. run multiple versions in parallel
    123. Asynchronous Writes
    124. RabbitMQ
    125. what about Beanstalk, Resque, Kestrel, or whatever? So why use RabbitMQ instead of Beanstalk, Resque, Kestrel, or any other option?
    126. Pubsub Semantics
    127. Flexible message routing
    128. Event Based System. These features enable you to build an event based system, which is exactly what we needed. When certain updates happen, it should kick off calculations elsewhere in the system. I’ll get into that in a bit, but first some Rabbit specifics.
    129. AMQP. Rabbit is an implementation of an open protocol called Advanced Message Queuing Protocol, or AMQP.
    130. it’s not just a queue
    131. it has Exchanges and Routing Keys too. It has a bunch of features, but for the purposes of asynchronous writes, exchanges and routing keys are what we care about most.
    132. Exchange Types. Rabbit has three exchange types.
    133. Direct
    134. Fanout
    135. Topic
    136. Message Router. An exchange basically acts as a message router. Messages get published to it and it routes the messages to the appropriate queues.
    137. Example: Processing New Feed Entries
    138. So we have a fanout exchange called entry.write. Every queue bound to this exchange will get messages published to it. Here we have the three things we want to do. First, index it for searching. Second, store it in our key-value store. Third, index in a completely separate index used for data research. So the search is Solr/Lucene and the research is Hadoop. Completely decoupled systems.
    139. That’s how we write entries. Here’s how we do event based processing on those writes. so here’s an example where we have a topic exchange named ‘entry.notify’. queues can be bound to exchanges. so we have these three queues
    140. so take the example where you have a message published to the exchange with a routing key of ‘insert’.
    141. the message would get routed to the queue bound with insert and to the queue bound with hash
    142. now let’s look at a message with a routing key of ‘update.clicks.rank’
    143. based on the bindings, the message gets dropped into the update and hash queue (ones on the right err left?)
    144. error logging
    145. routing key: domU-12-31-39-07.feed_fetcher
    146. binding: *.feed_fetcher
    147. binding: #
    148. RabbitMQ client libraries
    149. AMQP by Aman Gupta
    150. Bunny by Chris Duncan
    151. client = Bunny.new(:host => "mysweetrabbbitserver.pauldix.net")
         client.start
    152. exchange = client.exchange(
           "exceptions", :type => :topic, :durable => true)
         exchange.publish(
           "oh noes, an exception!",
           :key => "domU-12-31-39-07.feed_fetcher")
    153. queue = Bunny::Queue.new(
           client, "exceptions.logger")
         queue.bind("exceptions", :key => "#")
         queue.subscribe do |msg|
           log.error(msg[:payload])
         end
    154. async write considerations
    155. uniqueness. Value uniqueness is hard to enforce.
    156. http://localhost:3000/locks/names/pauldix One way is to have the service responsible expose a uniqueness getter. So once you GET a lock, you write through the queue.
    157. no transactions
    158. eventual consistency
    159. Eric Brewer’s CAP theorem. In Brewer’s CAP theorem he talked about the relationship between three requirements when building distributed systems: consistency, availability, and partition tolerance.
    160. consistency. Consistency means that an operation either works completely or fails. This is also referred to as atomic.
    161. availability. Availability is pretty self explanatory. A service is available to serve requests, so you can shoot for high availability.
    162. partition tolerance. When you replicate data across multiple systems, you create the possibility of forming a partition. This happens when one or more systems lose connectivity to other systems. Partition tolerance is defined formally as “no set of failures less than total network failure is allowed to cause the system to respond incorrectly”.
    163. pick two
    164. Werner Vogels’ eventual consistency “is a special form of weak consistency. if no new updates are made to an object, eventually all accesses will return the last updated value.”
    165. Synchronous Reads
    166. Asynchronous Writes
    167. trade-offs
    168. strong consistency
    169. iteration speed
    170. scalability
    171. loose coupling
    172. single purpose services
    173. Services and Ruby can be friends. It’s possible for services and Ruby to be friends.
    174. Advertising. Finally, a little advertising.
    175. http://pauldix.net My web site is pauldix.net
    176. http://github/pauldix my github is pauldix
    177. @pauldix My twitter is @pauldix
    178. I’m also writing a book for Addison Wesley. It’s called Service Oriented Design with Ruby and Rails.
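
The topic routing walked through in slides 139 to 147 can be sketched in plain Ruby. This is only an in-memory model of the matching rules (`*` matches exactly one dot-separated word, `#` matches zero or more), not RabbitMQ or a client library. The helper names `topic_match?` and `route`, the queue names, and the `update.#` binding for the update queue are assumptions made for illustration, since the slides do not spell that binding out.

```ruby
# Minimal in-memory model of AMQP-style topic matching (not RabbitMQ itself).
# "*" matches exactly one dot-separated word; "#" matches zero or more words.
def topic_match?(binding_key, routing_key)
  match_words(binding_key.split("."), routing_key.split("."))
end

def match_words(pattern, words)
  return words.empty? if pattern.empty?

  head = pattern.first
  rest = pattern.drop(1)
  case head
  when "#"
    # "#" may consume zero words, or consume one word and try again.
    match_words(rest, words) ||
      (!words.empty? && match_words(pattern, words.drop(1)))
  when "*"
    !words.empty? && match_words(rest, words.drop(1))
  else
    !words.empty? && head == words.first && match_words(rest, words.drop(1))
  end
end

# Hypothetical bindings for three queues on the 'entry.notify' exchange.
# The "update.#" pattern is an assumption; the slides only name the keys.
BINDINGS = {
  "insert_queue" => "insert",
  "update_queue" => "update.#",
  "all_queue"    => "#"
}

# Returns the names of the queues that would receive a message.
def route(routing_key, bindings = BINDINGS)
  bindings.select { |_queue, key| topic_match?(key, routing_key) }.keys
end
```

With these assumed bindings, `route("insert")` selects the insert queue and the `#` queue, and `route("update.clicks.rank")` selects the update queue and the `#` queue, which mirrors the two examples in the talk. The error-logging binding `*.feed_fetcher` matches `domU-12-31-39-07.feed_fetcher` because the host name is a single word.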

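
As a rough illustration of the overall pattern (reads answered synchronously, writes pushed onto a queue and applied later), here is a minimal sketch using only the Ruby standard library. It assumes in-process stand-ins: a Hash plays the data store and Ruby’s `Queue` plays the messaging system. The `EntryService` class and its method names are hypothetical and are not taken from the talk.

```ruby
require "thread"

# In-process stand-ins: a Hash for the data store, a Queue for the broker.
class EntryService
  def initialize
    @store  = {}
    @lock   = Mutex.new
    @writes = Queue.new
    @worker = Thread.new { consume }
  end

  # Synchronous read: the caller waits for the answer.
  def read(id)
    @lock.synchronize { @store[id] }
  end

  # Asynchronous write: enqueue the message and return immediately.
  def write(id, attributes)
    @writes << [id, attributes]
    :queued
  end

  # Demo helper: let the worker drain pending writes, then stop it.
  def shutdown
    @writes << :stop
    @worker.join
  end

  private

  # Background consumer that applies queued writes in order.
  def consume
    loop do
      message = @writes.pop
      break if message == :stop

      id, attributes = message
      @lock.synchronize { @store[id] = attributes }
    end
  end
end
```

A caller would use `service.write(1, "title" => "hello")` and get `:queued` back right away. A `read(1)` issued immediately afterwards may still return `nil`, which is the eventual-consistency trade-off the talk describes; once the worker has drained the queue, the read returns the written value.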