A Case of Accidental Concurrency
 


Concurrency in Ruby is all the rage these days, and people can't seem to agree
whether Threads, Fibers, event loops, or actors are the best solution. But did you ever consider that your *sequential* Ruby program might be concurrent, with nary a Thread, Fiber, or callback in sight? Well, it happened to me.

This is the story of how accidental concurrency (also known as re-entrancy) broke my brain multiple times over the course of two years, spawned flamewars on Twitter, long blog posts, and the various solutions I took to solve the problem. Along the way we'll illuminate some subtleties of concurrent programming in Ruby, differences between several Ruby implementations, and how we can all write code that is friendlier when accidental concurrency strikes.

  • My name is Sean Cribbs; you can find me on Twitter @seancribbs. Today I’m going to tell you a story about how accidental concurrency happened to me, and how you can prepare for it.
  • ...or alternatively called “ACCIDENTAL? FFFFFFFUUUUUUUUUUUU”

    Fair warning: I’m including a bunch of memes in my presentation to lighten the mood. If you don’t get them, I’m sorry. Just nod and smile and humor the crazy man.

    But first, a little bit more about my background and how I came to this problem.
  • I work for Basho Technologies, a company originally based in Cambridge, Massachusetts, USA, but now geographically distributed. We make Riak, the awesome distributed, fault-tolerant database. Riak is used at Kiip (where Mitchell works) and at other startups like Yammer, Voxer, Bump, and GitHub, but also at Fortune 500 companies like Comcast (an American telco/cable provider), America Online (YES, AOL), and Citigroup. Of course, there are many others we don’t even know about, because Riak is free and open source.
  • At Basho we like to think of ourselves as experts on distributed systems. A number of our founding engineers built Akamai, the content distribution network, in the late 90s -- a sort of cloud before we called them that. Most of our stuff is in Erlang, where a lot of concurrency problems are solved for us, or with minimal effort.
  • But this talk is about concurrency in Ruby, not Erlang. What kinds of tools do we have for concurrency in Ruby? We’re all pretty familiar with doing concurrency with multiple processes, where we simply start new instances of our program -- we’ve been doing that for a long time. We’re also familiar with threads, which until 1.9 weren’t really as useful as they could have been. To solve some of the problems with threads before 1.9, people created reactor libraries like EventMachine, mostly to handle IO-heavy (not CPU-heavy) work, but they tend to require a different programming model, with callbacks and such. More recently, we got Fibers, which let you have multiple streams of work that cooperate explicitly, and actor libraries like Celluloid that make your objects feel like isolated processes. But there’s another type of “concurrency” that can happen to sequential Ruby code! If you’re like I was when I first ran into this, you’re probably saying “WAT?!?!”
  • This case of accidental concurrency happened when I was implementing, for the Ruby client, several streaming requests/responses that Riak supports. Let’s look at how streaming operations work.
  • To perform a streaming operation on Riak, the client sends the request, and then the server sends partial results to the client in multiple packets. The client waits around, accepting the partial results until the server signals that it’s done.

    Streaming operations are great for large responses because both the client and the server can do things more efficiently -- the client can proactively process intermediate results from the server, and the server can reduce the amount of data it has to buffer by sending portions of the result immediately. Riak supports two such operations, list-keys and MapReduce. In HTTP, this is implemented as a chunked response, whereas the Protocol Buffers-based binary protocol simply sends multiple message frames, followed by a final “done” frame.

    If you look at the sequence of events, it looks a lot like an iteration over the response chunks. So that’s how I implemented the interface, as if you were calling #each on an Enumerable.
  • Here’s the basic format of a streaming operation in the Riak Ruby client. You call the streaming operation, passing a block that will receive intermediate results and do something with them.

    Initially, I was using the “curb” library for HTTP, which provides direct bindings to libcurl. You can give curb a Ruby block that will be called for each chunk of the response. This made a natural transition to streaming the results through the user’s code block, with a little wrapping. Here’s roughly what that code looked like.
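The Enumerable-style interface described above can be sketched with a fake transport in place of Riak and curb. All names here are illustrative, not the actual client API:

```ruby
# Hypothetical stand-in for a streaming client call: the server's response
# arrives in several frames, followed by a "done" marker, and each partial
# result is handed to the caller's block as if iterating with #each.
class FakeStream
  FRAMES = [%w[a b], %w[c d], :done]

  def list_keys
    FRAMES.each do |frame|
      break if frame == :done        # server signals completion
      frame.each { |key| yield key } # stream each partial result out
    end
  end
end

keys = []
FakeStream.new.list_keys { |k| keys << k }
keys  # => ["a", "b", "c", "d"]
```

The caller never sees the framing; the block just receives results one at a time, as with any #each.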
  • That’s awesome, let’s STREAM ALL THE THINGS!
  • ...which works fine until -- for example, when listing keys -- you want to make ADDITIONAL requests to the server from within the streaming block. Then you get an exception! What happened?
  • As it turns out, that “curl” thing was an instance variable containing a Curl::Easy connection handle, initialized something like this. The idea was that when you made multiple requests in a row, you could keep the same connection around and avoid setup/teardown costs. Well, it didn’t work in the streaming case.

    So I thought to make it a Thread-local variable instead, like so.

    NOPE! Still broken! Why was it being corrupted?
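The Thread-local variant might have looked roughly like this (a sketch, not the actual client code; Object.new stands in for Curl::Easy.new):

```ruby
# Memoize one connection handle per thread, so sequential requests on the
# same thread reuse a connection and avoid setup/teardown costs.
def curl
  Thread.current[:curl] ||= Object.new  # stands in for Curl::Easy.new
end

same_thread_a = curl
same_thread_b = curl
other_thread  = Thread.new { curl }.value

same_thread_a.equal?(same_thread_b)  # => true: reused within a thread
same_thread_a.equal?(other_thread)   # => false: each thread gets its own
```

This isolates threads from each other, which is exactly why it looks safe -- and exactly why it still breaks when the reuse happens on the *same* thread.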
  • Here’s what the libcurl docs say about the problem. I read this and thought, WAT? I’m not using any threads!
  • I didn’t really understand the problem until I realized that libcurl yields to the block BEFORE the request finishes and returns. That means that, through the lexically bound scope of the block, it has access to the ORIGINAL connection from which it initiated the first request, and so it tries to reuse the handle CONCURRENTLY (that is, recursively) while the original request is still being processed. Whoops!
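The failure mode can be reproduced without any HTTP at all. In this toy model (all names invented, mimicking libcurl’s refusal to share a handle), the handle rejects a request while one is in flight, and the callback fires before the first request completes:

```ruby
# A handle that, like libcurl, cannot be used for two requests at once.
class Handle
  def request
    raise "handle already in use" if @busy
    @busy = true
    yield "chunk"    # callback fires BEFORE the request finishes
  ensure
    @busy = false
  end
end

handle = Handle.new
error  = nil
begin
  handle.request do |_chunk|
    handle.request { }   # nested use of the same handle re-enters it
  end
rescue RuntimeError => e
  error = e.message
end
error  # => "handle already in use"
```

No Threads, no Fibers -- the block alone is enough to re-enter the handle.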
  • This informative (but ugly) sequence diagram shows what I just described, the key parts being in purple and red, and the huge leftward line before those parts. If you really want to read it later, you can find it on my blog, which I’ll link to later.

    But there’s another term for this problem of “accidental concurrency”: the code is not re-entrant.
  • Re-entrancy is a classic CS problem, defined here by Wikipedia:
  • Furthermore: essentially, I was attempting to reuse the “curl” instance variable or Thread-local WHILE it was already in use and unavailable. I was unintentionally allowing recursion via the passed block and not protecting that curl handle from improper accesses. It was effectively a global variable. I was disappoint.

    So now what do I do? I wanted the solution to be simple to implement and maintain, but also to look synchronous to the user.
  • I had two goals for the solution:

    - Keep the call looking like a synchronous iteration, with no extra work by the caller to handle concurrency.
    - Make the solution as simple as possible.
  • The first solution I came up with -- this was way back in 2010 -- was to use Fibers to isolate the local connection state. I don’t remember where I found this solution, but it might have been StackOverflow or Dave Thomas’s blog posts on the subject.
  • Here’s what the solution looks like. It’s a little hard to follow, so I’ll walk you through it. For now we’ll only focus on the streaming case, since the sequential case is the same as before.

    1) First we get a local copy of the currently unsullied curl handle.
    2) Then we open a new Fiber -- remember, fibers are cooperative and must be explicitly started/stopped. This block won’t execute until we call resume further down.
    3) Inside the block, we make a local copy of the current fiber so we can refer to it in the callback.
    4) Now we apply the same on_body callback as before, but instead of yielding to the passed block, we resume the fiber. You’ll see why in a second.
    5) Then we start an infinite loop that immediately yields control of the fiber (halting to wait for a message chunk). When it gets that chunk, it passes it to the original block via yield, then starts all over again.
    6) Finally, we start the fiber using “resume” so that the infinite loop can begin.

    That’s great, but why does this work?
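Stripped of the curb specifics, the pattern in those six steps can be sketched in plain Ruby. This is a reconstruction, not the client’s actual code; fake_transport stands in for curb’s on_body callback:

```ruby
# A callback-driven transport: fires the given callback once per chunk,
# standing in for curb's on_body.
def fake_transport(&on_body)
  %w[frame1 frame2].each { |c| on_body.call(c) }
end

# Turn the callbacks back into a synchronous-looking iteration by parking
# the user's block inside a Fiber.
def stream(&block)
  fiber = Fiber.new do
    loop do
      chunk = Fiber.yield   # park until the callback resumes us with a chunk
      block.call(chunk)     # hand the chunk to the caller's block
    end
  end
  fiber.resume              # start the loop; it parks at Fiber.yield
  fake_transport { |chunk| fiber.resume(chunk) }
end

got = []
stream { |c| got << c }
got  # => ["frame1", "frame2"]
```

The caller’s block runs inside the Fiber, not inside the transport’s callback -- which is the whole trick.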
  • For 1.8 and JRuby I used @tmm1’s “Poor Man’s Fibers”, which is essentially a thread with two queues. The passed streaming block runs inside the context of the created Fiber.

    Matz could correct me on this, but I think Thread-locals being fiber-local is simply an artifact of the design and not an explicit feature. Rubinius, for example, didn’t have that behavior last I checked.

    Anyway, as a result, the Thread-local version of the handle is isolated when streaming, and things work out peachy.
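A rough sketch of the thread-with-two-queues idea (my reconstruction of the technique, not @tmm1’s actual code): resume pushes a value to the body and blocks until the body yields one back.

```ruby
# A "poor man's fiber": a worker thread plus an inbox and an outbox queue.
# The queues' blocking pop gives us the same handoff that Fiber#resume and
# Fiber.yield provide.
class PoorMansFiber
  def initialize(&block)
    @in, @out = Queue.new, Queue.new
    @thread = Thread.new do
      block.call(self, @in.pop)  # body starts running on the first resume
    end
  end

  def resume(value = nil)
    @in.push(value)
    @out.pop                     # block until the body yields back
  end

  def fiber_yield(value = nil)
    @out.push(value)
    @in.pop                      # park until resumed again
  end
end

f = PoorMansFiber.new do |fib, first|
  fib.fiber_yield(first + 1)
end
result = f.resume(41)
result  # => 42
```

Because the body runs on its own thread, its Thread-locals are naturally isolated -- the same property real Fibers happen to give you on 1.9.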
  • With this, I thought: THIS IS AWESOME, FIBERS NAILED IT. For a long time, this remained the solution. Although it met the first goal of appearing synchronous, it’s not the simplest to understand or maintain.
  • Almost a year later, I wanted to add new HTTP libraries behind the Riak client so that people could use it more effectively. I wanted the same benefits Curb gave in terms of long-lived connections, but there were some quirks in its implementation that made it work poorly on certain types of operations.

    Incidentally, the new HTTP library I picked ALSO used a Thread-local variable for keep-alive connections. I had the same problem all over again, so this time I wanted to generalize the solution.
  • The idea was to extract the fibery isolation bits from the HTTP-library-specific bits. What I came up with looks strikingly like the event loop in a reactor library like EventMachine, so I called it the Pump. This is the entire implementation.

    Notice how we create and start the Fiber just as we did before, using yield and resume with the infinite loop.

    The “pump” method is the inside of the “on_body” callback block from before. It receives a chunk and then resumes the fiber with that chunk.

    Finally, the to_proc method makes it easier to treat a Pump object like a block, by binding the pump method and returning it as a proc.
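Reconstructed from the description above (details may differ from the client’s source), the Pump looks roughly like:

```ruby
# Pump: owns the Fiber, exposes #pump as the per-chunk callback, and
# #to_proc so a Pump can be passed anywhere a block is expected.
class Pump
  def initialize(&block)
    @fiber = Fiber.new do
      loop { block.call(Fiber.yield) }  # park, hand each chunk to the block
    end
    @fiber.resume                       # start the loop; parks at Fiber.yield
  end

  def pump(chunk)
    @fiber.resume(chunk)                # feed one chunk through the fiber
  end

  def to_proc
    method(:pump).to_proc               # lets callers write each(&pump_object)
  end
end

seen = []
pump = Pump.new { |c| seen << c.upcase }
%w[foo bar].each(&pump)                 # any chunk source will do
seen  # => ["FOO", "BAR"]
```

The transport only ever sees #pump; the user’s block stays safely inside the Fiber.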
  • That means our previously ugly Curb-related code looks like this instead! Wow, that’s a lot easier to follow. It also paved the way for me to do similar things on Riak’s binary protocol, which I was just beginning to implement and where I had to manage sockets myself.

    This solution pretty much met both goals -- hiding concurrency from the user while having a simple implementation. However, it doesn’t appear that simple or clear unless you understand where it came from.
  • At this point I was proud of my achievement and started to talk about it on Twitter. I drew a lot of confusion and ire from Evan Phoenix and Yehuda Katz, respectively. Unfortunately, thanks to the expiration policy for tweets, I was unable to exhume the conversation before this talk. In the course of trying to explain myself, I ended up writing a blog post which is the source of most of this talk. You can find it at the second link above, or go to seancribbs.com/tech and open the archive for March 2011.

Evan didn’t understand what I was trying to solve -- “what is this code good for?” he said -- and, as is often the case, Yehuda was right. The real solution to the concurrency/re-entrancy problem, he explained, is to implement a proper connection pool that protects connections in use and creates new ones as needed.
  • On the other hand, my coworker Andy is also right. Making a connection pool is hard, and understanding all the edge cases of concurrent access, locks, and condition variables is difficult, so I procrastinated on the issue.
  • Because I procrastinated, I was able to pawn the work off onto one of my awesome committers, Kyle Kingsbury.
  • This is Kyle; he works at Boundary. He is awesome. Give him a high-five on Twitter @aphyr.

He and I discussed the code a lot, but most of the work in the current version is his. In addition to creating a proper connection pool, he wrote a bunch of neat logic to load-balance connections across Riak hosts in the cluster, with error-sensitive host selection and retries. But let’s look at how just the pool bit works.
  • We’ll start with the smallest part: the individual items in the pool, which we call elements. They are essentially a wrapper around the thing we want to allocate, that is, connections. To claim or lock an element, the “owner” instance variable is set to the current thread; to unlock, it is set to nil. There are also query methods to find out whether the element is locked, which simply test whether the owner is nil, but I’ve omitted those here.
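A sketch of that element wrapper, with the omitted query methods filled back in (reconstructed from the description; the real riak-client code differs in detail):

```ruby
# Wraps a pooled object (a connection) and records which Thread,
# if any, currently owns it.
class Element
  attr_reader :object

  def initialize(object)
    @object = object
    @owner  = nil # the claiming Thread, or nil when free
  end

  def lock
    @owner = Thread.current
  end

  def unlock
    @owner = nil
  end

  def locked?
    !unlocked?
  end

  def unlocked?
    @owner.nil?
  end
end
```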
  • Now let’s look at the core of what the pool does: finding and allocating elements to users of the pool and automatically releasing them when they are finished. We called the method that handles this task “take”, and it has a fun alias to the “right-shift” operator, which makes your calls look sort of like Haskell’s monadic “bind” operator.

Again, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool.

Inside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If none are available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small; this is important if you want your code to move forward reliably. Only lock when necessary! Now we get to the real work: we yield the connection to the block, which can do whatever it needs to service the request.

The rescue clause lets us raise exceptions when we find that the connection is bad (disconnected or otherwise), remove that element from the pool, and then re-raise the exception so that the caller can do something with it, like decrement a retry count.

Finally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here, because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.
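The walkthrough above can be sketched like this. It is a simplification: the real pool (Kyle’s) also handles host balancing and retries, and BadConnection here is a hypothetical stand-in for whatever error signals a dead connection.

```ruby
require 'thread' # Mutex

BadConnection = Class.new(StandardError) # hypothetical "dead connection" error

# Minimal element wrapper: the pooled connection plus its owning Thread.
class PoolElement
  attr_reader :object

  def initialize(object)
    @object = object
    @owner  = nil
  end

  def lock;      @owner = Thread.current end
  def unlock;    @owner = nil            end
  def unlocked?; @owner.nil?             end
end

class Pool
  def initialize(&open)
    @open     = open      # block that opens a new raw connection
    @lock     = Mutex.new # guards pool membership and claiming
    @elements = []
  end

  # Claim an unlocked element (creating one if needed), yield its
  # connection to the caller's block, and release it afterwards.
  def take
    raise ArgumentError, "block required" unless block_given?
    result  = nil
    element = nil
    # Keep the critical section small: claim an element and get out.
    @lock.synchronize do
      element = @elements.find {|e| e.unlocked? }
      unless element
        element = PoolElement.new(@open.call)
        @elements << element
      end
      element.lock
    end
    begin
      # The real work happens outside the lock.
      result = yield element.object
    rescue BadConnection
      # Drop the dead connection, then let the caller handle the error.
      @lock.synchronize { @elements.delete(element) }
      raise
    ensure
      element.unlock # no pool lock needed: releasing, not claiming
    end
    result
  end
  alias >> take # the "right-shift" alias mentioned above
end
```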
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Now let’s look at the core of what the pool does, which is to find and allocate elements to users of the pool and automatically release them when they are finished. We called the method that handles this task “take” and it has a fun alias to the “right-shift” operator which makes your calls look sort of like Haskell’s monadic “bind” operator.\n\nAgain, I’ve simplified the code here so it’s easier to understand. First we set up a result variable to receive the return value of the passed block, which we will return at the end of the method. We also set up an element variable to receive a connection that we grab from the pool. \n\nInside the begin/rescue/end section we obtain a lock on the elements in the pool. Once we have the lock, we find an element that is not locked. If there are none available, we create a new one and add it to the pool. Finally, we lock the element so that no other sections of code -- even in the same Thread -- will try to use it until we’re done. Then we exit the critical section. Notice how we’ve kept the critical section quite small - this is important if you want your code to move forward reliably. Only lock when necessary! Now we actually get to the real work - we yield the connection to the block, which can do whatever it needs to service the request.\n\nThe rescue clause here lets us raise exceptions when we find that the connection is bad (disconnected or otherwise) and remove that element from the pool, then re-raise the exception so that the caller can do something with it, like decrement a retry count.\n\nFinally, we ensure that the element is unlocked and ready for the next user before returning the result. Note that we don’t care about grabbing the lock here because we’re already done with the connection, not claiming it, and we’re not modifying the pool membership.\n
  • Here’s an example of how you might use the pool to perform a request. Notice how we streamed the list-keys operation through the block, taking another connection to fetch the values pointed to by the keys we received.

    But we don’t need to expose the pool in the top-level interface; that’s an implementation detail.
  • Now we’ve achieved the same level of abstraction we originally aimed for, but we are thread-safe and our connections are protected from corruption.
  • There are some things I left out of the presented code which deserve mentioning:

    • The pool creator specifies “open” and “close” callables that are used to allocate and deallocate elements in the pool.
    • When you call take, you can also provide a filter that lets you decide whether a given element is valid. This enables error-sensitive host selection, so we can retry requests on a different host.
    • You can iterate over the pool elements in a thread-safe way to do things like modify all of them or clear the pool. This uses a separate lock for iteration and a condition variable to detect when elements are released. You can also just get the size of the pool, which is useful for metrics.
    • To make this solution actually work, we had to monkey-patch/duck-punch the HTTP libraries so they don’t handle the keep-alive bits themselves.
    • There’s also some anecdotal evidence that the pool gives a performance improvement, but I chalk this up to the fact that it is actually thread-safe, letting you take advantage of IO concurrency, and that it spreads the load via the multi-host feature.
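The features described in that list can be sketched in a few dozen lines. This is a minimal, hypothetical pool -- the class name TinyPool and its exact API are my own, not the actual Pool implementation from the talk -- that accepts open/close callables, supports a validity filter on take, and exposes a thread-safe size and clear:

```ruby
# NOTE: TinyPool is an illustrative sketch, not the real Pool class.
class TinyPool
  def initialize(open:, close:)
    @open, @close = open, close
    @lock = Mutex.new
    @elements = []   # each entry: { res: resource, locked: true/false }
  end

  # Find (or open) an unlocked element that passes the filter, mark it
  # locked inside a small critical section, yield the resource, and
  # always release it in the ensure clause.
  def take(filter: ->(_res) { true })
    element = nil
    @lock.synchronize do
      element = @elements.find { |e| !e[:locked] && filter.call(e[:res]) }
      unless element
        element = { res: @open.call, locked: false }
        @elements << element
      end
      element[:locked] = true
    end
    yield element[:res]
  ensure
    element[:locked] = false if element
  end

  def size
    @lock.synchronize { @elements.size }
  end

  def clear
    @lock.synchronize do
      @elements.each { |e| @close.call(e[:res]) }
      @elements.clear
    end
  end
end

pool = TinyPool.new(open: -> { [] }, close: ->(r) { r.clear })
pool.take { |conn| conn << 1 }
pool.take { |conn| conn << 2 }   # reuses the element opened above
pool.size  # → 1
```

Note the design point the speaker makes: the mutex only guards finding and claiming an element, never the (potentially slow) work done in the yielded block.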
  • So how can we, as a community that develops awesome Ruby libraries, protect against accidental concurrency?
  • First, we need to be careful about leaky scope. Throwing an exception when something is corrupted is not sufficient; we need to make sure precious and sensitive resources aren’t leaked outside their intended scope. Both thread-locals and ivars can leak if we’re not careful.

    If your code yields to user-supplied blocks, make sure the things you created on the code path that yields are unreachable from the block unless explicitly passed. If you do leak scope, protect your valuables with things like thread-safe resource pools.
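A minimal sketch of the leak (the method names and the :res key are hypothetical): the "leaky" version stashes a resource in a thread-local that the yielded block can reach and clobber, while the "scoped" version builds the resource in a local and passes it explicitly, so the block can only reach it through its argument.

```ruby
# Leaky: the resource lives in a thread-local, which is effectively
# global within the thread -- the user's block can reach it.
def with_resource_leaky
  Thread.current[:res] ||= "precious"
  yield
  Thread.current[:res]
end

after = with_resource_leaky do
  Thread.current[:res] = "junk"   # user code clobbers the "private" resource
end
after  # → "junk"

# Scoped: the resource is a local variable, unreachable from the
# block except through the argument we explicitly pass.
def with_resource_scoped
  res = "precious"
  yield res
end

with_resource_scoped { |res| res }  # → "precious"
```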
  • Second, we can be more explicit about the behaviors our code exhibits around cached resources. Does it only work in certain circumstances? Document that. Don’t just assume everyone is going to want ONE connection per host and port.

    To fix the problems I had with one HTTP library, I had to deeply monkey-patch a private method, which both feels wrong and potentially breaks any other code that uses the library. Instead, the library could either make the caching transparent AND SAFE for the user, or allow the user to inject the desired caching behavior (which might be none at all).
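One possible shape for that injection -- all class and parameter names below are hypothetical, not any real HTTP library's API -- is a client that accepts the caching strategy as a callable, defaulting to no caching, so callers can supply memoization, per-thread caching, a pool, or nothing at all:

```ruby
# Stand-in connection that counts how many times it was opened.
class FakeConn
  @@opened = 0
  def self.opened; @@opened; end
  def initialize; @@opened += 1; end
  def get(path); "response for #{path}"; end
end

class Client
  # cache is a callable: given an open-proc, it returns a connection.
  # Default strategy: no caching (always open a fresh connection).
  def initialize(cache: ->(open) { open.call })
    @cache = cache
  end

  def get(path)
    conn = @cache.call(-> { FakeConn.new })
    conn.get(path)
  end
end

# No caching: every request opens a connection.
plain = Client.new
2.times { plain.get("/a") }

# Injected memoizing strategy: reuse one connection.
memo = nil
cached = Client.new(cache: ->(open) { memo ||= open.call })
2.times { cached.get("/a") }

FakeConn.opened  # → 3 (two fresh opens + one memoized)
```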
  • Thanks for listening, I’d be happy to take questions now.

A Case of Accidental Concurrency: Presentation Transcript

  • A Case of Accidental Concurrency (Sean Cribbs, @seancribbs)
  • I work for Basho. We make...
  • Distributed Systems Experts
  • Ruby Concurrency
    • Processes
    • Threads
    • Reactors
    • Fibers
    • Actors
    • Sequential Code
  • Riak Streaming Operations http://thecatchandthehatch.com/pages/river-resources/
  • Streaming Ops (list-keys & MapReduce)
    client -> server: stream_me
    server -> client: result, result, result, done
  • Stream in Ruby
    # Request a streamed operation
    client.stream_something do |result|
      process(result)
    end

    # Stream via curb
    if block_given?
      curl.on_body {|c| yield c; c.size }
    else
      curl.on_body # Clear out the callback
    end
  • Curl::Err::MultiBadEasyHandle: Invalid easy handle
  • Curling Back
    def curl
      @curl ||= Curl::Easy.new
    end

    # Take two: one handle per thread
    def curl
      Thread.current[:curl] ||= Curl::Easy.new
    end
  • "The first basic rule is that you mustnever simultaneously share a libcurlhandle (be it easy or multi or whatever)between multiple threads." -- libcurl docs
  • NO THREADS you must"The first basic rule is thatnever simultaneously share a libcurlhandle (be it easy or multi or whatever)between multiple threads." -- libcurl docs
  • Realization
    • curl yields to the block BEFORE returning
    • the block tries to reuse the handle while the connection is still processing
  • Re-entrant"In computing, a computer program orsubroutine is called reentrant if it canbe interrupted in the middle of itsexecution and then safely called again("re-entered") before its previousinvocations complete execution." Wikipedia
  • Re-entrant"...a subroutine can fail to be reentrantif it relies on a global variable to remainunchanged but that variable is modifiedwhen the subroutine is recursivelyinvoked." Wikipedia
  • Re-entrant"...a subroutine can fail to be reentrantif it relies on a global variable to remainunchanged but that variable is modifiedwhen the subroutine is recursivelyinvoked." Wikipedia
  • Solution #1: Fibers
  • Solution #1: Fibers
    if block_given?
      _curl = curl                      # grab a local reference to the handle
      Fiber.new {                       # open a fiber
        f = Fiber.current               # this fiber
        _curl.on_body do |chunk|
          f.resume(chunk); chunk.size   # resume the fiber on each chunk
        end
        loop do
          yield Fiber.yield             # wait for a chunk, yield it, repeat
        end
      }.resume                          # start the fiber/loop
    else
      # ...
  • How it Works
    • The stream block runs INSIDE the Fiber
    • Fibers have their own Thread-locals
    • Thread.current[:curl] is isolated
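This works because on MRI 1.9+ (and JRuby), Thread.current[...] storage is actually fiber-local: a handle cached that way before the Fiber starts is invisible inside it, and vice versa. A quick demonstration:

```ruby
# Set a "handle" in the main fiber's thread-local storage.
Thread.current[:curl] = "outer handle"

inside = :unset
Fiber.new do
  inside = Thread.current[:curl]       # nil: the fiber has its own storage
  Thread.current[:curl] = "inner handle"
end.resume

inside                    # → nil
Thread.current[:curl]     # → "outer handle" (untouched by the fiber)
```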
  • Solution #2: Pump
  • class Pump
      def initialize(block)
        @fiber = Fiber.new do
          loop do
            block.call Fiber.yield
          end
        end
        @fiber.resume
      end

      def pump(input)
        @fiber.resume input
        input.size if input.respond_to?(:size)
      end

      def to_proc
        # ...
  • Simpler
    if block_given?
      curl.on_body(&Pump.new(block))
    else
      curl.on_body
    end
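Here is the Pump in action against a stand-in for curb pushing body chunks (the chunks are delivered by calling the proc directly, in place of curb invoking on_body; the to_proc body was truncated on the slide, so its completion below is my best guess, not necessarily the original code):

```ruby
# Pump adapts a push-style callback (curb calling on_body with each
# chunk) to a pull-style user block, without re-entering the client.
class Pump
  def initialize(block)
    @fiber = Fiber.new do
      loop { block.call Fiber.yield }   # wait for a chunk, hand it to the block
    end
    @fiber.resume                        # run up to the first Fiber.yield
  end

  def pump(input)
    @fiber.resume input                  # deliver one chunk into the fiber
    input.size if input.respond_to?(:size)
  end

  def to_proc
    method(:pump).to_proc                # guessed completion of the slide
  end
end

received = []
on_body = Pump.new(->(chunk) { received << chunk }).to_proc

# Stand-in for curb pushing body chunks as they arrive:
%w[foo bar baz].each { |chunk| on_body.call(chunk) }
received  # → ["foo", "bar", "baz"]
```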
  • COMMENCE TWITTERSTORM
    http://seancribbs.com/tech/2011/03/08/how-ripple-uses-fibers
    Photos by James Duncan Davidson, O’Reilly RailsConf 2010
    http://www.flickr.com/photos/oreillyconf/4686730445/
    http://www.flickr.com/photos/oreillyconf/4683464653/
  • “There are three hard things in Computer Science: naming, cache invalidation, and connection pooling.” Andy Gross, Basho Architect, Jan 2012
  • Solution #3: Connection Pool
  • @aphyr
  • class Pool::Element
      attr_accessor :object, :owner

      def initialize(object)
        @object = object
        @owner = nil
      end

      def lock
        @owner = Thread.current
      end

      def unlock
        @owner = nil
      end
    end
  • def take(opts = {})                             # alias >> take
      result = nil                                  # result var
      begin
        element = nil                               # pool element
        @lock.synchronize do                        # grab pool lock
          element = pool.find {|e| e.unlocked? }    # find a connection
          unless element                            # all claimed!
            resource = opts[:default] || @open.call
            element = Element.new(resource)
            @elements << element                    # add new conn to pool
          end
          element.lock                              # claim it
        end
        result = yield element.object               # call block
      rescue BadResource
        delete_element(element) and raise           # clean up baddies
      ensure
        element.unlock if element                   # release
      end
      result
    end
  • # If you actually do this request, you'll get a
    # warning. So don't.
    objects = []
    pool.take do |conn|
      conn.list_keys("ruby") do |keys|
        keys.each do |k|
          pool.>> {|c| objects << c.get("ruby", k) }
        end
      end
    end
  • # If you actually do this request, you'll get a
    # warning. So don't.
    objects = []
    client.list_keys("ruby") do |keys|
      keys.each do |k|
        objects << client.get("ruby", k)
      end
    end
  • Becoming Less “Accidental”
  • Fix Leaky Scope
  • Be Explicit
  • Thanks! @seancribbs