I&#x2019;m Sarah Mei, and I&#x2019;m a Ruby developer at Pivotal Labs here in San Francisco. We&#x2019;re an agile consulting company, and we do a lot of Rails. I&#x2019;m here to talk to you today about Ruby APIs for NoSQL. I also like to call this talk....
Polyglot Persistence. We&#x2019;re going to talk about how you store data in Ruby when you&#x2019;re writing a system that uses more than just a relational database. And I&#x2019;d like to start with a little bit of audience participation... so get ready....
Show of hands: who&#x2019;s written an application in Ruby that uses a relational database to store data?
Who&#x2019;s written a Ruby app that uses a relational database AND some kind of alternative datastore?
Who&#x2019;s written a Ruby app that uses ONLY non-relational datastores?
So I&#x2019;m going to start off by showing you this diagram that&#x2019;s been in hundreds of &#x201C;teach yourself Rails in 25 seconds&#x201D; type blog posts. Here we have a vanilla Rails application - requests come in, they go through routes to a controller, which cedes to the models, which use ActiveRecord to access MySQL. And then we go back the way we came, except on the way out we go through views instead of routes, and return a response.
I was showing this diagram to a friend, and he said, &#x201C;you know, if you have a diagram like this, you are legally required to have a little drawing of a cloud to represent the internet.&#x201D; So...
There you go. Anyway, I would be surprised if anyone in this room has written a real application with real users in which these are the only boxes that appear on the system diagram, and in particular, in which MySQL is the only way data is persisted. Certainly you don&#x2019;t set out to create a poly-persistent system - it just happens. It starts innocently enough ...
You add solr for free-text search. Pretty soon though you have more than one app server, so you start...
...uploading your assets to s3. Pretty soon you realize you need to do some...
...background jobs, so you install a little resque in there. Then since you&#x2019;re a social network you realize you need to cache your friend lists, so you toss those in redis too...
And pretty soon you decide you need an object cache, so you stick in cache-money...and let me just pause to say that I&#x2019;m drawing all these arrows into and out of the models. That&#x2019;s what you should do. I know some of you put this logic in your controllers, but I&#x2019;m ignoring you. Let&#x2019;s pretend we put it all in the right place.
So, at this point, you have five different datastores, and you haven&#x2019;t even done anything particularly crazy. In fact I would say that this type of setup is a lot more common in real applications than my original undecorated diagram of what a Rails app looks like. This is really what a basic Rails app looks like these days.
It&#x2019;s the new normal. It&#x2019;s pretty amazing! In the last year, we&#x2019;ve come to recognize that most applications have models with data that doesn&#x2019;t fit relational persistence very well.
These models have relationships, which are hard to model in SQL - those joins get nasty pretty quick. They have blocks of text that need to be searched semantically. They have frequently-read, rarely-written objects that need to be retrieved very very quickly. And they have associated assets that go along with them.
And I would like to point out, before I go on, that until recently, we didn&#x2019;t even think of all of these things as alternative datastores. Cassandra and redis, certainly, but s3? solr? memcached? Are all of these things datastores?
Of course you know what I think. One of the really interesting things about the NoSQL movement is that they&#x2019;ve expanded our definition of a datastore. Does anyone remember those guys in the 90s that were writing big Java apps and persisting stuff in XML text files? Was anyone here one of those guys?
Those guys were doing NoSQL...before it was coool.
But big problem #1 with a system like this is....
How do you encapsulate all of these disparate data stores into a model class that makes sense? What does it look like?
So that&#x2019;s question 1, keep that in mind.
There&#x2019;s another question, too...
How do you go from this...
...to this? How do you replace your primary datastore with something else? All of these data stores we&#x2019;ve been looking at so far have been satellites to the primary which was SQL. But if you&#x2019;re lucky, and very successful, you&#x2019;ll need to replace it with something else.
Digg, for example, just moved over to Cassandra! For everything! Of course this is not always the right answer, because shortly after Digg rolled out their new architecture, they fired their VP of engineering, ... so...
But if you want to do it, and sometimes you do, historically, this has been pretty difficult to do in Rails.
That&#x2019;s question 2 - how do you replace your primary datastore?
In reality, rather than this kind of thing, where all you have is Cassandra and no SQL-based storage at all, ...
Most apps use a hybrid approach. They use standard SQL persistence for some things, because there IS data that&#x2019;s suited for it, such as basic CRUD stuff.
Then they use non-SQL persistence for other things, where it fits. I mean, you can keep cassandra for your voluminous sparse data...if you have that, and a surprising number of apps do!
So here are the two interesting questions we have when we&#x2019;re building a system that uses multiple datastores:
So just to sum up this section, this idea that multiple datastores are a fact of life if you&#x2019;re building even simple applications is called polyglot persistence.
It&#x2019;s not a new idea - Ben Scofield has been talking about it for more than a year. But a lot of Ruby developers think of &#x201C;the data storage layer&#x201D; and they think it&#x2019;s a choice between MySQL and Postgres, when in reality, it&#x2019;s going to end up with an assortment of different technologies. Whether or not they want that, and whether or not they explicitly plan for it.
So let&#x2019;s take a look at what that&#x2019;s going to look like.
Let&#x2019;s start with the base case - I decided for my sample application I would build a cephalopod social network, mostly because I like the word cephalopod. So here we&#x2019;re starting out with a basic Squid class. We&#x2019;re inheriting from ActiveRecord::Base so we get the relational introspection and stuff for free.
Then our product owner comes along and tells us that customers want to free-text search the set of squids. So, we install sunspot, and add solr free-text search by adding a &#x201C;searchable&#x201D; block describing which attributes get indexed in solr.
And then of course since it&#x2019;s a web application, we have to have a friend graph. So we decide to store a denormalized list of each squid&#x2019;s friends in redis. This probably wouldn&#x2019;t be the first solution I settled on, but pretend with me that I&#x2019;ve already tried storing this in MySQL and encountered the pain of horrendous joins. I&#x2019;ve realized that this type of relationship data just doesn&#x2019;t fit into a row-centered relational datastore very well, so I&#x2019;m trying something else.
So I add a &#x201C;follow&#x201D; method that uses the astonishingly-descriptively-named &#x201C;redis&#x201D; gem to insert stuff into a redis keyspace.
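For anyone reading without the slides, a follow method along these lines might look like the sketch below. It is not the code from the talk: the key naming scheme and the `InMemorySetStore` stand-in are assumptions, made so the example runs without a redis server, and this standalone Squid skips ActiveRecord entirely. The stand-in implements only `sadd` and `smembers`, two set commands that the redis gem's client also exposes, so in a real app you would pass in `Redis.new` instead.

```ruby
require "set"

# Hypothetical stand-in for a redis client, so the sketch runs without a server.
# It mimics just two set commands: sadd and smembers.
class InMemorySetStore
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, member)
    @sets[key].add(member.to_s)
  end

  def smembers(key)
    @sets[key].to_a
  end
end

class Squid
  attr_reader :id

  def initialize(id, store)
    @id = id
    @store = store
  end

  # Denormalized: write both directions so either list is a single read.
  def follow(other)
    @store.sadd("squid:#{id}:following", other.id)
    @store.sadd("squid:#{other.id}:followers", id)
  end

  def following_ids
    @store.smembers("squid:#{id}:following")
  end

  def follower_ids
    @store.smembers("squid:#{id}:followers")
  end
end

store = InMemorySetStore.new
inky = Squid.new(1, store)
otto = Squid.new(2, store)
inky.follow(otto)
```

Writing both the "following" and "followers" sets is the denormalization: each friend list becomes a single key lookup, with no join.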
And then, of course, since this is a social network centered around cephalopods and the novels they write, I want to allow them to upload a finished manuscript to s3. So I install the, again, very usefully-named s3 gem, and add an upload_novel method that puts the tex file into s3.
Current datastore count: 4
Now at this point, the Squid class is starting to look a little...
You&#x2019;re descending from ActiveRecord::Base, but you&#x2019;ve got this searchable method from solr, plus you&#x2019;ve got this follow method from redis, plus an upload method that uses s3, plus the magic hidden ActiveRecord methods. Each of these things persists a little piece of the whole squid.
You know what I would like? Some kind of consistent interface.
Now you may laugh, but I originally thought it would be awesome to be able to do everything through ActiveRecord. It&#x2019;s very tempting! Most of us who use Rails think of ActiveRecord as synonymous with &#x201C;model.&#x201D;
It&#x2019;d be awesome to have everything go through one interface. Active Record - it&#x2019;s my model of all my data stores! My code lives happily on top!
But, sadly, it just doesn&#x2019;t work that way. And it&#x2019;s not going to. This is not because Rails core is trying to make our lives difficult. It&#x2019;s because ActiveRecord is specifically built to model a relational database. It&#x2019;s an ORM, which means Object-Relational Mapper. In fact, in Rails 3, ActiveRecord moves even more in that direction with Arel, which is, conceptually, an implementation of the theory of a relational database. This is called relational algebra. There&#x2019;s a whole talk on it tomorrow! It&#x2019;s awesome! It&#x2019;s cool. It&#x2019;s conceptually very consistent, easy to understand, and very tidy, all of which appeal to me.
Unfortunately, relational algebra is totally useless with a non-relational datastore where you don&#x2019;t structure your data into rows, and columns, and tables. It just doesn&#x2019;t map to something like a key-value store like redis, or a document-oriented database like mongo.
But once I started looking at Rails 3, I got very happy. The folks who rewrote it realized, correctly, that ActiveRecord in Rails 2 was modelling two very different types of behavior.
Here&#x2019;s what we have in Rails 2: a big, blobby ActiveRecord that does both the communication with the database, and lots of other useful stuff like validations and lifecycle callbacks - stuff your calling code uses, but that isn&#x2019;t directly related to how the data is persisted.
So in Rails 3 they&#x2019;ve split this up into two separate libraries: ActiveRecord that handles persistence, and ActiveModel that handles validations and callbacks.
A lot of the blog posts and stuff about ActiveModel have talked about how this makes it easier to take validations and serialization and so on from ActiveModel and use them in plain old Ruby objects outside of Rails.
But as someone who does mostly Rails development, I actually don&#x2019;t really care about that. What I&#x2019;m excited about is that it&#x2019;s now much easier to extract ActiveRecord from your models in Rails. It makes it much easier to write an adapter for a non-relational datastore and be able to use it in your models in a Rails app. All you have to do is present the same API to ActiveModel that ActiveRecord does, and that&#x2019;s actually pretty easy. And in fact,
There are already several ActiveModel-compliant persistence libraries out there for accessing these things, like mongoid by Durran Jordan. So, you actually can, sort of, have a similar interface for different persistence models in Rails 3. Now, this doesn&#x2019;t really solve the problem of multiple stores in one model, but it does make it possible to keep a lot of the niceties of what was previously &#x201C;ActiveRecord&#x201D; without actually persisting to a relational database. And that&#x2019;s pretty hot!
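To make the split concrete, here is a toy illustration in plain Ruby. It deliberately does not use ActiveModel or mongoid: the `Validations` module, the two store classes, and their `insert` method are all hypothetical names invented for this sketch. The only point is that validation lives in one place, and the model talks to whichever store it is handed through the same small interface.

```ruby
# Hypothetical validation layer: knows nothing about how data is persisted.
module Validations
  def valid?
    errors.empty?
  end

  def errors
    problems = []
    problems << "name can't be blank" if name.to_s.strip.empty?
    problems
  end
end

# Two hypothetical stores presenting the same one-method interface.
class RowStore
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Stores attribute values positionally, like a row in a table.
  def insert(attributes)
    @rows << attributes.values
  end
end

class DocumentStore
  attr_reader :documents

  def initialize
    @documents = []
  end

  # Stores the whole attribute hash as one document.
  def insert(attributes)
    @documents << attributes.dup
  end
end

class Squid
  include Validations

  attr_reader :name, :species

  def initialize(store, name:, species:)
    @store = store
    @name = name
    @species = species
  end

  # Persist only if the store-agnostic validations pass.
  def save
    return false unless valid?
    @store.insert({ name: name, species: species })
    true
  end
end

row_store = RowStore.new
doc_store = DocumentStore.new
Squid.new(row_store, name: "Inky", species: "Humboldt").save
Squid.new(doc_store, name: "Inky", species: "Humboldt").save
```

Swapping the primary store here means changing the object passed to the constructor; the validation code is untouched.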
Now that it&#x2019;s getting easier to use a nonrelational store, I think we&#x2019;re going to start seeing a lot more applications take advantage of them.
Here are a few of the extant ActiveModel-compliant libraries. Most of them are quite alpha and experimental. But expect to see more, and see them mature.
Here&#x2019;s the basic setup. With an ActiveModel-compliant library in Rails 3, our models no longer have to inherit from ActiveRecord::Base, so they&#x2019;re nicely decoupled from any default datastore. Here we&#x2019;re defining Mongo as our primary store.
Sadly, ActiveModel doesn&#x2019;t solve all of our problems. We still have to work at it to use more than one datastore within the same model.
To add solr search we still need a searchable block. However, UNLIKE in Rails 2, this will still work no matter what primary persistence we use. It relies on the lifecycle methods that are part of ActiveModel. In Rails 2 they were so tightly coupled with ActiveRecord that you pretty much couldn&#x2019;t use them separately.
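The reason a search hook can survive a change of primary store is easier to see in a small sketch. What follows is plain Ruby, not ActiveModel's callback API and not sunspot: the `Lifecycle` module, the `after_save` registration method, and the `INDEX` array standing in for a search index are all hypothetical, invented for illustration.

```ruby
# Hypothetical lifecycle layer, independent of any datastore.
module Lifecycle
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Register a block to run after every save.
    def after_save(&block)
      after_save_hooks << block
    end

    def after_save_hooks
      @after_save_hooks ||= []
    end
  end

  # Calls the store-specific persist method, then fires the hooks.
  def save
    persist
    self.class.after_save_hooks.each { |hook| instance_exec(&hook) }
    true
  end
end

class Squid
  include Lifecycle

  INDEX = [] # stand-in for a search index

  attr_reader :name

  def initialize(name, primary_store)
    @name = name
    @primary_store = primary_store
  end

  # The only method that knows about the primary store.
  def persist
    @primary_store << name
  end

  # The indexing hook never mentions the primary store.
  after_save { INDEX << name.downcase }
end

primary = [] # any object responding to << would do here
Squid.new("Inky", primary).save
```

Because the hook is attached to the lifecycle rather than to the persistence code, replacing `persist` with something that writes to a different store leaves the indexing untouched.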
So coming back to our 2 questions from the beginning of the talk, which were: how do you encapsulate a model that has data scattered across multiple stores, and how do you replace the primary store...
Sadly, the first one - not really solved yet. The best you can do is break stuff into modules where possible, and try to extract the common bits. No silver bullet here, just general good software engineering.
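One way to read &#x201C;break stuff into modules&#x201D; is one module per datastore, each owning only its slice of the squid. The sketch below is an assumption about what that could look like, not code from the talk: `Searchable`, `Followable`, and the in-memory hashes standing in for solr and redis are all hypothetical.

```ruby
require "set"

# Hypothetical stand-ins for external stores, so the sketch runs on its own.
SEARCH_INDEX = {}                                              # plays the role of solr
FRIEND_LISTS = Hash.new { |hash, key| hash[key] = Set.new }    # plays the role of redis

# Everything that touches the search index lives here.
module Searchable
  def index!
    SEARCH_INDEX[id] = bio.downcase.split
  end

  # Returns the ids of squids whose bio contains the given word.
  def self.search(word)
    SEARCH_INDEX.select { |_id, words| words.include?(word.downcase) }.keys
  end
end

# Everything that touches the friend graph lives here.
module Followable
  def follow(other)
    FRIEND_LISTS[id] << other.id
  end

  def following_ids
    FRIEND_LISTS[id].to_a
  end
end

class Squid
  include Searchable
  include Followable

  attr_reader :id, :bio

  def initialize(id, bio)
    @id = id
    @bio = bio
  end
end

inky = Squid.new(1, "Writes novels about the deep sea")
otto = Squid.new(2, "Prefers poetry")
inky.index!
otto.index!
inky.follow(otto)
```

This keeps each store's code in one place, but the Squid class still has to know that several stores exist, which is why it counts as tidying rather than a real solution.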
But the second one
...is fully solved in Rails 3. I&#x2019;m actually really excited by that and it&#x2019;ll be interesting to see which ActiveModel-compliant libraries gain traction.
And I&#x2019;d like to close with this thought: if all you do is read the blogs and twitter, you might think SQL vs. NoSQL is some kind of standoff, and that you have to be on one side or the other. Either you&#x2019;re the guys in yellow, or you&#x2019;re the guys in blue.
But that&#x2019;s a false dichotomy. Any application with more than a hundred users will use both. So please: quit it with the FUD, and look for ways they can play nice together.
AND, if you&#x2019;re going to do this, upgrade to Rails 3!