Modeling data from the real world in software is nothing new. With Couchbase, a document-oriented system, data modeling is fairly easy. We aren’t constrained by schemas or by the need to fit things into relational algebra. Instead, we only need to think about
Tribal Crossing set out to design a new game. They planned for a large audience, running in a cloud environment. For a previous experiment they’d deployed as a Facebook app, My Polls (based on an RDBMS), they had planned on the traditional sharding approach. The problem was that, at the time, Tribal was only a few engineers with no real operations staff. If the app took off, they would have to reshard the database to keep up with the load. As it turned out, My Polls became popular very quickly over a weekend. They spent the entire weekend adding nodes and resharding, over and over, just to keep up with the new users. Once the data was sharded out across those systems, shrinking the deployment would also be an issue.
No easy way to query: How often do you need to run a complex join query? When data is denormalized for speed, how many complex queries are you really running? “To stop thinking in terms of joins and queries is the ticket to speed.” Not handling bank transactions: We can live with a small percentage of concurrency issues. Err on the side of making the player happy.
To represent game data in our system, we simply represent objects as JSON. We then determine the key for an object using its class name or type and a unique ID. Couchbase Server can serve up sequence numbers for those IDs easily with its built-in increment function. To represent a one-to-many relationship, we can keep a small list of the related keys. This lets us stay close to normalized while being slightly denormalized. The code for building out our graph of related items is quite simple, and because the data is distributed and Couchbase caches hot items, it should be very fast.
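As a rough sketch of the key scheme and the small relationship list described here, the following uses an in-memory stand-in for the bucket rather than the Couchbase SDK. The `KeyValueStore` class, the `new_key` helper, the `::` separator, the `Counter::` prefix, and the document fields are all assumptions made for illustration.

```python
import json


class KeyValueStore:
    """In-memory stand-in for a bucket (hypothetical; not the Couchbase SDK)."""

    def __init__(self):
        self._data = {}

    def incr(self, key, initial=1):
        # Start the counter at `initial` if it is missing, else add one.
        self._data[key] = self._data[key] + 1 if key in self._data else initial
        return self._data[key]

    def set(self, key, doc):
        self._data[key] = json.dumps(doc)  # documents are stored as JSON text

    def get(self, key):
        return json.loads(self._data[key])


def new_key(store, type_name):
    # Key = type name + serial number drawn from a per-type counter.
    serial = store.incr("Counter::" + type_name)
    return f"{type_name}::{serial}"


store = KeyValueStore()

player_key = new_key(store, "Player")
plant_keys = [new_key(store, "Plant") for _ in range(2)]

for key in plant_keys:
    store.set(key, {"type": "Plant", "owner": player_key})

# One-to-many: the player document holds a small list of related keys.
store.set(player_key, {"type": "Player", "plants": plant_keys})

# Build the graph of related items by following the list.
player = store.get(player_key)
plants = [store.get(k) for k in player["plants"]]
```

Against a real cluster, the counter would come from the server's increment operation, and the final list comprehension would typically become a single multi-get.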
TODO: add artwork from screenshots.
Here we see the three different objects in their JSON document form. These are very simple documents, but they show the concept. Each document’s key (also known as the _id) is the object’s class followed by a serial number. Since each player has a plant list (and we can simply create one if the player does not yet have plants), we store the plant list as an array.
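The get-or-create step for the plant list might look like the sketch below. It assumes the three objects are a player, a plant, and a plant list, and it uses a plain dict in place of the bucket. The `add_plant` helper, the `PlantList::` key prefix, and every field name are hypothetical.

```python
import json

bucket = {}  # plain dict standing in for the bucket; values are JSON text


def add_plant(bucket, player_serial, plant_key):
    # The list key reuses the player's serial number (an assumption).
    list_key = f"PlantList::{player_serial}"
    # Start from an empty array if the player has no plants yet.
    plant_list = json.loads(bucket.get(list_key, "[]"))
    plant_list.append(plant_key)
    bucket[list_key] = json.dumps(plant_list)
    return plant_list


bucket["Player::1"] = json.dumps({"name": "player one", "level": 1})
bucket["Plant::1"] = json.dumps({"species": "tomato", "owner": "Player::1"})
bucket["Plant::2"] = json.dumps({"species": "carrot", "owner": "Player::1"})

add_plant(bucket, 1, "Plant::1")
add_plant(bucket, 1, "Plant::2")
```

The read-modify-write in `add_plant` is not atomic. On a real cluster, a check-and-set (CAS) value would guard the update if lost writes matter.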
On this slide, we see a sample blog post in JSON. It has most of the fields you’d expect in a blog entry. The one field that is a little different is the comments field. One approach would be to store all comments on this blog in the blog document itself. This is simple, denormalized, and lets us get the data in one shot. There are a couple of downsides, though. One is that we may not want to display the comments at all: if I’m showing multiple blogs, maybe blog summaries, on a given page, I don’t need them. The other is that some popular blogs, from popular bloggers, may have hundreds or thousands of comments. We don’t want to display them all at once, and we may not want to grab such a large amount of data. We can reapply the same list technique we encountered earlier.
As you see here, rather than storing comments inline, we can separate them into a comment list, and from there into individual comments. Comments in this case can be threaded. You may wonder about the performance of such an arrangement, given all of the traffic across the wire. First off, in a distributed system the data may not be local anyway, so we simply have the client fetch the data from the server.
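To make the blog, comment list, and comment split concrete, here is a minimal sketch with a dict standing in for the bucket. The key names, the `comment_list` field, and the `fetch_comments` helper are assumptions, not the layout from the slide.

```python
# Dict standing in for the bucket; documents shown as Python objects.
bucket = {
    "Blog::1": {"title": "Hello", "body": "...", "comment_list": "CommentList::1"},
    "CommentList::1": ["Comment::1", "Comment::2", "Comment::3"],
    "Comment::1": {"author": "reader one", "text": "first"},
    "Comment::2": {"author": "reader two", "text": "second"},
    "Comment::3": {"author": "reader one", "text": "third"},
}


def fetch_comments(bucket, blog_key, offset=0, limit=2):
    # Hop 1: the blog document names its comment list.
    blog = bucket[blog_key]
    # Hop 2: slice the list so only one page of keys is fetched.
    keys = bucket[blog["comment_list"]][offset:offset + limit]
    # Hop 3: fetch just those comments (a multi-get on a real cluster).
    return [bucket[k] for k in keys]


first_page = fetch_comments(bucket, "Blog::1")
second_page = fetch_comments(bucket, "Blog::1", offset=2)
```

A summary page reads only `Blog::1` and never touches the comments, which is the point of keeping them out of the blog document.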
If you’re expecting a very large number of comments, or want to display them threaded, you can easily imagine doing so by extending the list technique discussed earlier. This lets us build very complex arrangements of data across various keys. Since the keys distribute throughout the cluster, the load spreads out among the cluster nodes. In contrast, with a typical relational model, you may have to colocate the comments and blogs on a single shard so you can use join queries. This creates hotspots in the system, and resharding to redistribute the data becomes a manual process. Because of the active cache management in Couchbase Server, the hottest data will be in memory, so popular items are served very quickly.
First we’ll do a demonstration of finding all of the items owned by a particular player through a view. Then we will do a demonstration of showing a leaderboard from the gamesim data previously shown.
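The two views can be approximated in plain Python to show the idea. Real Couchbase views are written as JavaScript map functions, so this is only a simulation, and the field names (`jsonType`, `ownerId`, `experience`) and sample documents are assumptions about the gamesim data.

```python
docs = {
    "Player::1": {"jsonType": "player", "name": "alice", "experience": 120},
    "Player::2": {"jsonType": "player", "name": "bob", "experience": 300},
    "Player::3": {"jsonType": "player", "name": "carol", "experience": 210},
    "Item::1": {"jsonType": "item", "name": "sword", "ownerId": "alice"},
    "Item::2": {"jsonType": "item", "name": "shield", "ownerId": "alice"},
    "Item::3": {"jsonType": "item", "name": "bow", "ownerId": "bob"},
}


def items_by_owner(doc):
    # Map step: emit (owner, item name) for item documents only.
    if doc.get("jsonType") == "item":
        yield doc["ownerId"], doc["name"]


def leaderboard(doc):
    # Map step: emit (experience, player name) for player documents only.
    if doc.get("jsonType") == "player":
        yield doc["experience"], doc["name"]


def run_view(docs, map_fn):
    # A view index keeps its rows sorted by the emitted key.
    return sorted(row for doc in docs.values() for row in map_fn(doc))


# Querying by key: all items owned by one player.
alice_items = [v for k, v in run_view(docs, items_by_owner) if k == "alice"]

# Leaderboard: read the index in descending key order and keep the top two.
top_two = sorted(run_view(docs, leaderboard), reverse=True)[:2]
```

On a real cluster, the last two steps would correspond to querying the view with a `key` parameter, and with `descending=true` plus a `limit`.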
Using a view over all transactions (say they’re in a separate bucket, or carry type information), we can easily query for individual balances.
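A balance view boils down to a map step that emits an amount per account and a reduce step that sums per key. The sketch below simulates that in Python. The `type`, `account`, and `amount` fields and the sample transactions are assumptions for illustration.

```python
from collections import defaultdict

docs = [
    {"type": "transaction", "account": "alice", "amount": 50},
    {"type": "transaction", "account": "alice", "amount": -20},
    {"type": "transaction", "account": "bob", "amount": 100},
    {"type": "player", "name": "alice"},  # ignored: not a transaction
]


def map_transaction(doc):
    # Map step: the type field filters out non-transaction documents.
    if doc.get("type") == "transaction":
        yield doc["account"], doc["amount"]


def balances(docs):
    # Reduce step: sum the emitted amounts, grouped by account key.
    totals = defaultdict(int)
    for doc in docs:
        for account, amount in map_transaction(doc):
            totals[account] += amount
    return dict(totals)


result = balances(docs)
```

In a Couchbase view, the reduce step would likely be the built-in `_sum` function queried with grouping enabled, so a single account's balance is one keyed lookup.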
Advanced Document Design J Chris Anderson Mobile