This was a three-hour workshop given at the 2011 Web 2.0 Expo in San Francisco. Due to the length of the presentation and the number of presenters, portions of the slide deck may appear disjointed without the accompanying narrative.
Abstract: "The hype cycle is at a high for cloud computing, distributed “NoSQL” data storage, and high availability map-reducing eventually consistent distributed data processing frameworks everywhere. Back in the real world we know that these technologies aren’t a cure-all. But they’re not worthless, either. We’ll take a look behind the curtains and share some of our experiences working with these systems in production at SimpleGeo.
Our stack consists of Cassandra, HBase, Hadoop, Flume, Node.js, RabbitMQ, and Puppet, all running on Amazon EC2. Tying these technologies together has been a challenge, but the result really is worth the work. The rotten truth is that our ops guys still wake up in the middle of the night sometimes, and our engineers face new and novel challenges. Let us, the folks working in the wee hours of the morning, share what’s keeping us busy, in the hopes that you won’t have to do so yourself."