Cassandra & Puppet: Scaling data at $15/month
Constant Contact
March 2011
Dave Connors – VP Operations
Jim Ancona – Systems Architect
Mark Schena – Manager Systems Automation

Constant Contact
2000 – 2010
Market leader for Small Businesses
Email, Event & Survey
Over 400k paying customers
No. 134 on the Deloitte Technology Fast 500 listing

Business model
Many customers pay as little as $15 a month
~2 million database transactions per minute

Constant Contact: The business problem
Constant Contact Small Businesses are looking to us for help with Social Media marketing

Social Media
10-100 times more data
Challenge with our business model

The Key Challenge
Integrate social media data
Solution = NoSQL
Cost = Low
Time to market = ?

Implementation

Implementing NoSQL
Ops and Dev both face issues:
Data model
Monitoring
Authentication
Logging
Risk profile
Roles & Responsibilities
[Diagram: Ops and Dev]

Apache Cassandra
Developed at Facebook
Open sourced in 2008
Incubated at Apache
Became an Apache top-level project in 2010
http://cassandra.apache.org
In use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, …
Largest production cluster has over 100 TB of data in over 150 machines

What is Cassandra?
Implemented in Java
Fault Tolerant
Elastic
Durable
Rich data model
Replicated data
Consistency options

Replication
How many copies of each piece of data do we want?
N = 3
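N = 3 means every row is stored on three nodes. A minimal sketch of how those three nodes could be chosen, assuming a token ring in the style of Cassandra's RandomPartitioner with SimpleStrategy-style placement: one token per node, the first replica on the node whose token follows the key's token, the remaining copies on the next nodes clockwise. The token function is a simplified MD5 hash rather than Cassandra's exact formula, and the node names and evenly spaced tokens are hypothetical.

```python
import bisect
import hashlib

# Simplified token space, modeled on RandomPartitioner's 0..2**127 range.
RING_SIZE = 2 ** 127


def token_for(key: str) -> int:
    """Hash a row key onto the ring (simplified MD5 token, not Cassandra's exact formula)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(key: str, ring: list[tuple[int, str]], n: int = 3) -> list[str]:
    """Return the nodes holding the n copies of `key`.

    `ring` holds one (token, node) pair per node. The first replica is the
    first node whose token is >= the key's token, wrapping past the end of
    the ring; the remaining copies go to the next nodes clockwise.
    """
    ordered = sorted(ring)
    tokens = [token for token, _ in ordered]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ordered)
    copies = min(n, len(ordered))  # cannot store more copies than there are nodes
    return [ordered[(start + i) % len(ordered)][1] for i in range(copies)]


# Hypothetical six-node cluster with evenly spaced tokens.
nodes = [f"node{i}" for i in range(6)]
ring = [(i * RING_SIZE // 6, name) for i, name in enumerate(nodes)]
print(replicas_for("customer:42", ring))
```

With six nodes and N = 3, each key lands on three consecutive nodes of the ring, so losing any single node still leaves two copies of every row.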

Consistency Level ONE

Consistency Level QUORUM
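The consistency-level slides come down to overlap arithmetic. QUORUM is a majority of the N replicas, floor(N/2) + 1, so with N = 3 it is two; a read is guaranteed to reach at least one replica holding the latest write whenever R + W > N, where R and W are the replica counts used by reads and writes. The toy model below is a hypothetical sketch, not Cassandra code: it tracks a single column on three replicas and reconciles read responses by keeping the newest timestamp, the last-write-wins rule Cassandra applies to conflicting column versions.

```python
from dataclasses import dataclass
from typing import Optional


def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a majority, floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def always_overlaps(n: int, w: int, r: int) -> bool:
    """True when every set of r replicas read must share a replica with every set of w written."""
    return r + w > n


@dataclass
class Version:
    value: Optional[str]
    timestamp: int


class ToyReplicas:
    """Hypothetical in-memory model of one column on n replicas (not Cassandra code)."""

    def __init__(self, n: int) -> None:
        self.versions = [Version(None, 0) for _ in range(n)]

    def write(self, value: str, timestamp: int, acks: int) -> None:
        # Model the moment the coordinator reports success: only `acks`
        # replicas (here, the first ones) hold the new version so far.
        for i in range(acks):
            self.versions[i] = Version(value, timestamp)

    def read(self, replica_ids: list[int]) -> Optional[str]:
        # Reconcile the responses: the version with the newest timestamp wins.
        newest = max((self.versions[i] for i in replica_ids), key=lambda v: v.timestamp)
        return newest.value


N = 3
Q = quorum(N)
print(Q)                          # 2
print(always_overlaps(N, 1, 1))   # False: ONE + ONE can miss the latest write
print(always_overlaps(N, Q, Q))   # True: QUORUM + QUORUM always overlaps

replicas = ToyReplicas(N)
replicas.write("v1", timestamp=1, acks=1)  # write at ONE: only replica 0 has v1
print(replicas.read([2]))                  # read at ONE from replica 2: None (stale)
replicas.write("v2", timestamp=2, acks=Q)  # write at QUORUM: replicas 0 and 1
print(replicas.read([1, 2]))               # read at QUORUM: v2
```

Any two of the three replicas include replica 0 or replica 1, which is why every QUORUM read in this model returns v2 while a ONE read can still return stale data.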

Risks and Mitigation
Moving target
Developer unfamiliarity


Editor's Notes

  • #9 “… a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.”
  • #10 Operational attributes. Fault tolerant: data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime. Decentralized: every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure. Elastic: read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications. Durable: Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down. Not just key-value. Replication and consistency are configurable and datacenter aware.
  • #11 Replication can also be configured across data centers.
  • #12 Consistency level is tunable: ONE, QUORUM, ALL. At level ONE, one copy makes it to disk synchronously before the call returns success. Same with reads: only one node is read, so the caller can get old data.
  • #13 At QUORUM, two of the three replicas (a quorum) are written before the call returns. QUORUM read: the quorum must agree. If the first two replicas don't, wait for the third node to resolve the tie.
  • #14 RISKS: 0.7.x was in beta when we started, with multiple betas and RCs. RDBMS best practices are understood; if they exist for Cassandra, how do we discover them? Complicated system with lots of knobs: how do we tune them? MITIGATIONS: We deployed 0.7.2 across 72 servers in 90 minutes, two days before we went live; 0.7.3 any day now. Mailing lists, reading code, filing bugs. A little about DataStax. Started with a small app; higher-risk ones come later. Mark will talk about monitoring: we have hundreds of graphs.
  • #15 Data model: no joins, referential integrity or fixed schema. If you want those things, you have to code them. Rows with millions of columns. Thrift: a driver-level interface that doesn't do things an application-level client should do, e.g. failover and retry.
  • #16 Contributed bug reports and patches to Cassandra and Hector. Incorporated in follow-on releases. No need to maintain our own fork.
  • #17 Mirror mode. Short timeouts. Log errors. DB2 is still the database of record.
  • #18 Necessary for success
  • #26 Authentication and authorization; test basic functionality; rapidly deploy new changes
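The notes for slides #15 and #17 describe the integration pattern: Thrift leaves failover and retry to the client, and Cassandra writes ran in mirror mode with short timeouts and error logging while DB2 stayed the database of record. A minimal sketch of such a write path, assuming the application wraps both stores in plain functions. Every name here (save_record, write_with_failover, MirrorWriteError, the fake writers, the 0.2 s timeout) is hypothetical; this is neither Hector's API nor Constant Contact's code.

```python
import logging
from typing import Callable, Sequence

log = logging.getLogger("mirror")


class MirrorWriteError(Exception):
    """Hypothetical error a Cassandra writer raises when a host rejects a write."""


def write_with_failover(hosts: Sequence[str],
                        write_once: Callable[[str, float], None],
                        timeout_s: float = 0.2,
                        attempts_per_host: int = 1) -> bool:
    """Try each host in turn with a short timeout; return True on the first success."""
    for host in hosts:
        for attempt in range(1, attempts_per_host + 1):
            try:
                write_once(host, timeout_s)  # the writer is expected to enforce timeout_s
                return True
            except (MirrorWriteError, TimeoutError) as exc:
                log.warning("mirror write to %s failed (attempt %d): %s", host, attempt, exc)
    return False


def save_record(record: dict,
                write_to_db2: Callable[[dict], None],
                hosts: Sequence[str],
                write_to_cassandra: Callable[[str, dict, float], None]) -> None:
    """DB2 stays the database of record; the Cassandra mirror write is best effort.

    MirrorWriteError and TimeoutError from the mirror are logged, not raised.
    """
    write_to_db2(record)  # a DB2 failure still propagates to the caller
    mirrored = write_with_failover(
        hosts, lambda host, timeout_s: write_to_cassandra(host, record, timeout_s))
    if not mirrored:
        log.error("mirror write skipped for record %s", record.get("id"))


# Usage with hypothetical in-memory stand-ins for the two stores.
logging.basicConfig(level=logging.INFO)
db2_rows: list[dict] = []
cassandra_rows: list[tuple[str, int]] = []


def fake_db2(record: dict) -> None:
    db2_rows.append(record)


def fake_cassandra(host: str, record: dict, timeout_s: float) -> None:
    if host == "cass1":  # pretend this host is down
        raise TimeoutError(f"no reply within {timeout_s}s")
    cassandra_rows.append((host, record["id"]))


save_record({"id": 42}, fake_db2, ["cass1", "cass2"], fake_cassandra)
print(db2_rows, cassandra_rows)
```

In this sketch the asymmetry is the point: a DB2 failure still fails the request, while a Cassandra failure only produces a log line, so problems in the new store cannot take down the existing application.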