Cassandra & Puppet, scaling data at $15 per month

Cassandra & Puppet: Scaling data at $15/month
Constant Contact
March 2011
Dave Connors – VP Operations
Jim Ancona – Systems Architect
Mark Schena – Manager Systems Automation
Constant Contact
2000 – 2010
Market leader for Small Businesses
Over 400k paying customers
No. 134 on the Deloitte Technology Fast 500 listing
Business model
~2 million database transactions per minute
Constant Contact
Small Businesses are looking to us for help with Social Media marketing
Challenge with our business model
Cost = Low
Time to market = ?
Monitoring
Authentication
Logging
Risk profile
Roles & Responsibilities
Apache Cassandra
Open sourced in 2008
Incubated at Apache
Became an Apache top-level project in 2010
http://cassandra.apache.org
In use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, …
Largest production cluster has over 100 TB of data in over 150 machines
Fault Tolerant
Elastic
Durable
Rich data model
Replicated data
Consistency options
Consistency Level ONE
Consistency Level QUORUM
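
The consistency levels on these slides come down to counting replicas. The sketch below is illustrative and not taken from the deck: it assumes a replication factor of 3 (the "two of three" case in the notes) and uses hypothetical helper names. It computes how many replicas must acknowledge an operation at each level, and checks whether a read level and a write level overlap enough that a read always touches at least one replica holding the latest write.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond at the given consistency level."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_sees_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when the read set and the write set must share a replica (R + W > RF)."""
    written = replicas_required(write_level, replication_factor)
    read = replicas_required(read_level, replication_factor)
    return written + read > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, replicas_required(level, rf))
    print("QUORUM/QUORUM overlap:", read_sees_latest_write("QUORUM", "QUORUM", rf))
    print("ONE/ONE overlap:", read_sees_latest_write("ONE", "ONE", rf))
```

With three replicas, QUORUM writes and QUORUM reads overlap (2 + 2 > 3), while ONE and ONE do not (1 + 1 is not greater than 3), which is why a read at ONE can return old data.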
Risks and Mitigation
Developer unfamiliarity

Editor's Notes

  1. “… a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.”
  2. Operational attributes
     Fault Tolerant: Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
     Decentralized: Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.
     Elastic: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
     Durable: Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down.
     Not just key-value. Replication and consistency are configurable and datacenter aware.
  3. Can also be configured cross-datacenter.
  4. Consistency Level is tunable: ONE, QUORUM, ALL. At level ONE, one copy makes it to disk synchronously before the call returns success. Same with reads: only one node is read, so the caller can get old data.
  5. At QUORUM, two of three replicas (a quorum) are written before returning. QUORUM read: a quorum must agree. The first two don't, so wait for the third node to resolve the tie.
  6. RISKS: 0.7.x was in beta when we started, with multiple betas and RCs. RDBMS best practices are understood; if they exist for Cassandra, how do we discover them? Complicated system, lots of knobs, how to tune them?
     MITIGATIONS: We deployed 0.7.2 across 72 servers in 90 minutes, two days before we went live. 0.7.3 any day now. Mailing lists, read code, file bugs. A little about DataStax. Started with a small app; higher-risk ones come later. Mark will talk about monitoring; we have hundreds of graphs.
  7. Data model: no joins, referential integrity or fixed schema. If you want those things, you have to code them. Rows with millions of columns. Thrift: driver-level interface, doesn't do things an application-level client should do, e.g. failover, retry.
  8. Contributed bug reports and patches to Cassandra and Hector. Incorporated in follow-on releases. No need to maintain our own fork.
  9. Mirror mode. Short timeouts. Log errors. DB2 is still database of record.
  10. Necessary for success
  11. Authentication and Authorization; test basic functionality; rapidly deploy new changes
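
The mirror mode in note 9 is only outlined in the deck, so the following is one possible reading rather than the presenters' implementation. It assumes every write goes first to the database of record, then is copied to the new store on a best-effort basis with a short timeout, and that mirror failures are logged and never surfaced to the caller. All names here (`mirrored_write`, `primary_write`, `mirror_write`, `mirror_timeout`) are hypothetical, and the two stores are passed in as plain callables rather than real DB2 or Cassandra clients.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

log = logging.getLogger("mirror")

# Shared worker pool for mirror writes (size is an arbitrary assumption).
_pool = ThreadPoolExecutor(max_workers=4)


def mirrored_write(key, value, primary_write, mirror_write, mirror_timeout=0.05):
    """Write to the database of record, then mirror the write best-effort.

    Returns True if the mirror write finished within the timeout, False otherwise.
    Errors from primary_write propagate; errors from mirror_write are only logged.
    """
    # The primary store stays authoritative, so its failures are not swallowed.
    primary_write(key, value)

    # Run the mirror write in the pool and wait only briefly for it.
    future = _pool.submit(mirror_write, key, value)
    try:
        future.result(timeout=mirror_timeout)
        return True
    except FutureTimeout:
        log.warning("mirror write timed out for key %r", key)
    except Exception:
        log.exception("mirror write failed for key %r", key)
    return False


if __name__ == "__main__":
    record_store, mirror_store = {}, {}
    done = mirrored_write("user:1", "alice",
                          record_store.__setitem__, mirror_store.__setitem__)
    print(done, record_store, mirror_store)
```

One caveat of this sketch: a timed-out mirror write is not cancelled and keeps running in its worker thread, so the timeout bounds only how long the caller waits.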