Cassandra & Puppet:<br />Scaling data at $15/month<br />Constant Contact<br />March  2011<br />Dave Connors – VP Operation...
Constant Contact<br />Constant Contact<br />2000 – 2010  <br />Market leader for Small Businesses<br /><ul><li>Email, Even...
Over 400k paying customers
No. 134 on the Deloitte Technology Fast 500 listing</li></ul>Business model<br /><ul><li>Many customers pay as little as $...
~2 million database transactions per minute</li></li></ul><li>Constant Contact<br />The business problem<br />
Constant Contact <br />Small Businesses are looking to us for help with Social Media marketing<br /><ul><li>Social Media  ...
Challenge with our business model</li></li></ul><li>The Key Challenge<br />The Key Challenge<br />Integrate social media d...
Cost = Low
Time to market = ?</li></li></ul><li>Implementation<br />Implementing NoSQL<br />Ops and Dev both face issues<br /><ul><li...
Monitoring
Authentication
Logging
Risk profile
Roles & Responsibilities</li></li></ul><li>Ops<br />Dev<br />
Apache Cassandra<br />Apache Cassandra<br /><ul><li>Developed at Facebook
Open sourced in 2008
Incubated at Apache
Became an Apache top-level project in 2010
http://cassandra.apache.org
In use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, …
Largest production cluster has over 100 TB of data in over 150 machines</li></li></ul><li>What is Cassandra?<br />What is ...
Fault Tolerant
Elastic
Durable
Rich data model
Replicated data
Consistency options</li></li></ul><li>Replication<br />Replication<br />How many copies of each piece of data <br />do we ...
Consistency LevelONE<br />Consistency Level One<br />Y<br />
Consistency Level Quorum<br />X<br />
Risks and Mitigation<br />Risks and Mitigation<br /><ul><li>Moving target
Developer unfamiliarity
Upcoming SlideShare
Loading in...5
×

Cassandra & puppet, scaling data at $15 per month

23,285

Published on

Constant Contact shares lessons learned from DevOps approach to implementing Cassandra to manage social media data for over 400k small business customers. Puppet is the critical in our tool chain. Single most important factor was the willingness of Development and Operations to stretch beyond traditional roles and responsibilities.

Published in: Technology
0 Comments
7 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
23,285
On Slideshare
0
From Embeds
0
Number of Embeds
7
Actions
Shares
0
Downloads
67
Comments
0
Likes
7
Embeds 0
No embeds

No notes for slide
  • “… a highly scalable second-generation distributed database, bringing together Dynamo&apos;s fully distributed design and Bigtable&apos;sColumnFamily-based data model.”
  • Operational attributesFault TolerantData is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.DecentralizedEvery node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.ElasticRead and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.DurableCassandra is suitable for applications that can&apos;t afford to lose data, even when an entire data center goes down.Not just key-value. Replication and Consistency are configurable and datacenter aware
  • Can also be configured cross-datacenter.
  • Consistency Level is tunableONEQUORUMALLAt level ONE, one copy makes it to disk synchronously, before caller returns success.Same with reads. One node is read, so can get old data
  • At QUORUM, two of three (QUORUM) written before returningQUORUM Read: Quorum must AGREEFirst two don’t, so wait for the third node to resolve the tie
  • RISKS0.7.x in beta when we started, multiple betas and RCsRDBMS best practices are understood, if they exist for Cassandra how to discover them?Complicated system, lots of knobs, how to tune themMITIGATIONSWe deployed 0.7.2 across 72 servers in 90 minutes two days before we went live. 0.7.3 any day nowMailing lists, read code, file bugsA little about DatastaxStarted with a small app, higher risk ones come later, Mark will talk about monitoring—we have hundreds of graphs
  • Data model: no joins, referential integrity or fixed schema. If you want those things, you have to code them.Rows with millions of columnsThrift: driver-level interface, doesn’t do things an application-level client should do, e.g. failover, retry
  • Contributed bug reports and patches to Cassandra and HectorIncorporated in follow-on releasesNo need to maintain our own fork
  • Mirror modeShort timeoutsLog errors DB2 is still database of record
  • Necessary for success
  • Authentication and Authorization; test basic functionaiity; rapidly deploy new changes
  • Cassandra & puppet, scaling data at $15 per month

    1. 1. Cassandra & Puppet:<br />Scaling data at $15/month<br />Constant Contact<br />March 2011<br />Dave Connors – VP Operations<br />Jim Ancona – Systems Architect<br />Mark Schena – Manager Systems Automation<br />
    2. 2. Constant Contact<br />Constant Contact<br />2000 – 2010 <br />Market leader for Small Businesses<br /><ul><li>Email, Event & Survey
    3. 3. Over 400k paying customers
    4. 4. No. 134 on the Deloitte Technology Fast 500 listing</li></ul>Business model<br /><ul><li>Many customers pay as little as $15 a month
    5. 5. ~2 million database transactions per minute</li></li></ul><li>Constant Contact<br />The business problem<br />
    6. 6. Constant Contact <br />Small Businesses are looking to us for help with Social Media marketing<br /><ul><li>Social Media 10-100 times more data
    7. 7. Challenge with our business model</li></li></ul><li>The Key Challenge<br />The Key Challenge<br />Integrate social media data<br /><ul><li>Solution = NoSQL
    8. 8. Cost = Low
    9. 9. Time to market = ?</li></li></ul><li>Implementation<br />Implementing NoSQL<br />Ops and Dev both face issues<br /><ul><li>Data model
    10. 10. Monitoring
    11. 11. Authentication
    12. 12. Logging
    13. 13. Risk profile
    14. 14. Roles & Responsibilities</li></li></ul><li>Ops<br />Dev<br />
    15. 15. Apache Cassandra<br />Apache Cassandra<br /><ul><li>Developed at Facebook
    16. 16. Open sourced in 2008
    17. 17. Incubated at Apache
    18. 18. Became an Apache top-level project in 2010
    19. 19. http://cassandra.apache.org
    20. 20. In use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, …
    21. 21. Largest production cluster has over 100 TB of data in over 150 machines</li></li></ul><li>What is Cassandra?<br />What is Cassandra<br /><ul><li>Implemented in Java
    22. 22. Fault Tolerant
    23. 23. Elastic
    24. 24. Durable
    25. 25. Rich data model
    26. 26. Replicated data
    27. 27. Consistency options</li></li></ul><li>Replication<br />Replication<br />How many copies of each piece of data <br />do we want?<br />N=3<br />
    28. 28. Consistency LevelONE<br />Consistency Level One<br />Y<br />
    29. 29. Consistency Level Quorum<br />X<br />
    30. 30. Risks and Mitigation<br />Risks and Mitigation<br /><ul><li>Moving target
    31. 31. Developer unfamiliarity
    32. 32. Operational procedures
    33. 33. Reliability concerns
    34. 34. Deployment automation
    35. 35. Community involvement
    36. 36. Training/Consulting
    37. 37. Application selection
    38. 38. Lots of monitoring
    39. 39. Phased rollout</li></li></ul><li>Development Challenges<br />Development Challenges<br />Understanding the data model<br />Choosing a client<br />Clients available for Java, Python, .NET, Ruby, PHP<br />Don’t use Thrift<br />Moving target<br />
    40. 40. Open Source<br /><ul><li>Not “one neck to wring”
    41. 41. Paid support and training is available: http://datastax.com
    42. 42. Community</li></ul>Mailing lists<br />IRC #cassandra at freenode<br /><ul><li>Contribute</li></li></ul><li>Phased Rollout<br /><ul><li>Switchable modes
    43. 43. Mirroring
    44. 44. Dial-able traffic </li></li></ul><li>Collaboration<br /><ul><li>Big, complex project
    45. 45. Close collaboration
    46. 46. Flexible roles
    47. 47. Ability to iterate</li></li></ul><li>Ops<br />Dev<br />
    48. 48. “Are you sure you really want that?” <br />“Are you sure you really want that?”<br /><ul><li>3 500G disks
    49. 49. 1 250G disk
    50. 50. No SWAP
    51. 51. RAID Zero Root Partition and Data Storage
    52. 52. 32G Memory</li></li></ul><li>We will need how many servers?<br />We will need how many servers?<br />
    53. 53. How many nodes?<br /><ul><li>Quorum = 3
    54. 54. Multiple Datacenters = 2
    55. 55. Use only half the available disk = 2
    56. 56. 12 Servers = ~1 TB Of Data Storage
    57. 57. ~6 TB of Data Storage </li></ul>72<br />x 6 =<br />x 2 = 12<br />3<br />x 2 = 6<br />
    58. 58. Ran<br />Random Partitioner<br />
    59. 59. Tool Chain<br />Tool Chain<br />
    60. 60. with Puppet<br /><ul><li>Puppet is the shared framework between Operations and Development
    61. 61. Versioning of puppet code allows for adoption of development best practices
    62. 62. Leverage Domain specific knowledge and skill</li></ul>DevOps with Puppet<br />
    63. 63. Always Move Forward<br />Always Move Forward<br />
    64. 64. Operational Efficiencies<br /><ul><li>Remote logging is a requirement
    65. 65. Cassandra uses log4j natively
    66. 66. Resources not available for remote log4j development
    67. 67. Scribed with Puppet provides the solution</li></ul>Operational Efficiencies<br />
    68. 68. Development takes the Operational Lead<br /><ul><li>Munin
    69. 69. JMX trending
    70. 70. Identify critical data points
    71. 71. Rapid development of graphs
    72. 72. Puppet Definitions are used for rapid deployment</li></li></ul><li>Sample Munin Graph<br />
    73. 73. Puppet Code <br />Example: Munin Puppet Code<br />define munin::cassandracolumnfamily ( ) {<br /> include cassandravirtual<br /> File <| title == "jmxbin" |><br /> $confdir="/opt/cassandra-munin-plugins”<br /> $plugindir="/etc/munin/plugins"<br /> $target="/opt/cassandra-munin-plugins/jmx_"<br /> # Match 3 strings separated by periods<br /> $pattern = '^([^.]*)[.]([^.]*)[.]([^.]*)$'<br /> $keyspace = regsubst($name, $pattern, '1')<br /> $columnfamily = regsubst($name, $pattern, '2')<br /> $file = regsubst($name, $pattern, '3')<br />file {"${keyspace}_${columnfamily}_${file}.conf":<br /> owner => 'root', ensure => 'file', group => 'root', type => 'file',<br /> path => "${confdir}/${keyspace}_${columnfamily}_${file}.conf",<br /> mode => '644',<br /> content => template("munin/attribute_${file}.conf.erb"),<br /> require => [ Package['munin-node'], File['/opt/cassandra-munin-plugins'], File['jmxquery'], ], <br /> }<br />file {"$plugindir/${keyspace}_${columnfamily}_${file}":<br /> ensure => 'link', owner => 'root', group => 'root', mode => '511', type => 'link',<br /> target => "$target",<br /> require => [ File['/opt/cassandra-munin-plugins'], File["${keyspace}_${columnfamily}_${file}.conf"], File['jmxquery'], Package['munin-node'], ], <br />
    74. 74. Conclusion<br />Conclusion<br /><ul><li>Cassandra as an appliance
    75. 75. Development Best Practices with Life Cycle Management
    76. 76. Traditional vs. Today
    77. 77. Infrastructure </li></ul> 4 weeks 4 hours to build 72 nodes<br /><ul><li>Development to Deployment</li></ul> 9 months 3 months<br /><ul><li>Cost</li></ul> Millions 150k<br />
    78. 78. Q&A<br />Thank You!<br />
    1. Gostou de algum slide específico?

      Recortar slides é uma maneira fácil de colecionar informações para acessar mais tarde.

    ×