Apache Cassandra at Wayin

Jamey Wood presents on Apache Cassandra at Wayin for the Colorado Cassandra Users Group on August 28th, 2013.
http://www.meetup.com/Colorado-Cassandra-Meetup/

  1. Cassandra in the Cloud
     Jamey Wood, August 28, 2013
  2. Wayin: History (slide deck dated 8/30/2013)
     • Founded in 2011
     • Located in beautiful Denver, Colorado
     • Global clients in largest corporations, sports teams, agencies, and publishers
     • $20M raised
     • Co-founded by Scott McNealy
     • Twitter Certified May 2013
  3. Wayin: Mission
     Transforming Social Media into Brand Experiences
  4. Why it Works
     Marketing is becoming more reactive, and the ability to own, brand, curate, and customize relevant experiences in the moment is more valuable now than it has ever been.
  5. How it Works
     [Architecture diagram. Labels shown:]
     • Clients
     • Route 53, CloudFront, S3, SQS
     • ELB Load Balancer
     • API Server (several instances)
     • Tracking Server (two instances)
     • Scaling Group: Auto-Scaled Based on Machine Load
     • Scaling Group: Auto-Scaled Based on Queue Length
     • DB Server Scaling Groups: Scaled Based on Data Volume
     • Cassandra
  6. Challenge 1: Provisioning and Deployment
     CloudFormation, Auto Scaling Groups, and the Cassandra Ring
     [Diagram, drawn along a time axis. Labels shown:]
     • CloudFormation
     • DB Auto Scaling Group: us-east-1a
     • DB Auto Scaling Group: us-east-1b
     • DB Auto Scaling Group: us-east-1c
     • Cassandra ring with nodes labeled 1a, 1b, and 1c (two of each)
  7. Challenge 1: Provisioning and Deployment
     Pitfalls and Opportunities
     • Auto Scaling Groups are helpful for automatically replacing terminated instances, but certain actions can be problematic.
     • Be familiar with as-suspend-processes options.
     • Token management is important to keep the Cassandra ring balanced, properly distributed across availability zones, etc. It is also important to be able to bring up rings (and launch replacement servers) in a fully automated fashion.
     • Netflix's "Priam" open source tool can provide this kind of token management (and more).
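The token-management point on slide 7 can be illustrated with a small sketch. It assumes Cassandra's RandomPartitioner (token space 0 to 2^127) and simply spaces tokens evenly, handing ring positions to availability zones round robin so that neighboring nodes sit in different zones. The class and method names (`TokenPlanner`, `tokenFor`, `zoneFor`) are hypothetical, and this is not Priam's actual algorithm, only the general idea.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not taken from the slides or from Priam).
class TokenPlanner {
    // Assumption: RandomPartitioner, whose token space is [0, 2^127).
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /** Evenly spaced token for the node at ring position `index` out of `nodeCount` nodes. */
    static BigInteger tokenFor(int index, int nodeCount) {
        return RING_SIZE.multiply(BigInteger.valueOf(index))
                        .divide(BigInteger.valueOf(nodeCount));
    }

    /** Round-robin zone assignment, so adjacent ring positions land in different zones. */
    static String zoneFor(int index, List<String> zones) {
        return zones.get(index % zones.size());
    }

    public static void main(String[] args) {
        List<String> zones = Arrays.asList("us-east-1a", "us-east-1b", "us-east-1c");
        int nodeCount = 6;
        for (int i = 0; i < nodeCount; i++) {
            System.out.println(zoneFor(i, zones) + "  initial_token=" + tokenFor(i, nodeCount));
        }
    }
}
```

With six nodes and three zones, each zone ends up owning every third position on the ring, which is one way to keep replicas spread across zones when replica placement follows ring order.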
  8. Challenge 2: Migration
     [Diagram: MongoDB documents converted, via Jackson, into Cassandra rows]
     MongoDB document:
         { "_id": "abc", "author": "John Doe", "body": "some text", … }
     Cassandra rows:
         id: "abc"   author: "John Doe"   data: "{ … }"
         id: "def"   author: "Jane Doe"   data: "{ … }"
         id: "ghi"   author: "Jim Doe"    data: "{ … }"
         id: "jkl"   author: "Jill Doe"   data: "{ … }"
  9. Challenge 3: Volatile Performance
     Managing EC2 I/O
     [Graph: IO Performance for 45 EC2 Instances over Time]
     Source for EC2 IO Performance Graph: http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/
     Mitigation: md(4) RAID0 across Ephemeral Disks
  10. Challenge 3: Volatile Performance
      Client Resiliency
      Astyanax Example: Configuring Latency Awareness

          new ConnectionPoolConfigurationImpl("MyConnectionPool")
              // Will resort hosts per token partition every 10 seconds
              .setLatencyAwareUpdateInterval(10000)
              // Will clear the latency every 10 seconds
              .setLatencyAwareResetInterval(10000)
              // Will sort hosts if a host is more than 100% slower than the best and always
              // assign connections to the fastest host, otherwise will use round robin
              .setLatencyAwareBadnessThreshold(2)
              // Uses last 100 latency samples. These samples are in a FIFO queue and
              // will just cycle themselves
              .setLatencyAwareWindowSize(100);
  11. Challenge 4: Sorting
      [Diagram: Cassandra ring with nodes labeled 1a, 1b, and 1c]
      • Single wide rows make it easy to code sorting/slicing logic, but can lead to performance hotspots.
      • Good rule of thumb is to keep individual rows below 10MB in size [1].
      • Our current solution involves using "bucketed" wide rows (spreading the data for a given sorting range across multiple keys/servers, and then collating that data during reads).
      • More info:
        1. http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
        2. http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
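The "bucketed" wide-row idea on slide 11 might look roughly like the sketch below. It is not Wayin's implementation: an in-memory map stands in for Cassandra, the row-key format (`timeline:bucket`), the hash-based bucket choice, and the names `BucketedTimeline`, `insert`, and `newest` are all assumptions made for illustration. For brevity it also assumes one item per timestamp within a bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of bucketed wide rows; a map stands in for Cassandra.
class BucketedTimeline {
    private final int bucketCount;
    // Row key -> columns sorted by timestamp, mimicking one sorted wide row per key.
    private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();

    BucketedTimeline(int bucketCount) {
        this.bucketCount = bucketCount;
    }

    /** Assumed row-key format: logical timeline name plus bucket number, e.g. "feed:3". */
    static String rowKey(String timeline, int bucket) {
        return timeline + ":" + bucket;
    }

    /** Write path: spread one logical sorted range across several row keys. */
    void insert(String timeline, long timestamp, String itemId) {
        int bucket = Math.floorMod(itemId.hashCode(), bucketCount);
        rows.computeIfAbsent(rowKey(timeline, bucket), k -> new TreeMap<>())
            .put(timestamp, itemId);
    }

    /** Read path: slice the newest `limit` columns from every bucket, then collate. */
    List<String> newest(String timeline, int limit) {
        List<Map.Entry<Long, String>> candidates = new ArrayList<>();
        for (int b = 0; b < bucketCount; b++) {
            TreeMap<Long, String> row = rows.get(rowKey(timeline, b));
            if (row == null) continue;
            int taken = 0;
            for (Map.Entry<Long, String> e : row.descendingMap().entrySet()) {
                if (taken++ == limit) break;
                candidates.add(e);
            }
        }
        // Collate: newest first across all buckets.
        candidates.sort((a, b) -> Long.compare(b.getKey(), a.getKey()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : candidates) {
            if (result.size() == limit) break;
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        BucketedTimeline timeline = new BucketedTimeline(4);
        for (int i = 1; i <= 20; i++) {
            timeline.insert("feed", i, "item" + i);
        }
        System.out.println(timeline.newest("feed", 3));
    }
}
```

Taking `limit` columns from every bucket is enough for correctness, because the overall newest `limit` items must each be among the newest `limit` of their own bucket. The cost is reading up to `bucketCount * limit` columns per query in exchange for spreading writes and storage across more keys.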
  12. Challenge 5: Monitoring
      Nagios Reports
      [Graph: Nagios Report: RecentReadLatency]
  13. Challenge 5: Monitoring
      Nagios Setup
      Monitor Cassandra using JMX Nagios Plugin / NRPE (Nagios Remote Plugin Executor)
      http://wiki.apache.org/cassandra/JmxInterface
      ColumnFamilies/RecentReadLatencyMicros for some_table table:

          check_jmx -U service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi -O org.apache.cassandra.db:columnfamily=some_table,keyspace=some_keyspace,type=ColumnFamilies
  14. Challenge 6: We're Hiring!
      Looking for great developers to work with Cassandra (amongst other things)
      http://www.wayin.com/about-us/careers
      Senior Software Engineer
      Work with great people and great technologies:
      • Cassandra
      • JVM
      • Jetty
      • Jersey
      • Jackson
      • AWS
      Vice President of Sales
      Work with great brands and agencies:
      • Denver Broncos
      • Atlanta Falcons
      • St. Louis Rams
      • San Jose Sharks
      • Chevrolet
      • Bank of America
      • Turtlewax
