
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks

Learn how to optimize your NoSQL database on AWS for cost, efficiency, and scale. NoSQL databases are great for modern datasets that require simplicity in design, handle structured and unstructured data, scale horizontally, and offer finer control over availability. With AWS, you have options for running NoSQL on Amazon EC2 with Amazon EBS or on Amazon DynamoDB. This webinar will dive deep into best practices and architectural considerations for designing and managing NoSQL databases like Cassandra, MongoDB, CouchDB, and Aerospike on EC2 and EBS. We will share best practices around instance and volume selection, provide performance tuning hints, and describe cost optimization techniques.

Learning Objectives:
• Learn about common NoSQL database options and use cases for Cassandra, MongoDB, CouchDB, and Aerospike
• Review best practices around architecting on AWS for different NoSQL databases
• Understand the cost vs. performance of different Amazon EC2 instances and Amazon EBS volumes


  1. 1. Andrey Zaychikov, Solutions Architect, EMEA 21.02.2017 Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS
  2. 2. Typical algorithm for choosing the right options for NoSQL DB deployments
  3. 3. What will we cover today?
  4. 4. How do these databases differ? DynamoDB Cloud-based Self-managed (EC2) Key-value Document-oriented Graph
  5. 5. Cassandra
  6. 6. What is it? • Dynamo model database + CQL • Horizontally scalable • No single point of failure • Data is immutable and stored in collections • JVM based • A lot of management work is done in the background • Relies on the gossip protocol
  7. 7. Main concerns of the customers Schema & usage pattern Geo distribution Background routines & specific optimizations
  8. 8. How does it work?
  9. 9. Choosing instance & storage capacity: 80% Writes • For most workloads (especially with a 50/50 RW ratio), M4s with EBS are the best option • For write-heavy workloads with high RPS requirements, C4 with EBS should be considered • When the performance requirements are high and the dataset is relatively small, you can use I2s with ephemeral storage
  10. 10. Choosing instance & storage capacity: 80% Reads • For most workloads, M4s with EBS are a good choice • When the performance requirements are high and the dataset is relatively small, you can use I2s with ephemeral storage • When the performance requirements are high and the dataset is large, the best option is R4s with different EBS flavors
  11. 11. FAQ: 2AZ cluster architecture Hint: RetryPolicy for Cassandra Driver
  12. 12. FAQ • Cassandra backup / restore: restore procedure for the whole cluster can be complicated; restore for a single node can be done with EBS snapshots • Auto Scaling of Cassandra clusters: auto-scaling puts unpredictable pressure on the cluster; scaling up is simple, but scaling down is extremely complicated • Cassandra in containers: makes sense only for test / dev environments
  13. 13. FAQ: Troubleshooting JVM Caching Compaction Disks I/O CPU Memory
  14. 14. MongoDB
  15. 15. What is it? • Document-oriented database • Horizontally scalable • HA is based on master / slave replication • Geo-distributed • A lot of management work is done in the background
  16. 16. Main concerns of the customers Schema & usage pattern Geo distribution and performance Data consistency & partition tolerance
  17. 17. How does it work?
  18. 18. Choosing instance & storage • MongoDB needs a lot of memory and really fast disks, so unless your dataset is quite big the best option is either R3 or I2 (depending on the size of the dataset) • If the dataset is big, consider using R4 with different EBS flavors • For hidden nodes, use M4 with EBS, since EBS snapshots make it easy to back up data
  19. 19. FAQ: 2AZ cluster architecture Best option: Replica Set in one AZ and Hidden member in another one.
  20. 20. FAQ • MongoDB backup / restore: hidden nodes with EBS and EBS snapshot backups • Querying large amounts of data: design the schema properly; avoid using MapReduce on the master • MongoDB consistency: lots of improvements were made, but there are still some edge cases
  21. 21. FAQ: Troubleshooting Mongos performance Long running queries Fragmentation Disks I/O CPU Memory
  22. 22. CouchDB
  23. 23. What is it? • Document-oriented database built on Dynamo model • Supports RESTful API • Eventual consistency • Lockless optimistic concurrency with conflict resolution • Horizontally scalable (with constraints) • Offline-first database • MapReduce to prepare views
  24. 24. How does it work?
  25. 25. Choosing instance & storage
  26. 26. FAQ: 2AZ cluster architecture • You have to plan the replication scheme yourself, so it is your responsibility to check how it will behave in a DR event
  27. 27. FAQ Proper replication scheme Indexed views & their performance Proxy for requests
  28. 28. Aerospike
  29. 29. What is it? • In-memory key-value database • High and constant performance • Shared-nothing architecture • Geo-distributed (hash partitions) • Master-slave replication
  30. 30. How does it work?
  31. 31. Choosing instance & storage • Aerospike is used when the performance requirements are extreme. It needs a lot of memory and very fast disks. That is why EC2 with ephemeral storage is the first choice for Aerospike deployments.
  32. 32. FAQ: 2AZ cluster architecture • If one AZ goes down, then depending on your replication factor you will still have a copy of the data • Aerospike will be able to add more nodes and replicate data to them without putting much pressure on the existing nodes • It takes time to replicate data
  33. 33. FAQ • Aerospike backup / restore: restore procedure for the whole cluster can be complicated; restore for a single node can be done with EBS snapshots • Auto Scaling of Aerospike clusters: auto-scaling puts unpredictable pressure on the cluster; scaling up is simple, but scaling down is complicated • Aerospike in containers: does not make any sense
  34. 34. FAQ: Troubleshooting Disks I/O CPU Memory
  35. 35. What is it? • Graph database • JVM based • Provides REST API • Two clustering modes: HA cluster & Causal cluster • Two types of nodes: Core nodes & Read replicas (Raft protocol) • Uses Cypher language for querying Neo4j Causal Clustering
  36. 36. How does it work?
  37. 37. Choosing instance & storage
  38. 38. FAQ: 2AZ cluster architecture • If an AZ fails and the master node was in it, a new master election procedure is initiated • Core nodes in Causal cluster mode vote by simple majority • If a majority is unavailable, the cluster becomes read-only
  39. 39. FAQ: Troubleshooting JVM Page Caching Disks I/O CPU Memory
  40. 40. NoSQL on EC2: Cost considerations
  41. 41. General cost considerations Usage pattern (R/W) RPS Size of the dataset Traffic costs Object size Number of nodes
  42. 42. Cost: Performance / Size • If you want to always be cost-effective and efficient, then deployment is a journey for you • Consider EBS as the main option for most workloads • If your performance requirements are really high and the dataset is relatively small, consider EC2 with ephemeral storage; otherwise, go for EC2 with EBS
  43. 43. Sum up • There is no general solution for all cases • Context matters and the solution should follow the changing context • Apps and code should be adapted to the way NoSQL DBs work • The initial choice of deployment options can be changed • The best way to make the initial deployment choice is a PoC
  44. 44. Thank you!
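
Note on slides 9 and 10: the Cassandra sizing guidance can be read as a small decision rule. The sketch below is one possible reading, not a tool from the talk. The `Workload` fields, the function name, and the 0.8 / 0.2 cutoffs (borrowed loosely from the "80% Writes" and "80% Reads" slide titles) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # All three fields are hypothetical inputs, not terms defined in the talk.
    write_ratio: float       # fraction of requests that are writes, 0.0 to 1.0
    high_performance: bool   # True when RPS / latency requirements are high
    large_dataset: bool      # True when the dataset is large


def suggest_cassandra_option(w: Workload) -> str:
    """Map a workload onto the options named on slides 9 and 10."""
    # High performance + relatively small dataset: I2 with ephemeral storage.
    if w.high_performance and not w.large_dataset:
        return "I2 with ephemeral storage"
    # Write-heavy with high RPS requirements: C4 with EBS.
    if w.high_performance and w.write_ratio >= 0.8:
        return "C4 with EBS"
    # Read-heavy, high performance, large dataset: R4 with EBS.
    if w.high_performance and w.write_ratio <= 0.2:
        return "R4 with EBS"
    # Everything else, including 50/50 read/write: M4 with EBS.
    return "M4 with EBS"


if __name__ == "__main__":
    print(suggest_cassandra_option(Workload(0.5, False, False)))
```

As slide 43 suggests, a rule like this only gives a starting point; a PoC with the real workload should confirm or overturn it.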

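Note on slide 38: the simple-majority rule explains why a two-AZ layout is fragile for Neo4j core nodes. The following is a minimal sketch of that arithmetic only; the function names and the AZ labels are hypothetical, and it does not call any Neo4j API.

```python
from typing import Dict


def has_majority(total_core_nodes: int, reachable_core_nodes: int) -> bool:
    """Simple majority: strictly more than half of all core nodes."""
    return reachable_core_nodes > total_core_nodes // 2


def mode_after_az_failure(core_nodes_per_az: Dict[str, int], failed_az: str) -> str:
    """Return the cluster mode once every core node in failed_az is gone."""
    total = sum(core_nodes_per_az.values())
    remaining = total - core_nodes_per_az.get(failed_az, 0)
    # Without a majority of core nodes the cluster becomes read-only (slide 38).
    return "read-write" if has_majority(total, remaining) else "read-only"


if __name__ == "__main__":
    layout = {"az-a": 2, "az-b": 1}  # three core nodes split across two AZs
    for az in layout:
        print(az, "fails ->", mode_after_az_failure(layout, az))
```

With three core nodes split 2/1 across two AZs, losing the AZ that holds two nodes leaves one of three reachable, so the cluster drops to read-only; losing the other AZ leaves two of three and writes continue.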