DynamoDB Pros and Cons



1. DynamoDB

DynamoDB differs from other Amazon services by allowing developers to purchase a service based on throughput rather than storage. Although the database does not scale automatically, administrators can request more throughput, and DynamoDB will spread the data and traffic over a number of servers using solid-state drives, allowing predictable performance. It offers integration with Hadoop via Elastic MapReduce.

Pros
- Scalable: There is no limit to the amount of data you can store in a DynamoDB table, and the service automatically allocates more storage as you store more data using the DynamoDB write APIs.
- Distributed: DynamoDB scales horizontally and seamlessly spreads a single table over hundreds of servers.
- Flexible: DynamoDB does not have a fixed schema; each item may have a different number of attributes. Multiple data types (strings, numbers, binary, and sets) add richness to the data model.
- Easy administration: Hosted by Amazon and delivered as a fully managed service.
- Cost effective: A free tier allows more than 40 million database operations per month, and pricing is based on throughput.
- Built-in fault tolerance: DynamoDB automatically and synchronously replicates your data across multiple Availability Zones in a Region for high availability and to help protect your data against individual machine, or even facility, failures.
- Automatic data replication: All items are stored on solid-state drives (SSDs) and automatically replicated across multiple Availability Zones in a Region, providing built-in high availability and data durability.
- Amazon Redshift integration: You can load data from DynamoDB tables into Amazon Redshift, a fully managed data warehouse service, then connect to Redshift with a SQL client or business intelligence tool using standard PostgreSQL JDBC or ODBC drivers and perform complex SQL queries and business intelligence tasks on your data.
- Integrated monitoring: DynamoDB displays key operational metrics for your table in the AWS Management Console and integrates with CloudWatch, so you can see request throughput and latency for each table and easily track your resource consumption.
- Fast, predictable performance: Average service-side latencies are in the single-digit milliseconds.
- Secure: DynamoDB uses proven cryptographic methods to authenticate users and prevent unauthorized data access, and integrates with AWS Identity and Access Management (IAM) for fine-grained access control within your organization.
- Amazon Elastic MapReduce integration: Tightly integrated with Amazon EMR, allowing businesses to perform complex analytics on their large datasets using a hosted, pay-as-you-go Hadoop framework on AWS.
- Strong consistency, atomic counters: A single API call lets you atomically increment or decrement numerical attributes.
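The atomic-counter feature above is exposed through the ADD action of an UpdateItem update expression. A minimal sketch that builds the request parameters (the table and attribute names are hypothetical; the resulting dict would be passed to a client such as boto3's `client.update_item(**params)`):

```python
def atomic_increment_params(table_name, key, counter_attr, delta=1):
    """Build UpdateItem parameters that atomically add `delta` to a
    numeric attribute. DynamoDB applies the ADD action server-side,
    so concurrent writers never lose increments."""
    return {
        "TableName": table_name,
        "Key": key,
        "UpdateExpression": "ADD #c :d",
        "ExpressionAttributeNames": {"#c": counter_attr},
        # Numbers are sent as strings in the low-level DynamoDB API
        "ExpressionAttributeValues": {":d": {"N": str(delta)}},
        "ReturnValues": "UPDATED_NEW",
    }

# Example: increment a hypothetical "views" counter for one page
params = atomic_increment_params(
    "PageStats", {"PageId": {"S": "home"}}, "views")
```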
2. Pros (continued)
- Provisioned throughput: When creating a table, simply specify how much throughput capacity you require. DynamoDB allocates dedicated resources to your table to meet your performance requirements and automatically partitions data over a sufficient number of servers to meet your request capacity.
- Supports compression: Data can be compressed and stored in binary form using compression algorithms such as GZIP or LZO.
- Low learning curve
- Tunable consistency
- Composite key support
- Conditional updates
- Hadoop integration (MapReduce, Hive)

Cons
- 64 KB limit on item size and a 1 MB limit per query.
- Deployable only on AWS.
- Expensive for large items: DynamoDB is an extremely low-latency but costly solution if you are trying to store more than 64 KB per item.
- Consistency comes at a cost: Read capacity units are based on strongly consistent read operations, which require more effort and consume twice as many database resources as eventually consistent reads.
- Read sizes are rounded to multiples of 4 KB: If you get a 3.5 KB item, DynamoDB rounds its size to 4 KB; a 10 KB item is rounded to 12 KB. If a batch reads a 1.5 KB item and a 6.5 KB item, DynamoDB calculates the size as 12 KB (4 KB + 8 KB), not 8 KB (1.5 KB + 6.5 KB).
- Queries: Querying data is extremely limited, especially for non-indexed data, and complex queries are not possible. DynamoDB is great for lookups by key, not so good for queries, and abysmal for queries with multiple predicates (especially for event-log tables). Secondary indexes are not supported.
- Indexing: Changing or adding keys on the fly is impossible without creating a new table, and indexes on column values are not supported.
- No joins: You have to manage complex data relations in your code/cache layer.
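The 4 KB read-rounding rules above reduce to a few lines of arithmetic (a sketch of the billing rule; DynamoDB performs this accounting server-side):

```python
import math

def read_size_kb(item_kb):
    # Reads are billed in 4 KB units: each item's size is rounded
    # up to the next multiple of 4 KB.
    return math.ceil(item_kb / 4) * 4

def batch_read_size_kb(item_sizes_kb):
    # In a batch read, each item is rounded up individually and the
    # rounded sizes are then summed.
    return sum(read_size_kb(s) for s in item_sizes_kb)

print(read_size_kb(3.5))               # 4  (3.5 KB rounds to 4 KB)
print(read_size_kb(10))                # 12 (10 KB rounds to 12 KB)
print(batch_read_size_kb([1.5, 6.5]))  # 12 (4 KB + 8 KB, not 8 KB)
```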
- Backup: A tedious backup procedure compared to the slick backups of RDS.
- Latency in availability: When you create a table programmatically (or even via the AWS Console), the table does not become available instantly.
- No ACID: An RDBMS gives ACID guarantees; DynamoDB gives no such guarantee.
- Speed: Response time is problematic compared to RDS; you find yourself building elaborate caching mechanisms to compensate in places where you would have settled for RDS's internal caching.
- No support for multi-item transactions: Each write operation is atomic only to a single item; a write either successfully updates all of the item's attributes or none of them.
- Additional storage cost per item: In computing the storage used by a table, DynamoDB adds 100 bytes of overhead to each item for indexing purposes; this extra 100 bytes is not used in the capacity-unit calculation.
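The "latency in availability" con is normally handled by polling DescribeTable until the table reports ACTIVE. A minimal sketch that takes the describe call as a parameter so it can be tested without AWS (with boto3 you could instead use the built-in `table_exists` waiter):

```python
import time

def wait_for_table_active(describe_table, timeout_s=60, poll_s=2.0):
    """Poll a DescribeTable-style callable until TableStatus is ACTIVE.
    Returns True if the table became active within the timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = describe_table()["Table"]["TableStatus"]
        if status == "ACTIVE":
            return True
        time.sleep(poll_s)  # table is still CREATING or UPDATING
    return False
```

With boto3 the callable would be, for example, `lambda: client.describe_table(TableName="MyTable")` (the table name is hypothetical).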
3. Cons (continued)
- Latency in read/write: Once you hit the read or write limit, your requests are denied until enough time has elapsed.
- No triggers
- Poor query comparison operators
- No foreign keys
- No server-side scripts

Workarounds to improve performance
- Table best practices: DynamoDB tables are distributed across multiple partitions. For best results, design your tables and applications so that read and write activity is spread evenly across all of the items in your tables, and avoid I/O "hot spots" that can degrade performance.
  - Design for uniform data access across items in your tables
  - Distribute write activity during data upload
  - Understand access patterns for time-series data
- Item best practices: DynamoDB items are limited in size, but there is no limit on the number of items in a table. Rather than storing large attribute values in an item, consider one or more of these application design alternatives:
  - Use one-to-many tables instead of large set attributes
  - Use multiple tables to support varied access patterns
  - Compress large attribute values
  - Store large attribute values in Amazon S3
  - Break up large attributes across multiple items
- Query and Scan best practices: Sudden, unexpected read activity can quickly consume the provisioned read capacity for a table, and such activity can be inefficient if it is not evenly spread across table partitions.
  - Avoid sudden bursts of read activity
  - Take advantage of parallel scans
- Local secondary index best practices: Local secondary indexes let you define alternate range keys on a table, and you can then issue Query requests against those range keys in addition to the table's hash key. Before using local secondary indexes, be aware of the inherent tradeoffs in provisioned throughput costs, storage costs, and query efficiency.
  - Use indexes sparingly
  - Choose projections carefully
  - Optimize frequent queries to avoid fetches
  - Take advantage of sparse indexes
  - Watch for expanding item collections
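The "parallel scans" best practice splits one Scan into disjoint slices using the Segment and TotalSegments request parameters. A sketch that builds one request per segment (the "Eventlog" table name is hypothetical; each request would be issued concurrently, e.g. from a thread pool, via boto3's `client.scan`):

```python
def parallel_scan_requests(table_name, total_segments):
    """Build one Scan request per segment. Each segment covers a
    disjoint slice of the table, so concurrent workers never overlap."""
    return [
        {
            "TableName": table_name,
            "Segment": segment,            # 0-based worker index
            "TotalSegments": total_segments,
        }
        for segment in range(total_segments)
    ]

# Example: split a scan of a hypothetical "Eventlog" table 4 ways
requests = parallel_scan_requests("Eventlog", 4)
```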
4. Comparison of DynamoDB and Microsoft SQL Server

Feature | DynamoDB | Microsoft SQL Server
Description | Hosted, scalable database service by Amazon | Microsoft's relational DBMS
Developer | Amazon | Microsoft
Initial release | 2012 | 1989
License | n/a | Commercial
Implementation language | n/a | C++
Server operating systems | Hosted | Windows
Database model | Key-value store | Relational DBMS
Data scheme | Schema-free | Yes
Typing | Yes | Yes
Secondary indexes | No | Yes
SQL | No | Yes
APIs and other access methods | RESTful HTTP API | OLE DB, Tabular Data Stream (TDS), ADO.NET, JDBC, ODBC
Supported programming languages | .Net, ColdFusion, Erlang, Groovy, Java, JavaScript, Perl, PHP, Python, Ruby | .Net, Java, PHP, Python, Ruby, Visual Basic
Server-side scripts | No | Transact-SQL and .NET languages
Triggers | No | Yes
Partitioning methods | Sharding | Tables can be distributed across several files (horizontal partitioning); no sharding
Replication methods | Yes | Yes (depends on the SQL Server edition)
MapReduce | No | No
Consistency concepts | Eventual consistency | Immediate consistency
Foreign keys | No | Yes
Transaction concepts | No | ACID
Concurrency | Yes | Yes
Durability | Yes | Yes
User concepts | Access rights for users and roles can be defined via AWS Identity and Access Management (IAM) | Users with a fine-grained authorization concept
Specific characteristics | Data stored in the Amazon cloud | One of the "Big 3" commercial database management systems alongside Oracle and DB2