AWS Webcast - Build high-scale applications with Amazon DynamoDB


Review this webinar to learn about Amazon DynamoDB. DynamoDB is a highly scalable, fully managed NoSQL database service. Built for consistent single-digit millisecond latency and high availability, DynamoDB is a great fit for gaming, ad-tech, mobile, and many other applications.

Reasons to review:
• Learn the fundamentals of DynamoDB
• Understand how to design for common access patterns
• Discover best practices
• Hear how others use DynamoDB to build their business

Who should review:
• Software Developers
• Database Administrators
• Solution Architects
• Technical Decision Makers

Published in: Technology


  1. Build High-Scale Applications with Amazon DynamoDB (Chris Munns, Solutions Architect, Amazon Web Services)
  2. Traditional Database Architecture: Client Tier, App/Web Tier, Database Tier
  3. One Database for All Workloads: Client Tier, App/Web Tier, and a single RDBMS handling key-value access, complex queries, transactions, and analytics
  4. Cloud Data Tier Architecture: Client Tier, App/Web Tier, and a Data Tier composed of Search, Cache, Blob Store, RDBMS, NoSQL, and Data Warehouse
  5. Workload Driven Data Store Selection: match each workload (logging, analytics, key/value, simple query, rich search, hot reads, complex queries and transactions) to the appropriate store in the data tier (NoSQL, Data Warehouse, Search, Cache, Blob Store, RDBMS)
  6. AWS Services for the Data Tier: Amazon DynamoDB (key/value, simple query), Amazon RDS (complex queries and transactions), Amazon ElastiCache (hot reads), Amazon S3 (logging, blob storage), Amazon Redshift (analytics), Amazon CloudSearch (rich search)
  7. Relational Era at Amazon: RDBMS = Default Choice. A page is composed of responses from thousands of independent services, and the query patterns of those services differ: the catalog service is usually heavy key-value, the ordering service is very write-intensive (also key-value), and catalog search has a different query pattern again. The result: poor availability, limited scalability, high cost.
  8. Distributed Era at Amazon: Dynamo = NoSQL Technology. A replicated DHT with consistency management: consistent hashing, optimistic replication, "sloppy quorum", anti-entropy mechanisms, and object versioning. The trade-offs: lack of strong consistency, every engineer needing to learn distributed systems, and operational complexity.
  9. Cloud Era at Amazon: DynamoDB = NoSQL Cloud Service. Non-relational, fast and predictable performance, seamless scalability, easy administration.
  10. DynamoDB Fundamentals
  11. DynamoDB = database service + automated operations + predictable performance + fast development + always durable + low latency + cost effective
  12. Massive and Seamless Scale: DynamoDB automatically partitions data (partitions 1..N per table) by the hash key, which spreads data (and workload) across partitions. Auto-partitioning occurs with data set size growth and with provisioned capacity increases. A large number of unique hash keys plus a uniform distribution of workload across those keys means an app that is ready to scale.
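The auto-partitioning described above can be sketched in Python. This is an illustrative model only: DynamoDB's internal hash function and partition-placement scheme are not public, so `md5` modulo the partition count stands in for them here.

```python
import hashlib
from collections import Counter

def partition_for(hash_key: str, num_partitions: int) -> int:
    # Illustrative stand-in for DynamoDB's internal hash-based placement;
    # the real hash function is not public.
    digest = hashlib.md5(hash_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# A well-spread hash key keeps load balanced as the partition count grows,
# i.e. as the data set or provisioned capacity grows.
keys = [f"user-{i}" for i in range(10_000)]
for n in (2, 4, 8):
    load = Counter(partition_for(k, n) for k in keys)
    print(n, min(load.values()), max(load.values()))
```

Each doubling of partitions roughly halves the per-partition load, which is why a high-cardinality, uniformly accessed hash key is the core scaling requirement.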
  13. Automated Operations: making life easier for developers. DynamoDB provides automatic 3-way multi-AZ replication and automatic hardware failover, and frees developers from performance (latency) tuning, scalability and scaling operations, security inspections and patches, software upgrades, improving the underlying hardware, and lots of other stuff.
  14. Predictable Performance via Provisioned Throughput: a request-based capacity provisioning model. Throughput is declared and updated via the API or the console, e.g. CreateTable(foo, reads/sec=100, writes/sec=150) and later UpdateTable(foo, reads/sec=10000, writes/sec=4500). DynamoDB handles the rest: capacity is reserved and available when needed, scaling up triggers repartitioning and reallocation, and there is no impact to performance or availability.
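The units in those CreateTable/UpdateTable calls follow DynamoDB's documented sizing rules: one write unit covers one write per second of an item up to 1 KB, and one read unit covers one strongly consistent read per second of up to 4 KB, with eventually consistent reads costing half. A small helper makes the arithmetic concrete (the item sizes and rates are hypothetical):

```python
import math

def write_units(item_kb: float, writes_per_sec: int) -> int:
    # Larger items consume ceil(size / 1 KB) write units per write.
    return math.ceil(item_kb) * writes_per_sec

def read_units(item_kb: float, reads_per_sec: int,
               strongly_consistent: bool = True) -> int:
    # Strongly consistent reads consume ceil(size / 4 KB) units per read;
    # eventually consistent reads cost half as much.
    units = math.ceil(item_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

print(write_units(3, 100))        # 300 write units for 3 KB items
print(read_units(3, 100))         # 100 read units, strongly consistent
print(read_units(3, 100, False))  # 50 read units, eventually consistent
```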
  15. Durable At Scale: writes are continuously replicated to three Availability Zones, acknowledged by quorum, and persisted to disk (custom SSD); reads are strongly or eventually consistent, with no trade-off in latency.
  16. Low Latency At Scale: the same write path (3-AZ replication, quorum acknowledgment, custom SSD persistence) and read options (strongly or eventually consistent) also deliver low latency.
  17. DynamoDB Customers
  18. "DynamoDB has scaled effortlessly to match our company's explosive growth, doesn't burden our operations staff, and integrates beautifully with our other AWS assets." "I love how DynamoDB enables us to provision our desired throughput, and achieve low latency and seamless scale, even with our constantly growing workloads."
  19. Fast Development: the WeatherBug mobile app provides lightning detection and alerting for 40M users/month, developed and tested in weeks at "1/20th of the cost of the traditional DB approach"; a Super Bowl promotion handled millions of interactions over a relatively short period of time, with the app built in 3 days from design to production-ready.
  20. Cost Effective: save money and reduce effort. "Our previous NoSQL database required almost a full time administrator to run. Now AWS takes care of it." "Being optimized at AdRoll means we spend more every month on snacks than we do on DynamoDB – and almost nothing on an ops team"
  21. DynamoDB Primitives
  22. DynamoDB Concepts: a table
  23. DynamoDB Concepts: a table contains items
  24. DynamoDB Concepts: items have attributes; tables are schema-less, with schema defined per attribute
  25. DynamoDB Concepts: attribute values use scalar data types (number, string, and binary) or multi-valued types (string set, number set, and binary set)
  26. DynamoDB Concepts: hash keys are mandatory for all items in a table and support the key-value access pattern: PutItem, UpdateItem, DeleteItem, BatchWriteItem, GetItem, BatchGetItem
  27. Hash = Distribution Key: the hash key is mandatory for all items in a table, supports the key-value access pattern, and determines data distribution across partitions 1..N
  28. Hash = Distribution Key: a large number of unique hash keys plus a uniform distribution of workload across those keys equals an optimal schema design
  29. Range = Query: a hash key plus a range key form a composite primary key; range keys model 1:N relationships and enable rich query capabilities: all items for a hash key; conditions ==, <, >, >=, <=, "begins with", and "between"; sorted results; counts; top/bottom N values; paged responses
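The query shapes a hash + range schema enables can be modeled in plain Python over a sorted collection. The event items below are hypothetical; a real table would express these conditions through the Query API.

```python
from bisect import bisect_left, bisect_right

# Items for one hash key (e.g. one user), kept sorted by range key.
events = [("2014-01-03", "login"), ("2014-01-05", "purchase"),
          ("2014-01-07", "logout"), ("2014-02-01", "login")]
range_keys = [k for k, _ in events]

def between(lo: str, hi: str):
    # "between" condition: binary search over the sorted range keys.
    return events[bisect_left(range_keys, lo):bisect_right(range_keys, hi)]

def begins_with(prefix: str):
    # "begins with" condition on the range key.
    return [e for e in events if e[0].startswith(prefix)]

print(between("2014-01-01", "2014-01-31"))  # the three January events
print(begins_with("2014-02"))               # [('2014-02-01', 'login')]
```

Because the items are physically sorted by range key, sorted results, top/bottom N, and paged responses all fall out of the same structure.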
  30. Index Options: local secondary indexes (LSI) provide an alternate range key with the same hash key; index and table data are co-located (same partition)
  31. Projected Attributes: an index can project KEYS_ONLY, a specified INCLUDE list of attributes, or ALL attributes
  34. Index Options: global secondary indexes (GSI) allow any attribute to be indexed as a new hash or range key, with the same projected attribute options
  35. Simple API: currently 13 operations in total. Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables. Read and write items: PutItem, GetItem, UpdateItem, DeleteItem. Read and write multiple items: BatchGetItem, BatchWriteItem, Query, Scan.
  36. Data types. Scalar types: String (S), Unicode with UTF-8 binary encoding; Number (N), up to 38 digits of precision, with values between 10^-128 and 10^+126, stored in a variable-width encoding occupying up to 21 bytes. Multi-valued types: String Set (SS), Number Set (NS), and Binary Set (BS); sets are not ordered.
  37. Indexing & Partitioning: data is indexed by the primary key. A single hash key is targeted towards object persistence; a hash + range composite key gives a sorted collection within a hash bucket and can store a series of events for a given entity. Partitioning is automatic: the leading hash key spreads data and workload across partitions, so traffic is scaled out and parallelized.
  38. Other Features: consistent reads (inventory and shopping-cart applications); atomic counters (increment and return the new value in the same operation); conditional writes (an expected value is checked before the write, which fails on mismatch; useful for "state machine" use cases); sparse indexes (ideal for sorted lists and fast access to a subset of items; popular for identifying recently updated items, top lists, and leaderboards).
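Conditional-write semantics can be sketched with a plain dict standing in for a table; DynamoDB performs the same compare-then-write atomically on the server (surfacing failures as ConditionalCheckFailedException), and the item and attribute names here are illustrative.

```python
class ConditionalWriteFailed(Exception):
    """Raised when the expected value does not match, playing the role
    of DynamoDB's ConditionalCheckFailedException."""

store = {"order-1": {"status": "PENDING"}}

def conditional_update(key, attr, expected, new_value):
    # Fail the write unless the attribute currently holds the expected value.
    item = store[key]
    if item.get(attr) != expected:
        raise ConditionalWriteFailed(f"{attr} is not {expected!r}")
    item[attr] = new_value

conditional_update("order-1", "status", "PENDING", "SHIPPED")
print(store["order-1"]["status"])  # SHIPPED

try:
    # A second transition from PENDING now fails: the state machine advanced.
    conditional_update("order-1", "status", "PENDING", "CANCELLED")
except ConditionalWriteFailed as e:
    print("rejected:", e)
```

This compare-then-write pattern is what makes the "state machine" use case safe under concurrent writers.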
  39. How to use DynamoDB? Use the API, SDKs, CLI, or Management Console to create tables, then use the AWS SDK to interact with DynamoDB (PutItem, UpdateItem, DeleteItem, Query, Scan, etc.) through libraries and SDKs, the web console, or the command line. Figure: writing an item to a table via the PHP SDK:

      $client = $aws->get("dynamodb");
      $tableName = "ProductCatalog";
      $response = $client->putItem(array(
          "TableName" => $tableName,
          "Item" => $client->formatAttributes(array(
              "Id" => 120,
              "Title" => "Book 120 Title",
              "ISBN" => "120-1111111111",
              "Authors" => array("Author12", "Author22"),
              "Price" => 20,
              "Category" => "Book",
              "Dimensions" => "8.5x11.0x.75",
              "InPublication" => 0,
          )),
          "ReturnConsumedCapacity" => 'TOTAL'
      ));
  40. How to use DynamoDB? Higher-level programming interfaces: an object persistence model for .NET and Java, helper classes for .NET, and a transaction library for Java. DynamoDB Local is available for development and testing, Dynamic DynamoDB for auto-scaling, and there are many community-contributed tools and frameworks. Figure: a .NET class using the object persistence model:

      [DynamoDBTable("ProductCatalog")]
      public class Book
      {
          [DynamoDBHashKey]
          public int Id { get; set; }
          public string Title { get; set; }
          public int ISBN { get; set; }
          [DynamoDBProperty("Authors")]
          public List<string> BookAuthors { get; set; }
          [DynamoDBIgnore]
          public string CoverPage { get; set; }
      }
  41. Use Libraries and Tools. Transactions: atomic transactions across multiple items and tables; the status of ongoing transactions is tracked via two tables, (1) transactions and (2) pre-transaction snapshots of modified items. Geolocation: add location awareness to mobile applications; see the Find Yourself sample app.
  42. Autoscaling with Dynamic DynamoDB: a third-party library for automating scaling decisions; scale up for service levels, scale down for cost; a CloudFormation template is available for fast deployment.
  43. Develop and Test Locally with DynamoDB Local: disconnected development with full API support, no network, and no usage costs. Instead of provisioning a large instance (e.g. an m2.4xlarge) for development and testing, run DynamoDB Local. Note: DynamoDB Local does not have a durability or availability SLA.
  44. Develop and Test Locally with DynamoDB Local: some minor differences from Amazon DynamoDB. DynamoDB Local ignores your provisioned throughput settings (the values you specify in CreateTable and UpdateTable have no effect) and does not throttle read or write activity. The values you supply for the AWS access key and the Region are used only to name the database file. Your AWS secret key is ignored but must be specified; a dummy string of characters is recommended.
  45. Monitoring with CloudWatch: DynamoDB reports CloudWatch metrics for latency, consumed throughput, errors, and throttling; alarms can be used to dynamically size throughput.
  46. Analytics: DynamoDB can be used for large data ingest; Redshift can load data directly from DynamoDB (COPY), and EMR can read directly from DynamoDB using Hive external tables (alongside external tables over S3):

      CREATE EXTERNAL TABLE pc_dynamodb ( [attributes] )
      STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
      TBLPROPERTIES ([properties]);

      CREATE EXTERNAL TABLE pc_s3 ( [attributes] )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION 's3://myawsbucket1/catalog/';
  47. Pricing. Provisioned throughput: $0.0065 per hour for every 10 units of write capacity (1 write per second for 1 KB items) and $0.0065 per hour for every 50 units of read capacity (1 consistent read per second for 4 KB items). Storage: $0.25 per GB-month. Free tier: 100 MB of storage plus 50 writes/sec and 10 reads/sec each month.
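Those rates make a monthly estimate a short calculation. The helper below simply encodes the slide's prices (720 hours approximates a month); the workload numbers in the example are hypothetical.

```python
def monthly_cost(write_units: int, read_units: int, storage_gb: float,
                 hours: float = 720) -> float:
    # Slide rates: $0.0065/hr per 10 write units, $0.0065/hr per 50 read
    # units, and $0.25 per GB-month of storage.
    throughput = hours * 0.0065 * (write_units / 10 + read_units / 50)
    return throughput + 0.25 * storage_gb

# Hypothetical workload: 150 write units, 100 read units, 20 GB stored.
print(round(monthly_cost(150, 100, 20), 2))  # 84.56
```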
  48. Best Practices
  49. Access Pattern Modeling. Method: (1) describe the overall use case and maintain context; (2) identify the individual access patterns of the use case; (3) model each access pattern to its own discrete data set; (4) consolidate data sets into tables and indexes. Benefits: a single table fetch for each query, and minimal payloads for each access.
  50. Table Best Practices: design for uniform data access across items; partition distribution is based on the hash key, so the hash key should be well distributed and access frequency should be spread across different hash keys. Hash key efficiency: a User ID, where the application has many users, is good; a status code, where there are only a few possible status codes, is bad; a device ID, where one device is by far more popular than all the others, is bad. Time-series pattern: for logging, focus only on recent data.
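The good/bad hash key guidance can be made concrete with a quick simulation; `md5` again stands in for DynamoDB's unpublished internal hash, and the key values are invented.

```python
import hashlib
from collections import Counter

PARTITIONS = 8

def partition(key: str) -> int:
    # Illustrative stand-in for DynamoDB's internal hash function.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % PARTITIONS

# Good: high-cardinality user IDs spread 8,000 writes over every partition.
good = Counter(partition(f"user-{i}") for i in range(8000))
# Bad: three status codes can ever reach at most three partitions,
# no matter how much throughput is provisioned.
bad = Counter(partition(s) for s in ["OK", "ERROR", "RETRY"] * 2000)

print(len(good))  # 8: all partitions share the load
print(len(bad))   # at most 3 partitions take all the traffic
```

Because provisioned throughput is divided across partitions, a low-cardinality key throttles long before the table's total capacity is reached.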
  51. Item Best Practices: use one-to-many tables instead of large set attributes (break items up across multiple tables); use multiple tables to support varied access patterns (if you frequently access large items but do not use all attributes, store the smaller, frequently accessed attributes in separate tables); compress large attributes to reduce the cost of storage and throughput; store large attributes in S3.
  52. Query and Scan Best Practices: avoid sudden bursts of read activity; reduce the page size of scans; isolate scan operations by creating separate tables and writing to both a mission-critical table and a shadow table; take advantage of parallel scans, since sequential scans take longer.
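Parallel Scan works by dividing the key space into TotalSegments disjoint segments, each scanned by a separate worker (Scan's real Segment and TotalSegments parameters). The sketch below models that with threads over an in-memory list; the modulo split is illustrative, since DynamoDB divides the key space internally.

```python
from concurrent.futures import ThreadPoolExecutor

items = [{"id": i} for i in range(1000)]  # stand-in for a table

def scan_segment(segment: int, total_segments: int):
    # Each worker reads only its own disjoint slice of the key space,
    # mirroring Scan's Segment / TotalSegments parameters.
    return [it for it in items if it["id"] % total_segments == segment]

TOTAL_SEGMENTS = 4
with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    parts = list(pool.map(lambda s: scan_segment(s, TOTAL_SEGMENTS),
                          range(TOTAL_SEGMENTS)))
scanned = [it for part in parts for it in part]

print(len(scanned))  # 1000: every item is seen exactly once
```

Because the segments are disjoint, workers never duplicate effort, which is why parallel scans finish faster than a single sequential pass.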
  53. Quick Poll + Questions? Thanks for joining!