La big datacamp-2014-aws-dynamodb-overview-michael_limcaco

Big Data Camp LA 2014 - An overview of DynamoDB by Michael Limcaco of Amazon

Published in: Technology

  1. 1. Michael Limcaco, Solutions Architect, Amazon Web Services. NoSQL in the Cloud: Amazon DynamoDB. Fast and durable at any scale
  2. 2. Databases in the Cloud first a little context
  3. 3. Traditional Database Architecture App/Web Tier Client Tier RDBMS one database for all workloads
  4. 4. • key-value access • complex queries • transactions • analytics Traditional Database Architecture App/Web Tier Client Tier RDBMS
  5. 5. Data Tier Cache Data Warehouse Blob Store RDBMS NoSQL Search Cloud Data Tier Architecture App/Web Tier Client Tier best database for each workload
  6. 6. Workload Driven Data Store Selection Data Tier Cache Data Warehouse Blob Store RDBMS NoSQL Search logging rich search key/value simple query hot reads analytics complex queries & transactions
  7. 7. AWS Services for the Data Tier Data Tier Amazon DynamoDB Amazon RDS Amazon ElastiCache Amazon S3 Amazon CloudSearch Amazon Redshift logging rich search key/value simple query hot reads analytics complex queries & transactions
  9. 9. DynamoDB is a managed NoSQL database service. Store and retrieve any amount of data Serve any level of request traffic
  10. 10. Consistent, predictable performance. Single digit millisecond latency. Backed by solid-state drives.
  11. 11. Flexible data model. Key attribute pairs. No schema required.
  12. 12. Rich Tooling SDK/Libraries JSON-Based Web API IDE Plugins CLI
  13. 13. Without the operational burden.
  14. 14. DynamoDB Customers
  15. 15. DynamoDB Background
  16. 16. Relational Era @ Amazon.com. RDBMS = Default Choice • An Amazon.com page is composed of responses from 1000s of independent services • Query patterns differ from service to service: the catalog service is usually heavy key-value; the ordering service is very write intensive (key-value); catalog search has a different pattern for querying. RDBMS: poor availability, limited scalability, high cost
  17. 17. Distributed Era @ Amazon.com. Dynamo = NoSQL Technology • Replicated DHT • Consistent hashing • Optimistic replication • Quorum strategies • Anti-entropy mechanisms • Object versioning. Lack of strong consistency; every engineer needs to learn distributed systems; operational complexity
  18. 18. DynamoDB = NoSQL Cloud Service Cloud Era @ Amazon.com Seamless Scalability Fast & Predictable Performance Easy Administration Streamlined Development Cost Effective
  19. 19. Massive and Seamless Scale (table, partitions 1 .. N) • DynamoDB automatically partitions data by the hash key: the hash key spreads data (& workload) across partitions • Auto-partitioning occurs with data set size growth and with provisioned capacity increases
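To make the hash-key partitioning on slide 19 concrete, here is a minimal Python sketch. It assumes an MD5-based hash and a fixed partition count purely for illustration: the slides do not say which hash function DynamoDB uses internally, and the service grows the number of partitions automatically. The names `partition_for` and `spread` are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Map a hash key to a partition number (illustrative hash, not DynamoDB's)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def spread(hash_keys, num_partitions: int) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(partition_for(key, num_partitions) for key in hash_keys)


# Many distinct hash keys spread requests over the partitions ...
many_users = spread([f"user-{i}" for i in range(1000)], 4)
# ... while a single hot hash key sends every request to one partition.
one_hot_key = spread(["hot-key"] * 100, 4)
```

The second call shows why a large number of distinct, evenly used hash keys matters: all traffic for one key always lands on the same partition.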
  20. 20. Durable At Scale. WRITES: continuously replicated to 3 facilities; quorum acknowledgment; persisted to disk (SSD). READS: strongly or eventually consistent; no trade-off in latency
  21. 21. Predictable Performance: Provisioned Throughput • Request-based capacity provisioning model • Throughput is declared and updated via the API or the console: CreateTable (foo, reads/sec = 100, writes/sec = 150); UpdateTable (foo, reads/sec = 10000, writes/sec = 4500) • DynamoDB handles the rest: capacity is reserved and available when needed; scaling up triggers repartitioning and reallocation; no impact to performance or availability
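The CreateTable / UpdateTable pseudo-calls on slide 21 could look roughly like the following request parameters. This is a sketch under assumptions: the parameter names follow the DynamoDB low-level API (for example as passed to boto3's `create_table` and `update_table`), the hash key name `id` and its string type are made up, and the slide's "reads/sec" is mapped directly onto read capacity units, which only holds for small items. The helper function names are hypothetical.

```python
def provisioned(reads_per_sec: int, writes_per_sec: int) -> dict:
    """Throughput block shared by CreateTable and UpdateTable."""
    return {
        "ReadCapacityUnits": reads_per_sec,
        "WriteCapacityUnits": writes_per_sec,
    }


def create_table_request(table_name, hash_key, reads_per_sec, writes_per_sec):
    """Parameters for a table with a hash-only primary key (string type assumed)."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [{"AttributeName": hash_key, "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": provisioned(reads_per_sec, writes_per_sec),
    }


def update_table_request(table_name, reads_per_sec, writes_per_sec):
    """Parameters that change only the provisioned throughput of a table."""
    return {
        "TableName": table_name,
        "ProvisionedThroughput": provisioned(reads_per_sec, writes_per_sec),
    }


# The two calls from the slide, expressed as request parameters.
create_foo = create_table_request("foo", "id", 100, 150)
scale_foo = update_table_request("foo", 10000, 4500)
```

With boto3 installed and credentials configured, these dictionaries would be passed as keyword arguments, for example `client.create_table(**create_foo)`.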
  22. 22. Low Latency At Scale. WRITES: continuously replicated to 3 facilities; quorum acknowledgment; persisted to disk (SSD). READS: strongly or eventually consistent; no trade-off in latency
  23. 23. Automated Operations: Making life easier for developers… • Developers are freed from: performance tuning (latency); 3-way multi-facility replication (automatic); scalability (and scaling operations); security inspections, patches, upgrades; software upgrades, patches; hardware failover (automatic); improving the underlying hardware …and more!
  24. 24. DynamoDB Primitives
  25. 25. DynamoDB Concepts table
  26. 26. DynamoDB Concepts table items
  27. 27. DynamoDB Concepts attributes items table schema-less schema is defined per attribute
  28. 28. DynamoDB Concepts attributes items table scalar data types • number, string, and binary multi-valued types • string set, number set, and binary set
  29. 29. DynamoDB Concepts hash hash keys mandatory for all items in a table key-value access pattern PutItem UpdateItem DeleteItem BatchWriteItem GetItem BatchGetItem
  30. 30. Hash = Distribution Key partition 1..N hash keys mandatory for all items in a table key-value access pattern determines data distribution
  31. 31. Hash = Distribution Key: large number of unique hash keys + uniform distribution of workload across hash keys = optimal schema design
  32. 32. Range = Query. Composite primary key: hash + range. Range keys model 1:N relationships and enable rich query capabilities: all items for a hash key; ==, <, >, >=, <=; “begins with”; “between”; sorted results; counts; top / bottom N values; paged responses
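The query capabilities listed on slide 32 follow from keeping the items of one hash key sorted by range key. The sketch below is an in-memory illustration of that idea, not DynamoDB's implementation; the class `HashRangeTable` and its method names are hypothetical.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Toy table: items grouped by hash key and kept sorted by range key."""

    def __init__(self):
        # hash key -> list of (range_key, item), sorted by range_key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)  # same primary key: overwrite
        else:
            rows.insert(i, (range_key, item))

    def query(self, hash_key, low=None, high=None, limit=None, descending=False):
        """Return items for one hash key with low <= range key <= high."""
        rows = self._partitions.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start = 0 if low is None else bisect.bisect_left(keys, low)
        end = len(rows) if high is None else bisect.bisect_right(keys, high)
        selected = rows[start:end]
        if descending:
            selected = selected[::-1]
        return [item for _, item in selected[:limit]]


table = HashRangeTable()
table.put("1", "2013-04-23", "File1")
table.put("1", "2013-03-10", "File2")
march = table.query("1", low="2013-03-01", high="2013-03-29")  # "between"
newest = table.query("1", limit=1, descending=True)            # top N, sorted
```

Because the range keys are ordered, "between", sorted results, and top / bottom N all reduce to slicing one sorted list for a single hash key.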
  33. 33. Index Options local secondary indexes (LSI) alternate range key + same hash key index and table data is co-located (same partition)
  34. 34. Projected Attributes KEYS_ONLY INCLUDE ALL
  37. 37. Index Options global secondary indexes (GSI) any attribute indexed as new hash or range key KEYS_ONLY INCLUDE ALL
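The three projection options (KEYS_ONLY, INCLUDE, ALL) decide which attributes of an item are copied into an index. A minimal sketch of that selection, assuming a hypothetical helper `project` and a sample item modeled on the file-sharing example later in the deck:

```python
def project(item, table_keys, index_keys, projection_type, non_key_attributes=()):
    """Return the attributes of `item` that would be stored in the index."""
    if projection_type == "ALL":
        return dict(item)
    # Table keys and index keys are always present in an index entry.
    wanted = set(table_keys) | set(index_keys)
    if projection_type == "INCLUDE":
        wanted |= set(non_key_attributes)
    elif projection_type != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: value for name, value in item.items() if name in wanted}


item = {"User_ID": "2", "File_ID": "3", "Name": "File3", "Type": "MP4", "Size": 2000000}
table_keys = ("User_ID", "File_ID")
index_keys = ("User_ID", "Type")

keys_only = project(item, table_keys, index_keys, "KEYS_ONLY")
with_name = project(item, table_keys, index_keys, "INCLUDE", ["Name"])
everything = project(item, table_keys, index_keys, "ALL")
```

A narrower projection keeps index entries small; a query that needs an attribute outside the projection has to go back to the table for it.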
  38. 38. Example Patterns: access pattern modeling; use case walk-thru; highlighted features
  39. 39. Access Pattern Modeling • Method: 1. Describe the overall use case (maintain context) 2. Identify the individual access patterns of the use case 3. Model each access pattern to its own discrete data set 4. Consolidate data sets into tables and indexes • Benefits: single table fetch for each query; payloads are minimal for each access
  40. 40. Design Use Case: Media Catalog (use case, access patterns, data design). Multi-tenant application for file storing and sharing • User_ID is the unique identifier of each user • File_ID is the unique identifier of each file, owned by a user. Good PK selection: User_ID (hash) + File_ID (range)
  41. 41. 1. Users should be able to query all the files they own 2. Search by File Name 3. Search by File Type 4. Search by Date Range 5. Keep track of Shared Files Design Use Case: Media Catalog use case access patterns data design
  42. 42. 1. Users should be able to query all the files they own 2. Search by File Name 3. Search by File Type 4. Search by Date Range 5. Keep track of Shared Files Design Use Case: Media Catalog use case access patterns data design additional (non-PK) attributes & index candidates
  43. 43. Users Hash key = User_ID Attributes= User_Name Email Address User_Files Hash key = User_ID Range key = File_ID Attributes= Name Size (N) Date SharedFlag Link DynamoDB Data Model: Main Tables User has file[]
  44. 44. + Secondary Indexes (example only: required data returned determines optimal projections)
      Table Name | Index Name      | Attribute to Index | Projected Attribute
      User_Files | NameIndex       | Name               | KEYS
      User_Files | TypeIndex       | Type               | KEYS + Name
      User_Files | DateIndex       | Date               | KEYS + Name
      User_Files | SharedFlagIndex | SharedFlag         | KEYS + Name
      User_Files | SizeIndex       | Size               | KEYS + Name
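The User_Files table and its five indexes could be declared with CreateTable parameters along these lines. This is a sketch under assumptions: the indexes are treated as local secondary indexes (they keep User_ID as the hash key), all attribute types except Size (marked N on the slide) are assumed to be strings, the throughput numbers are placeholders, and the helper names are hypothetical. The parameter names follow the DynamoDB low-level API.

```python
# (index name, range attribute, assumed attribute type, extra projected attributes)
INDEXES = [
    ("NameIndex", "Name", "S", []),
    ("TypeIndex", "Type", "S", ["Name"]),
    ("DateIndex", "Date", "S", ["Name"]),
    ("SharedFlagIndex", "SharedFlag", "S", ["Name"]),
    ("SizeIndex", "Size", "N", ["Name"]),
]


def local_secondary_index(name, range_attr, extra):
    """One index definition: same hash key as the table, alternate range key."""
    projection = {"ProjectionType": "KEYS_ONLY"}
    if extra:  # "KEYS + Name" on the slide
        projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(extra)}
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": "User_ID", "KeyType": "HASH"},
            {"AttributeName": range_attr, "KeyType": "RANGE"},
        ],
        "Projection": projection,
    }


def user_files_table_request(reads=10, writes=10):
    """CreateTable parameters for User_Files (placeholder throughput values)."""
    key_attrs = [("User_ID", "S"), ("File_ID", "S")]
    key_attrs += [(attr, attr_type) for _, attr, attr_type, _ in INDEXES]
    return {
        "TableName": "User_Files",
        "AttributeDefinitions": [
            {"AttributeName": attr, "AttributeType": attr_type}
            for attr, attr_type in key_attrs
        ],
        "KeySchema": [
            {"AttributeName": "User_ID", "KeyType": "HASH"},
            {"AttributeName": "File_ID", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            local_secondary_index(name, attr, extra) for name, attr, _, extra in INDEXES
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": reads, "WriteCapacityUnits": writes},
    }


request = user_files_table_request()
```

Only attributes that appear in a key schema are declared in `AttributeDefinitions`; other attributes such as Link stay schema-less.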
  45. 45. Access Pattern 1 • Find all files owned by a user: Query User_Files table (User_ID = “2”)
      User_ID (Hash) | File_ID (Range) | Name  | Date       | Type | SharedFlag | Size    | Link
      1              | 1               | File1 | 2013-04-23 | JPG  |            | 10000   | bucket1
      1              | 2               | File2 | 2013-03-10 | MP4  | Y          | 1000000 | bucket2
      2              | 3               | File3 | 2013-03-10 | MP4  | Y          | 2000000 | bucket3
      2              | 4               | File4 | 2013-03-10 | AVI  |            | 3000000 | bucket4
      3              | 5               | File5 | 2013-04-10 | MP3  |            | 40000   | bucket5
  47. 47. Access Pattern 2 • Search by File Name: Query with IndexName = “NameIndex”, User_ID = “1”, Name = “File1”
      NameIndex
      User_ID (hash) | Name (range) | File_ID
      1              | File1        | 1
      1              | File2        | 2
      2              | File3        | 3
      2              | File4        | 4
      3              | File5        | 5
  49. 49. Access Pattern 3 • Search for file name by file Type: Query with IndexName = “TypeIndex”, User_ID = “2”, Type = “MP4”
      TypeIndex
      User_ID (hash) | Type (range) | File_ID | Name (projected)
      1              | JPG          | 1       | File1
      1              | MP4          | 2       | File2
      2              | AVI          | 4       | File4
      2              | MP4          | 3       | File3
      3              | MP3          | 5       | File5
  51. 51. Access Pattern 4 • Search for file name by Date range: Query with IndexName = “DateIndex”, User_ID = “1”, Date between “2013-03-01” and “2013-03-29”
      DateIndex
      User_ID (hash) | Date (range) | File_ID | Name (projected)
      1              | 2013-03-10   | 2       | File2
      1              | 2013-04-23   | 1       | File1
      2              | 2013-03-10   | 3       | File3
      2              | 2013-03-10   | 4       | File4
      3              | 2013-04-10   | 5       | File5
  53. 53. Access Pattern 5 • Search for names of Shared files: Query with IndexName = “SharedFlagIndex”, User_ID = “1”, SharedFlag = “Y”
      SharedFlagIndex
      User_ID (hash) | SharedFlag (range) | File_ID | Name (projected)
      1              | Y                  | 2       | File2
      2              | Y                  | 3       | File3
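The index queries in access patterns 2 through 5 share one shape: an equality condition on User_ID plus a condition on the index's range key. The sketch below builds such Query parameters. It assumes the expression-based form of the low-level API (`KeyConditionExpression`), which is newer than this deck, and assumes all key values are strings; the helper `index_query` is hypothetical. Attribute names go through `#` placeholders because names such as Date and Name can collide with DynamoDB reserved words.

```python
def index_query(index_name, user_id, range_attr, op, *values):
    """Build Query parameters for one of the User_Files indexes."""
    names = {"#h": "User_ID", "#r": range_attr}
    vals = {":h": {"S": user_id}}
    if op == "=":
        (value,) = values
        condition = "#r = :v"
        vals[":v"] = {"S": value}
    elif op == "between":
        low, high = values
        condition = "#r BETWEEN :lo AND :hi"
        vals[":lo"] = {"S": low}
        vals[":hi"] = {"S": high}
    else:
        raise ValueError(f"unsupported operator: {op}")
    return {
        "TableName": "User_Files",
        "IndexName": index_name,
        "KeyConditionExpression": "#h = :h AND " + condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": vals,
    }


by_name = index_query("NameIndex", "1", "Name", "=", "File1")
by_type = index_query("TypeIndex", "2", "Type", "=", "MP4")
by_date = index_query("DateIndex", "1", "Date", "between", "2013-03-01", "2013-03-29")
shared = index_query("SharedFlagIndex", "1", "SharedFlag", "=", "Y")
```

Each dictionary targets exactly one index, which matches the stated benefit of a single fetch per query.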
  55. 55. Highlighted Features • Schema-less: only key information needed; individual items can define their own set of attributes • Consistent Reads: inventory, shopping cart applications • Atomic Counters: increment and return new value in same operation • Conditional Writes: expected value before write, fails on mismatch; “state machine” use cases
  56. 56. Hadoop Integration + Amazon Elastic MapReduce (EMR) Managed Hadoop service for data-intensive workflows.
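Atomic counters and conditional writes from slide 55 can be illustrated with a small in-memory store. This models the semantics only and is not the DynamoDB API: the class `TinyStore`, its methods, and the exception `ConditionFailed` are hypothetical, and a lock stands in for the atomicity the service provides.

```python
import threading


class ConditionFailed(Exception):
    """Raised when the stored value does not match the expected value."""


class TinyStore:
    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def add(self, key, attr, delta=1):
        """Atomic counter: increment and return the new value in one step."""
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attr] = item.get(attr, 0) + delta
            return item[attr]

    def put_if(self, key, attr, expected, new_value):
        """Conditional write: succeed only if the current value equals `expected`."""
        with self._lock:
            current = self._items.get(key, {}).get(attr)
            if current != expected:
                raise ConditionFailed(f"{attr} is {current!r}, expected {expected!r}")
            self._items.setdefault(key, {})[attr] = new_value


store = TinyStore()
first = store.add("item-1", "votes")
second = store.add("item-1", "votes")

# State machine: each transition names the state it expects to leave.
store.put_if("order-1", "state", None, "NEW")
store.put_if("order-1", "state", "NEW", "PAID")
try:
    store.put_if("order-1", "state", "NEW", "SHIPPED")  # stale expectation
    stale_write_rejected = False
except ConditionFailed:
    stale_write_rejected = True
```

The failed third transition is the "fails on mismatch" behavior: a writer that saw an outdated state cannot overwrite a newer one.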
  57. 57. Define External Table (Hive) create external table items_db (id string, votes bigint, views bigint) stored by 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' tblproperties ("dynamodb.table.name" = "items", "dynamodb.column.mapping" = "id:id,votes:votes,views:views");
  58. 58. Query It select id, votes, views from items_db order by views desc;
  59. 59. What Else? autoscaling; local testing and development; cross-region export / import; library
  60. 60. • Third party library for automating scaling decisions • Scale up for service levels, scale down for cost • CloudFormation template for fast deployment Autoscaling with Dynamic DynamoDB
  61. 61. Other Key Features • Cross-Region Export and Import • DynamoDB Local: disconnected development with full API support (no network, no usage costs, no SLA) • Geospatial and Transaction Libraries • Fine-Grained Access Control: direct-to-DynamoDB access for mobile devices. Get started today! aws.amazon.com/dynamodb/developer-resources/
  62. 62. Wrapup
  63. 63. Managed NoSQL = seamless scalability, predictable performance, always durable, automated operations, fast development, cost effective
  64. 64. Thank You aws.amazon.com/dynamodb
