Building with AWS Databases: Match Your Workload to the Right Database (DAT301) - AWS re:Invent 2018

We have recently seen some convergence of different database technologies. Many customers are evaluating heterogeneous migrations as their database needs have evolved or changed. Evaluating the best database to use for a job isn't as clear as it was ten years ago. We'll discuss the ideal use cases for relational and nonrelational data services, including Amazon ElastiCache for Redis, Amazon DynamoDB, Amazon Aurora, Amazon Neptune, and Amazon Redshift. This session digs into how to evaluate a new workload for the best managed database option. Please join us for a speaker meet-and-greet following this session at the Speaker Lounge (ARIA East, Level 1, Willow Lounge). The meet-and-greet starts 15 minutes after the session and runs for half an hour.

  1. Matching the Database to the Workload. Rick Houlihan, Principal Technologist, NoSQL, AWS. DAT301. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. Agenda for This Session • Database workload classifications • Traditional approaches to scaling RDBMS • How NoSQL databases compare • The flavors of NoSQL on AWS • What database to use when
  3. Why did you choose this database? “Because we heard X is the best new thing.” “Because we have a site license for X.” “Because X is what we know how to use.”
  4. Why did you choose this database? “Because this database is purpose built to support what my application is designed to do.”
  5. Types of Database Workloads • Online Transaction Processing (OLTP): most common type of app • Online Analytical Processing (OLAP): BI and ad hoc data projections • Decision Support Systems (DSS): long-running query aggregations and projections. (Diagram labels: Operations, Analytics.)
  6. Sizing the Workload. Unbounded problems are harder to solve: “I need a root cause analysis engine to correlate transaction level events to trading patterns across global markets.” Problems with limited scope are easier to solve: “I need a system to manage inventory in my store.”
  7. Sizing the Database
  8. Scaling an RDBMS
  9. Sharded Relational DBs? (Diagram: shards A, B, C, D, and “?”.)
  10. NoSQL Databases • Denormalize and shard to provide horizontal scale • Near unbounded throughput and storage. (Diagram: Collection 1, 1 TB; Shard A, 500 GB; Shard B, 500 GB.)
  11. Partition Keys in NoSQL. Partition Key uniquely identifies an item. Partition Key is used for building an unordered hash index. Allows table to be partitioned for scale. Example items: Id = 1, Name = Jim, Hash(1) = 7B; Id = 2, Name = Andy, Dept = Eng, Hash(2) = 48; Id = 3, Name = Kim, Dept = Ops, Hash(3) = CD. Key space: 00 to FF (range markers 00, 54, 55, A9, AA, FF).
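
The partition-key idea on slide 11 can be sketched in a few lines of Python. This is an illustration only: the three ranges (00 to 54, 55 to A9, AA to FF) are an assumption read from the key-space markers on the slide, the names `PARTITIONS`, `key_hash`, and `partition_for_hash` are hypothetical, and the SHA-256-based hash is a stand-in, since the source does not say which hash function the service uses.

```python
import hashlib

# Assumed split of a one-byte key space into three contiguous ranges.
PARTITIONS = [(0x00, 0x54), (0x55, 0xA9), (0xAA, 0xFF)]


def key_hash(partition_key) -> int:
    """Map a partition key to a position in the 0x00..0xFF key space.

    Stand-in hash: first byte of SHA-256. It will not reproduce the
    hash values printed on the slide.
    """
    digest = hashlib.sha256(str(partition_key).encode("utf-8")).digest()
    return digest[0]


def partition_for_hash(hash_value: int) -> int:
    """Return the index of the range that contains the hash value."""
    for index, (low, high) in enumerate(PARTITIONS):
        if low <= hash_value <= high:
            return index
    raise ValueError("hash value is outside the one-byte key space")


# Hash values taken directly from the slide's three example items.
slide_hashes = {1: 0x7B, 2: 0x48, 3: 0xCD}
for item_id, hash_value in slide_hashes.items():
    print(f"Id={item_id} hash={hash_value:02X} -> partition {partition_for_hash(hash_value)}")

# Any other key is placed the same way once it has been hashed.
print(partition_for_hash(key_hash("customer#42")))
```

With the slide's hash values, Id 1 (7B) falls in the middle range, Id 2 (48) in the first, and Id 3 (CD) in the last, so the three items spread across all three partitions.
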
  12. The Iron Triangle of Data - All About CAP. Pick two. Consistency (C): all clients always have the same view of data. Availability (A): all clients can always read and write. Partition tolerance (P): the system works well despite physical network partitions. CA: MSSQL, Oracle, DB2, MySQL, Aster Data, Greenplum, Postgres. CP: Big Table, Hypertable, HBase, MongoDB, Terastore, Couchbase, Scalaris, DynamoDB, BerkeleyDB, Memcached, Redis. AP: Voldemort, Tokyo Cabinet, KAI, DynamoDB, Cassandra, SimpleDB, CouchDB, Riak. Data models: Relational, Wide Column, Document, Key/Value.
  13. Technology Adoption and the Hype Curve
  14. AWS Databases and Analytics: broadest and deepest portfolio, purpose-built for builders. DW | Big Data Processing | Ad hoc. Business Intelligence & Machine Learning. Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams. Amazon QuickSight. Relational Databases: RDS, Aurora. Data lake (Batch/ETL): S3/Glacier (Storage), Glue (ETL & Data Catalog). Machine Learning. Macie (Data Protection). NoSQL Databases. Analytics (OLAP/DSS). DynamoDB (Wide Column/Document), ElastiCache (Indexed Key Value). Amazon Redshift, EMR, Athena, Kinesis Data Analytics, Elasticsearch Service. Real-time. Operational (OLTP). Neptune (Graph), QLDB (Ledger), Timestream (TSDB).
  15. Amazon RDS: managed relational database service with a choice of six popular database engines. Easy to administer: no need for infrastructure provisioning, installing and maintaining database software. Highly flexible: scale database compute and storage with a few mouse clicks and zero downtime. Available & durable: Multi-AZ automatically replicates data; automated backup, snapshots, failover. Fast: choose between dual SSD-backed storage for high-performance OLTP.
  16. Amazon DynamoDB: fast and flexible NoSQL database service for any scale. Key-value NoSQL database that supports both document and wide column structures. Fast, consistent performance: consistent single-digit millisecond latencies at any scale; DAX speeds up response times to microseconds. Highly scalable: auto-scaling tables serving millions of requests per second, storing hundreds of terabytes of data. Fully managed: automatic provisioning and infrastructure management. Business-critical reliability: data replicated across multiple AZs and accessed with regionally available APIs.
  17. DynamoDB Schema: Table, Items, Attributes. Partition Key: mandatory; key-value access pattern; determines data distribution. Sort Key: optional; models 1:N relationships; enables rich query capabilities: all items for a key; ==, <, >, >=, <=; “begins with”; “between”; “contains”; “in”; sorted results; counts; top/bottom N values.
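
The partition key and sort key roles on slide 17 can be illustrated with a small in-memory model. This sketch does not use the DynamoDB API; `ToyTable`, `begins_with`, `between`, and the `customer#`/`order#` key formats are hypothetical names chosen for the example. It shows the partition key selecting one item collection and the sort key supporting range conditions, sorted results, and top-N reads.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self):
        # partition key -> {sort key -> attributes}
        self._partitions = defaultdict(dict)

    def put(self, pk, sk, **attributes):
        """Store an item; the (pk, sk) pair identifies it uniquely."""
        self._partitions[pk][sk] = attributes

    def query(self, pk, condition=lambda sk: True, limit=None, descending=False):
        """Return (sort key, attributes) pairs for one partition key,
        filtered by a sort-key condition and ordered by sort key."""
        partition = self._partitions.get(pk, {})
        matches = [sk for sk in sorted(partition, reverse=descending) if condition(sk)]
        if limit is not None:
            matches = matches[:limit]
        return [(sk, partition[sk]) for sk in matches]


def begins_with(prefix):
    """Sort-key condition: key starts with the given prefix."""
    return lambda sk: sk.startswith(prefix)


def between(low, high):
    """Sort-key condition: key lies in the inclusive range [low, high]."""
    return lambda sk: low <= sk <= high


orders = ToyTable()
orders.put("customer#42", "order#2018-10-03", total=25)
orders.put("customer#42", "order#2018-11-26", total=40)
orders.put("customer#42", "order#2018-11-28", total=15)
orders.put("customer#7", "order#2018-11-27", total=99)

# 1:N relationship: all of one customer's orders placed in November 2018.
november = orders.query("customer#42", begins_with("order#2018-11"))
# Top N: the single most recent order for the same customer.
latest = orders.query("customer#42", limit=1, descending=True)
print(november)
print(latest)
```

Because the date is embedded in the sort key, a prefix or range condition on the key is enough to answer time-bounded questions without scanning other customers' items.
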
  18. SQL vs. NoSQL Design Pattern
  19. Amazon Neptune: fully managed graph database. Fast: query billions of relationships with millisecond latency. Reliable: six replicas of your data across three AZs with full backup and restore. Easy: build powerful queries easily with Gremlin and SPARQL. Open: supports Apache TinkerPop & W3C RDF graph models.
  20. Graph Workloads
  21. Normalized Graph Design Pattern
  22. De-normalized Graph Design Pattern. Table columns: Node | Relationship | Endpoint. Rows: Bill | person | Bill; Bill | visited | Eiffel Tower; Alice | person | Alice; Alice | visited | Eiffel Tower; Alice | friend | Bob; Bob | person | Bob; Bob | born | 7/14/90; Bob | friend | Alice; Bob | Interest | Mona Lisa; Leonardo daVinci | person | Leonardo daVinci; La Jaconde a Washington | video | La Jaconde…; La Jaconde a Washington | about | Mona Lisa; Eiffel Tower | place | Eiffel Tower; Eiffel Tower | located | Paris; 7/14/90 | date | 7/14/90; Paris | place | City; The Louvre | place | Museum; The Louvre | location | Paris; Mona Lisa | painting | Mona Lisa; Mona Lisa | creator | Leonardo daVinci; Mona Lisa | location | The Louvre. Nodes are the vertices of a graph. Relationships are the edges of a graph. Select nodes to get edges for an entity. Index Relationship and Endpoint for edge type and target aggregations. Follow the edges to traverse the graph. Scenario: Bob wants to see the Mona Lisa. While he is in Paris he would like to see other things his friends have enjoyed.
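
The Node / Relationship / Endpoint table on slide 22 can be modeled as a list of triples. The sketch below is a rough illustration under assumptions: the row grouping is one reading of the flattened slide text, only a subset of rows is included, and `by_node`, `by_edge`, `endpoints`, and `friends_visited_in` are hypothetical names. It shows the three access paths the slide describes: select a node to get its edges, look up an index on relationship and endpoint, and follow edges to traverse.

```python
from collections import defaultdict

# (node, relationship, endpoint) rows, a subset of the slide's table.
EDGES = [
    ("Bill", "person", "Bill"),
    ("Bill", "visited", "Eiffel Tower"),
    ("Alice", "person", "Alice"),
    ("Alice", "visited", "Eiffel Tower"),
    ("Alice", "friend", "Bob"),
    ("Bob", "person", "Bob"),
    ("Bob", "born", "7/14/90"),
    ("Bob", "friend", "Alice"),
    ("Bob", "interest", "Mona Lisa"),
    ("Eiffel Tower", "place", "Eiffel Tower"),
    ("Eiffel Tower", "located", "Paris"),
    ("The Louvre", "place", "Museum"),
    ("The Louvre", "location", "Paris"),
    ("Mona Lisa", "painting", "Mona Lisa"),
    ("Mona Lisa", "creator", "Leonardo daVinci"),
    ("Mona Lisa", "location", "The Louvre"),
]

by_node = defaultdict(list)  # primary access path: node -> its edges
by_edge = defaultdict(list)  # index: (relationship, endpoint) -> nodes
for node, relationship, endpoint in EDGES:
    by_node[node].append((relationship, endpoint))
    by_edge[(relationship, endpoint)].append(node)


def endpoints(node, relationship):
    """Node query: endpoints reached from a node over one edge type."""
    return [e for r, e in by_node.get(node, []) if r == relationship]


def friends_visited_in(person, city):
    """Traversal: places located in `city` that the person's friends visited."""
    places = set()
    for friend in endpoints(person, "friend"):
        for place in endpoints(friend, "visited"):
            if city in endpoints(place, "located"):
                places.add(place)
    return places


print(endpoints("Bob", "friend"))                # node query
print(by_edge[("visited", "Eiffel Tower")])      # edge query via the index
print(friends_visited_in("Bob", "Paris"))        # multi-hop traversal
```

The traversal needs one lookup per hop, which is the cost the slide's scenario is pointing at: each extra relationship in the question adds another round of reads against the table.
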
  23. Graph Query Types. Node Query (Primary): What entities are in the graph? Edge Query (Index): What relationships do graph entities have? Hybrid Query (Traversal): How are entities related through each other? (Diagram labels: RDBMS, NoSQL, GraphDB; GraphDB.)
  24. Amazon Redshift – Data Warehousing: fast, powerful, and simple data warehousing at 1/10 the cost. Massively parallel, petabyte scale. Fast: columnar storage technology to improve I/O efficiency and parallelize queries; data load scales linearly. Inexpensive: as low as $1,000 per terabyte per year, 1/10 the cost of traditional data warehouse solutions. Scalable: resize your cluster up and down as your performance and capacity needs change. Secure: data encrypted at rest and in transit; isolate clusters with VPC; manage your own keys with AWS KMS.
  25. Amazon Athena – Interactive Analysis: interactive query service to analyze data in Amazon S3 using standard SQL. No infrastructure to set up or manage and no data to load. Ability to run SQL queries on data archived in Amazon Glacier (coming soon). Serverless: zero setup cost; just point to Amazon S3 and start querying. Pay per query: pay only for queries run; save 30–90% on per-query costs through compression. Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types. Easy: serverless; zero infrastructure; zero administration.
  26. QLDB (Preview): fully managed ledger database. Track and verify history of all changes made to your application’s data. Immutable and transparent: append-only, immutable journal tracks the history of all changes, which cannot be deleted or modified; get full visibility into entire data lineage. Cryptographically verifiable: all changes are cryptographically chained and verifiable. Highly scalable: executes 2–3X as many transactions as ledgers in common blockchain frameworks. Easy to use: flexible document model, query with familiar SQL-like interface.
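
The phrase "cryptographically chained" on slide 26 can be illustrated with a minimal hash chain. This is a sketch of the general technique, not QLDB's journal format, which the source does not describe; `ToyJournal`, the SHA-256 choice, and the all-zero genesis value are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


class ToyJournal:
    """Append-only log in which each entry's hash covers the previous hash."""

    def __init__(self):
        self._entries = []

    def append(self, document) -> str:
        """Add a document and return the hash that chains it to its predecessor."""
        previous = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = json.dumps(document, sort_keys=True)
        entry_hash = hashlib.sha256((previous + payload).encode("utf-8")).hexdigest()
        self._entries.append({"previous": previous, "payload": payload, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        previous = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256((previous + entry["payload"]).encode("utf-8")).hexdigest()
            if entry["previous"] != previous or entry["hash"] != expected:
                return False
            previous = entry["hash"]
        return True


journal = ToyJournal()
journal.append({"account": "A", "balance": 100})
journal.append({"account": "A", "balance": 75})
print(journal.verify())  # chain is intact after normal appends
```

Because each hash depends on the one before it, changing an old entry would require recomputing every later hash, which is what makes tampering detectable.
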
  27. Amazon Timestream (sign up for the preview today): fast, scalable, and fully managed time series database. 1,000x faster at 1/10 the cost of relational databases: collect fast-moving time-series data from multiple sources at the rate of millions of inserts per second. Trillions of daily events: capable of processing trillions of events daily; the adaptive query processing engine maintains steady, predictable performance. Analytics optimized for time series data: built-in analytics for interpolation, smoothing, and approximation to identify trends, patterns, and anomalies. Serverless: no servers to manage; time-consuming tasks such as hardware provisioning, software patching, setup, and configuration are done for you.
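
Slide 27 mentions built-in interpolation and smoothing for time series. The sketch below shows what those two operations mean in plain Python; it says nothing about how Timestream implements or exposes them, and `interpolate_linear` and `moving_average` are hypothetical helper names.

```python
def interpolate_linear(points, t):
    """Estimate the value at time t from (timestamp, value) samples.

    `points` must be sorted by timestamp; t must lie within the sampled range.
    """
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return v0
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the sampled range")


def moving_average(values, window):
    """Smooth a series with a simple trailing window of fixed size."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


samples = [(0, 10.0), (10, 20.0), (20, 15.0)]
print(interpolate_linear(samples, 5))                 # fill a gap between two readings
print(moving_average([v for _, v in samples], 2))     # smooth adjacent readings
```

Interpolation fills in readings that a sensor missed, while smoothing damps short-lived spikes so that trends are easier to see.
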
  28. Categories of Database. SQL (optimized for storage): normalized relational or dimensional DW; ad hoc queries and aggregations; scale vertically; great for OLAP and DSS. NoSQL (optimized for compute): denormalized document, wide column or key value; instantiated views and computed aggregations; scale horizontally; built for OLTP or DSS at scale. Graph (optimized for relationships): denormalized entity relationship; ad hoc entity/relationship aggregations; hybrid scaling; designed for graph traversals.
  29. The Iron Triangle of Purpose (The PIE Theorem). Pick two. Pattern Flexibility (P): the database supports random access patterns and ad hoc queries. Infinite Scale (I): the database can gracefully increase size and throughput without practical limits. Efficiency (E): the database will deliver required query latency for the workload at all times. Pairings shown: PI, IE, PE. Services placed on the triangle: Amazon RDS, Elasticsearch, Aurora Serverless, Neptune, Amazon DynamoDB, Amazon Redshift, Athena. Data models: Relational, Wide Column, Document, Graph, Columnar, Unstructured.
  30. Hundreds of Thousands of Customers Use DynamoDB
  31. Hundreds of Thousands More Use Amazon RDS
  32. Purpose Built Database Solutions from AWS, with zero unplanned downtime. Provisioning; capacity planning; monitoring; OS patching; hardware upgrades; database upgrades; security patches; scaling; monitoring; performance tuning; replication across data centers; re-replicate on server failure; provision new regions. (Diagram labels: Infrastructure, Software.)
  33. Thank you!
  34. Time: 15 minutes after this session. Location: Speaker Lounge (ARIA East, Level 1, Willow Lounge). Duration: 30 min.