AWS re:Invent 2016: AWS Database State of the Union (DAT320)


Raju Gulabani, vice president of AWS Database Services, discusses the evolution of database services on AWS and the new database services and features we launched this year, and shares our vision for continued innovation in this space. We are witnessing unprecedented growth in the amount of data collected, in many different shapes and forms. Storing, managing, and analyzing this data requires database services that scale and perform in ways not possible before. AWS offers a collection of such database and other data services, like Amazon Aurora, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon ElastiCache, Amazon Kinesis, and Amazon EMR, to process, store, manage, and analyze data. In this session, we provide an overview of AWS database services and discuss how our customers are using these services today.

  1. AWS Database Services State of the Union (DAT320)
     Raju Gulabani, VP Database Services, AWS
     November 2016
     © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. What to Expect from the Session
     • Learn our strategy and get an overview of our key services
     • Get a sense of our scale and key customers for each service
     • Understand when to use which services for your apps
  3. Strategy
     • Start from the customer and work backwards
     • Offer managed services
     • Leverage the cloud architecture
     • Support migration of apps and data from and to on-premises environments
     • Offer multiple services, each optimized for a different use case
  4. Comprehensive Product Portfolio
     • Relational Databases (traditional apps): RDS, Aurora, Database Migration Service
     • NoSQL & In-Memory: DynamoDB, ElastiCache
     • Big Data: Amazon Redshift, EMR, Data Pipeline, Athena
     • Analytics: QuickSight, Elasticsearch, Amazon ML
  5. Database Services Usage
     • Amazon Aurora is the fastest-growing service in AWS history
     • More than 14,000 databases have been migrated using AWS Database Migration Service
     • On Prime Day, DynamoDB served over 56 billion more requests worldwide than on the same day the previous week
  6. Select DB Services Customers
  7. Relational Databases: Amazon RDS, Amazon Aurora, Database Migration Service
  8. Amazon RDS
     • Multi-engine support: Aurora, MySQL, MariaDB, PostgreSQL, Oracle, SQL Server
     • Automated provisioning, patching, scaling, backup/restore, and failover
     • Use with General Purpose (GP2) SSD or Provisioned IOPS storage
     • High availability with RDS Multi-AZ: 99.95% SLA for Multi-AZ deployments
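To make the 99.95% figure concrete, the sketch below converts an availability percentage into the maximum downtime it allows over a period. It assumes a 30-day month purely for illustration; the actual RDS SLA defines its own measurement period and exclusions, which are not modeled here.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime consistent with an availability target over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return period * unavailable_fraction  # timedelta supports multiplication by a float


# Assumption: a 30-day month as the measurement period.
month = timedelta(days=30)
for target in (99.95, 99.99):
    minutes = allowed_downtime(target, month).total_seconds() / 60
    print(f"{target}% over 30 days -> {minutes:.2f} minutes of downtime")
```

Over a 30-day month, 99.95% leaves roughly 21.6 minutes of downtime, while the 99.99% target mentioned later in the deck leaves roughly 4.3 minutes.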
  9. Key Insight: Relational Databases Are Complex
     • Our operational experience taught us that relational databases can be a pain to manage and to operate with high availability
     • Poorly managed relational databases are a leading cause of lost sleep and downtime in the IT world!
  10. We Made Things Cheaper, Easier, and Better
     • Lower TCO because we manage the muck
     • Get more leverage from your teams
     • Focus on the things that differentiate you
     • Built-in high availability and cross-region replication across multiple data centers
     • Available on all engines, including base/standard editions, not just enterprise editions
     • Now even a small startup can leverage multiple data centers to design highly available apps with over 99.95% availability
  11. High Availability: Multi-AZ Deployments
     • Enterprise-grade fault-tolerance solution for production databases
     • Automatic failover
     • Synchronous replication
     • Inexpensive and enabled with one click
  12. Amazon RDS Customers
  13. Airbnb: Amazon RDS for MySQL
     • Airbnb moved its main MySQL database to Amazon RDS with only 15 minutes of downtime
     • RDS simplifies many of the time-consuming administrative tasks associated with databases, so engineers can spend more time on features
     • Uses asynchronous master-slave replication, launched via the RDS console or an API call, to improve website performance
     • Leverages Multi-Availability Zone (Multi-AZ) deployments for high availability
  14. Reinventing the Relational Database
  15. Key Questions We Asked
     • What if we started from a clean sheet of paper, with the only constraint being that the database was relational?
     • Could we offer much better performance by leveraging the massive scale of our cloud?
     • Could we give you a database with designed durability indistinguishable from 100% and availability of 99.99%?
     • …And could we be better and cheaper than the 30-year-old commercial databases in use today?
  16. Yes, We Can. Answer = Amazon Aurora
     • A new relational database engine, built from the ground up to leverage AWS
     • For all new apps that require SQL, we recommend Amazon Aurora
     • Commercial-grade performance and availability at open source prices
     • Retains compatibility with MySQL 5.6
  17. Amazon RDS for Aurora
     • MySQL-compatible, with up to 5x better performance on the same hardware: 100,000 writes/sec and 500,000 reads/sec
     • Scalable to up to 64 TB in a single database, with up to 15 read replicas
     • Highly available, durable, and fault-tolerant custom SSD storage layer: 6-way replicated across 3 Availability Zones
     • Transparent encryption for data at rest using AWS KMS
     • Stored procedures in Amazon Aurora can invoke AWS Lambda functions
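The following toy model illustrates why 6-way replication across 3 Availability Zones tolerates the loss of a whole AZ. It assumes 2 copies per AZ, and it uses the quorum sizes AWS has publicly described for Aurora storage (writes need 4 of 6 copies, reads or repair need 3 of 6). The AZ names are hypothetical, and this is a counting sketch, not Aurora's actual storage protocol.

```python
AZS = ("az-a", "az-b", "az-c")  # hypothetical Availability Zone names
COPIES_PER_AZ = 2               # 6 copies in total, 2 in each of 3 AZs
WRITE_QUORUM = 4                # a write must be acknowledged by 4 of 6 copies
READ_QUORUM = 3                 # a quorum read (e.g. during repair) needs 3 of 6


def surviving_copies(failed_azs=(), extra_failed_copies=0):
    """Count healthy copies after whole-AZ failures plus some individual copy failures."""
    unknown = set(failed_azs) - set(AZS)
    if unknown:
        raise ValueError(f"unknown AZs: {sorted(unknown)}")
    healthy_azs = [az for az in AZS if az not in failed_azs]
    return max(len(healthy_azs) * COPIES_PER_AZ - extra_failed_copies, 0)


def quorum_status(copies):
    """Report whether the remaining copies can still form write and read quorums."""
    return {"can_write": copies >= WRITE_QUORUM, "can_read": copies >= READ_QUORUM}


print(quorum_status(surviving_copies(failed_azs=["az-a"])))
print(quorum_status(surviving_copies(failed_azs=["az-a"], extra_failed_copies=1)))
```

Losing one AZ leaves 4 copies, so writes continue. Losing an AZ plus one more copy leaves 3, which still allows reads and repair but not writes.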
  18. Amazon Aurora Customers
     Fastest-growing service in AWS history
  19. Use Case: Near Real-Time Analytics and Reporting
     [Diagram: a master and three read replicas over a shared distributed storage volume, with a reader endpoint in front of the replicas]
     A customer in the travel industry migrated to Aurora for their core reporting application, which is accessed by ~1,000 internal users.
     • Replicas can be created, deleted, and scaled within minutes based on load.
     • Read-only queries are load-balanced across the replica fleet through a DNS endpoint; no application configuration is needed when replicas are added or removed.
     • Low replication lag allows mining fresh data with no delay, immediately after the data is loaded.
     • Significant performance gains for core analytics queries, with some queries executing in 1/100th of the original time.
     ► Up to 15 promotable read replicas
     ► Low replica lag, typically < 10 ms
     ► Reader endpoint with load balancing
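The point of the reader endpoint is that the application keeps a single hostname while the replica fleet changes underneath it. The class below is a toy in-process model of that idea, assuming simple round-robin rotation; the real Aurora reader endpoint is DNS-based, and client-side DNS caching affects how evenly connections are spread. The replica names are hypothetical.

```python
class ReaderEndpoint:
    """Toy model of a reader endpoint that rotates connections across read replicas."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        self._counter = 0

    def add_replica(self, name):
        # The application does not change: it keeps calling resolve().
        self._replicas.append(name)

    def remove_replica(self, name):
        self._replicas.remove(name)

    def resolve(self):
        """Return the replica that the next connection would be sent to."""
        if not self._replicas:
            raise RuntimeError("no replicas available")
        replica = self._replicas[self._counter % len(self._replicas)]
        self._counter += 1
        return replica


endpoint = ReaderEndpoint(["replica-1", "replica-2"])
print([endpoint.resolve() for _ in range(4)])
endpoint.add_replica("replica-3")  # scale out under load; no app config change
print([endpoint.resolve() for _ in range(3)])
```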
  20. Amazon Aurora Is Now PostgreSQL-Compatible
     • PostgreSQL 9.6 compatibility with support for PostGIS
     • All the features you expect from Amazon Aurora, including 15 read replicas with <10 ms lag, shared storage, failover without data loss, 6-way replication across 3 Availability Zones, and encryption with AWS KMS
     • Available now in preview
  21. Performance Insights for Amazon RDS
     • Simplifies monitoring from the AWS Management Console
     • Database load: identifies database bottlenecks (easy, powerful)
     • Identifies the source of bottlenecks: top SQL
     • Adjustable time frame: hour, day, week, and longer
     [Chart: database load plotted against a Max CPU line]
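AWS describes the "database load" metric in Performance Insights as average active sessions: sessions are sampled periodically (about once per second), and load is the average number that were active. The sketch below computes that from hypothetical per-second samples, ranks SQL by its share of the load, and compares load against a vCPU count, which is what the Max CPU line on the chart represents. The sample format and function names are illustrative assumptions, not the Performance Insights API.

```python
from collections import Counter


def db_load(samples):
    """Average active sessions. Each sample is the list of SQL that active sessions ran in one second."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(len(second) for second in samples) / len(samples)


def top_sql(samples, n=3):
    """Rank statements by their contribution to average active sessions."""
    counts = Counter(sql for second in samples for sql in second)
    return [(sql, count / len(samples)) for sql, count in counts.most_common(n)]


def exceeds_max_cpu(samples, vcpus):
    """Load above the vCPU count means more sessions want to run than there are CPUs."""
    return db_load(samples) > vcpus


# Hypothetical samples: 3 seconds of activity.
samples = [["q1", "q2"], ["q1"], ["q1", "q1", "q2"]]
print(db_load(samples), top_sql(samples), exceeds_max_cpu(samples, vcpus=1))
```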
  22. AWS Database Migration Service
     • Fully managed service for migrating from on-premises to the AWS Cloud with minimal downtime
     • Migrates data to and from all widely used commercial and open source databases
     • The Schema Conversion Tool (SCT) converts source database schemas, stored procedures, and application code to a different target format
     • Supports homogeneous and heterogeneous data replication
     • A terabyte-sized database can be migrated for as little as $3
  23. Database Conversion Capabilities in SCT
     Source Database                  Target Database
     Microsoft SQL Server             Amazon Aurora, MySQL, PostgreSQL
     MySQL                            PostgreSQL
     Oracle                           Amazon Aurora, MySQL, PostgreSQL
     Oracle Data Warehouse            Amazon Redshift
     PostgreSQL                       Amazon Aurora, MySQL
     Teradata, Netezza, Greenplum     Amazon Redshift
  24. AWS Database Migration Service Customers
  25. Heterogeneous Migration
     • Oracle in a private data center migrated to RDS for PostgreSQL
     • Used the AWS Schema Conversion Tool to convert the database schema
     • Used ongoing replication (change data capture, CDC) to keep the databases in sync until the cutover window
     • Benefits:
        • Improved reliability of the cloud environment
        • Savings on Oracle licensing costs
        • The SCT Assessment Report helped them understand the scope of the migration
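The "full load, then ongoing replication until cutover" pattern can be sketched as follows. Plain dicts stand in for the source and target tables, and each captured change carries a log sequence number (LSN) so that changes are applied in order and replays are harmless. This is a simplified illustration of change data capture under those assumptions, not how AWS DMS is implemented; in practice the capture position is recorded before the full load starts, so no change falls in the gap.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Change:
    lsn: int                     # position in the source's change log (hypothetical field)
    op: str                      # "insert", "update", or "delete"
    key: int
    row: Optional[dict] = None   # new row image for inserts and updates


def full_load(source: Dict[int, dict]) -> Dict[int, dict]:
    """Copy every existing row to the target."""
    return {key: dict(row) for key, row in source.items()}


def apply_changes(target: Dict[int, dict], changes: Iterable[Change], last_applied_lsn: int) -> int:
    """Apply captured changes in LSN order, skipping any that were already applied."""
    for change in sorted(changes, key=lambda c: c.lsn):
        if change.lsn <= last_applied_lsn:
            continue  # already applied, so replaying a batch is safe
        if change.op in ("insert", "update"):
            if change.row is None:
                raise ValueError(f"{change.op} at LSN {change.lsn} has no row")
            target[change.key] = dict(change.row)
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown op {change.op!r}")
        last_applied_lsn = change.lsn
    return last_applied_lsn
```

At the cutover window, the application stops writing to the source, the last captured changes are applied, and traffic moves to the target.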
  26. NoSQL & In-Memory: DynamoDB, ElastiCache
  27. Amazon DynamoDB: Fast, Flexible, Scalable NoSQL
  28. History of NoSQL at Amazon
  29. Key Questions We Asked
     • Aurora was designed with a single constraint: SQL compatibility and relational database semantics
     • What if we said no to this constraint? No to SQL = NoSQL
     • Could we eliminate the things we didn’t like about relational databases?
  30. Yes, We Can. Answer = Amazon DynamoDB
     • A database that can scale beyond a single box without any changes to your app
     • You can start small, knowing there is no limit to how successful your app can be
     • If your app runs fast today with 10 users, it will keep running fast when you have 1M, 10M, or 100M users
     • No need to spend time tuning queries and diagnosing why your app is running slowly
     • Availability and durability indistinguishable from 100%; 99.99% and 60-second failover are not good enough
     • You don’t have to manage anything. You don’t even need to know what a database instance is
     • No schema. All you need to tell us is the number of reads/sec and writes/sec you want to execute. We do the rest
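In provisioned mode, "reads/sec and writes/sec" are expressed as capacity units. Per the DynamoDB documentation, one read capacity unit covers one strongly consistent read per second (or two eventually consistent reads) of an item up to 4 KB, and one write capacity unit covers one write per second of an item up to 1 KB; larger items consume whole multiples. The sketch below applies those rules. It ignores transactional requests and other modes, so treat it as a rough sizing aid rather than a billing calculator.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one read capacity unit covers an item of up to 4 KB
WCU_BLOCK_BYTES = 1 * 1024  # one write capacity unit covers an item of up to 1 KB


def read_capacity_units(item_size_bytes, reads_per_sec, strongly_consistent=True):
    """Read capacity units needed for a steady read rate of items of a given size."""
    if item_size_bytes <= 0 or reads_per_sec < 0:
        raise ValueError("item size must be positive and the rate non-negative")
    blocks_per_read = math.ceil(item_size_bytes / RCU_BLOCK_BYTES)
    units = blocks_per_read * reads_per_sec
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return units if strongly_consistent else math.ceil(units / 2)


def write_capacity_units(item_size_bytes, writes_per_sec):
    """Write capacity units needed for a steady write rate of items of a given size."""
    if item_size_bytes <= 0 or writes_per_sec < 0:
        raise ValueError("item size must be positive and the rate non-negative")
    return math.ceil(item_size_bytes / WCU_BLOCK_BYTES) * writes_per_sec


# Example: 6 KB items, 100 reads/sec and 50 writes/sec.
print(read_capacity_units(6 * 1024, 100))                             # 200
print(read_capacity_units(6 * 1024, 100, strongly_consistent=False))  # 100
print(write_capacity_units(6 * 1024, 50))                             # 300
```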
  31. Amazon DynamoDB Customers
  32. Lyft Easily Scales Up Its Ride Location Tracking System Using DynamoDB
     “It was so simple to scale out. We had two knobs. One was for reads and one was for writes.” (Chris Lambert, CTO, Lyft)
     • Lyft serves up to 8x more rides during peak times
     • The GPS location for all rides was tracked in the ride location tracking system
     • In June 2014, Lyft deployed DynamoDB in production
     • Lyft has since moved many of its other data stores over to DynamoDB as well
  33. Amazon ElastiCache
     • In-memory cache: Memcached or Redis
     • Fully managed; zero admin
  34. Key ElastiCache Features
     • Memcached
        • Fully managed
        • Cache node auto-discovery
        • Multi-AZ node placement
     • Redis
        • Fully managed
        • Persistence
        • Read replicas
        • Multi-AZ with auto-failover
        • Redis cluster
  35. Amazon ElastiCache Customers (Gaming, AdTech, Media, Mobile, Other)
  36. RDS and ElastiCache Are Behind Grab’s Taxi-Booking App
     “The latency of a cab call must be low, and remain low even in times of peak traffic of hundreds of thousands of cab requests per minute. We use ElastiCache for Redis in front of RDS MySQL to keep our systems’ real time performance at any scale.” (Ryan Ooi, Sr. DevOps Engineer, Grab)
     • Grab is a popular taxi-hailing app in Southeast Asia
     • The API layer’s average response time is <40 ms, which requires an in-memory layer to achieve
     • A small DevOps team had tried running Redis on EC2 before, but that was too much work. Using both RDS and ElastiCache in Multi-AZ allowed them to outsource all the management to AWS.
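"ElastiCache for Redis in front of RDS MySQL" is commonly implemented with the cache-aside pattern. The slide does not say which pattern Grab uses, so treat this as a generic sketch: an in-process dict stands in for Redis, and a caller-supplied loader function stands in for the MySQL query. With the redis-py client, the cache lookup and store would roughly map to `get` and `set(key, value, ex=ttl)`. The class and key names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Read-through-on-miss cache with a TTL, using a dict as a stand-in for Redis."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._load = load_from_db   # stand-in for a query against RDS MySQL
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, dict]] = {}  # key -> (expiry time, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]          # served from memory; the database is not touched
        self.misses += 1
        value = self._load(key)      # fall back to the database
        if value is not None:
            self._cache[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing to the database so the next read fetches fresh data."""
        self._cache.pop(key, None)
```

The TTL bounds how stale a cached value can get, and explicit invalidation on writes keeps hot keys fresh without waiting for expiry.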
  37. Big Data: Amazon Redshift, Amazon EMR, Amazon Athena, Data Pipeline
  38. Amazon Redshift
     • Petabyte-scale, relational, MPP data warehousing
     • Fully managed, with SSD and HDD platforms
     • Built-in end-to-end security, including customer-managed keys
     • $1,000/TB/year; start at $0.25/hour
  39. Why We Built Amazon Redshift
     • Customers were generating data in the cloud but moving it on-premises to analyze it in a data warehouse
     • Customers had migrated everything to AWS except their on-premises data warehouses
     • They wanted to shut down these data centers but could not until we offered them a solution in the cloud
  40. Key Insight: Most Data Falls on the Floor
     [Chart: generated data vs. data available for analysis, 1990 to 2020. Sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares"]
     • 90% of the data in a company is never analyzed
     • The high cost and complexity of traditional DW systems make it hard to justify the capital expense
  41. Key Questions We Asked
     • Could we design a system cheap and scalable enough to let you analyze all your data?
     • Could we build a service that was faster, cheaper, and easier to use than traditional DW systems?
  42. Yes, We Can. Answer = Amazon Redshift
     • A massively parallel processing (MPP) system with up to 128 compute nodes to store and process up to 2 PB of compressed data
     • At $1,000/TB/year, it’s so cheap that you can analyze all your data
     • You can provision a petabyte in under three minutes and pay for it by the hour
     • 10x the performance at 1/10 the price of other solutions
     • Fully managed, with automated provisioning, patching, securing, backup, restore, and built-in fault tolerance
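In an MPP warehouse, rows are spread across compute nodes, each node aggregates its own share in parallel, and a leader combines the partial results. The sketch below shows that flow for a `GROUP BY ... SUM(...)` query, using hash distribution on a chosen key. It is a conceptual model under stated assumptions: threads stand in for compute nodes, and `zlib.crc32` is used only because it is deterministic (Python's built-in `hash` is randomized for strings). Redshift's actual hashing and slice layout differ.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def distribute(rows, dist_key, num_nodes):
    """Assign each row to a node by hashing its distribution-key value."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        bucket = crc32(str(row[dist_key]).encode("utf-8")) % num_nodes
        nodes[bucket].append(row)
    return nodes


def partial_sum(node_rows, group_key, value_key):
    """Work done locally on one compute node."""
    sums = defaultdict(int)
    for row in node_rows:
        sums[row[group_key]] += row[value_key]
    return sums


def mpp_group_sum(rows, dist_key, group_key, value_key, num_nodes=4):
    """Distribute, aggregate on each node in parallel, then merge on the 'leader'."""
    nodes = distribute(rows, dist_key, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda r: partial_sum(r, group_key, value_key), nodes))
    result = defaultdict(int)
    for partial in partials:
        for group, total in partial.items():
            result[group] += total
    return dict(result)
```

Because each node touches only its own rows, adding nodes divides the scan work, which is the property the slide's "up to 128 compute nodes" relies on.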
  43. Amazon Redshift Customers
  44. NTT Docomo: Japan’s Largest Mobile Provider
     • 68 million customers
     • 10s of TBs per day of data across the mobile network
     • 6 PB of total data (uncompressed)
     • Data science for marketing operations, logistics, etc.
     • Before: Greenplum on-premises
     • After: 125-node DS2.8XL cluster; 4,500 vCPUs, 30 TB RAM; 6 PB uncompressed data
     • Results: 10x faster analytic queries; 50% reduction in time for new BI app deployment; significantly less ops overhead
  45. Amazon EMR
     • Hadoop, Hive, Presto, Spark, Tez, Impala, etc.
     • Release 5.2: Hadoop 2.7.3, Hive 2.1, Spark 2.0.2, Zeppelin, Presto, HBase 1.2.3 and HBase on S3, Phoenix, Tez, Flink
     • New applications added within 30 days of their open source release
     • Fully managed, automatically scaling clusters with support for On-Demand and Spot pricing
     • Support for HDFS and S3 file systems, enabling separated compute and storage; multiple clusters can run against the same data in S3
     • HIPAA-eligible. Support for end-to-end encryption, IAM/VPC, and S3 client-side encryption with customer-managed keys and AWS KMS
  46. Why We Built Amazon EMR
     • Customers wanted to use the latest open source analytic frameworks to analyze and transform their data
     • Customers wanted to use technologies like Spark and Presto in conjunction with AWS services like Amazon S3 and features like EC2 Spot Instances
     • Customers wanted to benefit from the elasticity that AWS offers
  47. Amazon EMR Customers
  48. Amazon Athena
     • Serverless query service for querying data in S3, with no infrastructure to manage
     • No data loading required; query directly from Amazon S3
     • Use standard ANSI SQL queries, with support for joins, JSON, and window functions
     • Support for multiple data formats, including text, CSV, TSV, JSON, Avro, ORC, and Parquet
     • Pay per query, only when you’re running queries: $5/TB scanned; if you compress your data, your queries cost less
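Because Athena bills by data scanned, anything that shrinks the bytes read lowers the cost of a query. The sketch below applies the slide's $5/TB price to a raw dataset size, a compression ratio, and the fraction of bytes that sit in the columns a query references (columnar formats such as ORC and Parquet let a query skip unreferenced columns). The ratios in the example are hypothetical, and real billing includes minimums and rounding that are not modeled here.

```python
PRICE_PER_TB_USD = 5.00  # price quoted on the slide


def tb_scanned(raw_tb, compression_ratio=1.0, column_fraction=1.0):
    """Estimate TB read for a query.

    compression_ratio: raw size divided by stored size (hypothetical, depends on the data).
    column_fraction: share of stored bytes in the columns the query references.
    """
    if raw_tb < 0 or compression_ratio <= 0 or not 0 < column_fraction <= 1:
        raise ValueError("invalid size, ratio, or column fraction")
    return raw_tb / compression_ratio * column_fraction


def query_cost_usd(tb, price_per_tb=PRICE_PER_TB_USD):
    if tb < 0:
        raise ValueError("tb must be non-negative")
    return tb * price_per_tb


raw = 2.0  # TB of uncompressed text data
print(query_cost_usd(tb_scanned(raw)))                                            # uncompressed
print(query_cost_usd(tb_scanned(raw, compression_ratio=4.0)))                     # compressed
print(query_cost_usd(tb_scanned(raw, compression_ratio=4.0, column_fraction=0.2)))  # columnar
```

Under these assumed ratios, the same 2 TB query drops from about $10 to $2.50 with 4:1 compression, and to about $0.50 if only a fifth of the columns are read.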
  49. Why We Built Amazon Athena
     • Customers wanted an easy way to run ad hoc queries on data in Amazon S3 with no infrastructure to manage
     • Customers wanted a service that could complement their use of Amazon Redshift and Amazon EMR
     • Customers wanted to give this capability to anyone in their company and pay only per query
  50. Amazon Athena Customers
  51. Analytics: QuickSight, Amazon ES, Amazon ML
  52. Amazon QuickSight
     Fast, easy-to-use business analytics service at 1/10th the cost of traditional BI solutions.
     As a native cloud service, QuickSight combines the speed, scalability, and ease of deployment that our customers have come to depend on with the value and cost-effectiveness you expect from AWS.
  53. Amazon QuickSight
     • Auto-discover AWS data sources like Amazon Redshift, RDS, and S3
     • Connect to third-party sources like Excel, Salesforce, and other hosted or on-premises databases
     • Super-fast performance with SPICE
     • Instant visualizations with AutoGraph
     • Securely share and collaborate on analyses, dashboards, and stories
     • Native iPhone experience and web-based access from all other devices
     • Governed datasets
     • User access controls
     • Active Directory integration
  54. QuickSight Providing Real-Time Insights at MLB Advanced Media
     “QuickSight provides us with a real-time, 360 degree view of our business without being constrained by pre-built dashboards and metrics, expanding our use of data to make informed decisions.” (Brandon Sangiovanni, Sr. BI Development Manager)
  55. Amazon Elasticsearch Service
     • Distributed search and analytics engine
     • Managed service using Elasticsearch and Kibana
     • Fully managed; zero admin
     • Highly available and reliable
     • Tightly integrated with other AWS services
  56. Amazon Elasticsearch Service Leading Use Cases
     Log Analytics & Operational Monitoring
     • Monitor the performance of your application, web servers, and hardware
     • Easy-to-use yet powerful data visualization tools to detect issues in near real time
     • Ability to dig into your logs in an intuitive, fine-grained way
     • Kibana provides fast, easy visualization
     Traditional Search
     • Application or website provides search capabilities over diverse documents
     • Tasked with making this knowledge base searchable and accessible
     • Key search features, including text matching, faceting, filtering, fuzzy search, autocomplete, and highlighting
     • Query API to support application search
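Several of the search features listed above map directly onto the Elasticsearch query DSL. The sketch builds a request body that combines a fuzzy `match` query (text matching and fuzzy search), a `term` filter (filtering), a `terms` aggregation (faceting), and `highlight`. The field names `title` and `category` are hypothetical, and `category` is assumed to be mapped as a keyword field so that it can be aggregated. The body would be sent as JSON to the index's `_search` endpoint. Autocomplete usually needs a dedicated mapping (for example a completion suggester), so it is not shown.

```python
import json


def build_search_body(text, category=None, size=10):
    """Build an Elasticsearch _search request body (field names are hypothetical)."""
    body = {
        "size": size,
        "query": {
            "bool": {
                # Full-text match on the title, tolerating small typos.
                "must": [{"match": {"title": {"query": text, "fuzziness": "AUTO"}}}],
                # Filters narrow results without affecting relevance scores.
                "filter": [],
            }
        },
        # Facet counts per category for the matching documents.
        "aggs": {"by_category": {"terms": {"field": "category"}}},
        # Return highlighted fragments of the matched title text.
        "highlight": {"fields": {"title": {}}},
    }
    if category is not None:
        body["query"]["bool"]["filter"].append({"term": {"category": category}})
    return body


print(json.dumps(build_search_body("elastcsearch tutorial", category="docs"), indent=2))
```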
  57. Amazon Elasticsearch Customers (Media and Entertainment, Online Services, Technology, Other)
  58. Case Study: Adobe Developer Platform (Adobe I/O)
     • Over 200,000 API calls per second at peak (destinations, response times, bandwidth)
     • Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using Kibana on Amazon Elasticsearch Service
     • The Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges
     [Diagram: Data Sources → Amazon Kinesis Streams → Spark Streaming → Amazon Elasticsearch Service]
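The pipeline above has a streaming stage between Kinesis and Elasticsearch. One common job for such a stage is rolling raw API log records into per-window traffic, error-rate, and latency figures that are easy to chart and alert on. The sketch does that in plain Python with tumbling windows. The record fields, the "status 500 and above is an error" rule, and the alert threshold are all assumptions for illustration; the case study does not describe Adobe's actual aggregation logic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ApiLog:
    timestamp: float   # seconds since epoch (hypothetical record layout)
    status: int        # HTTP status code
    latency_ms: float


def tumbling_window_stats(logs, window_seconds=10):
    """Group log records into fixed, non-overlapping time windows and summarize each."""
    windows = defaultdict(list)
    for entry in logs:
        start = int(entry.timestamp // window_seconds) * window_seconds
        windows[start].append(entry)
    stats = {}
    for start in sorted(windows):
        batch = windows[start]
        errors = sum(1 for entry in batch if entry.status >= 500)  # assumption: 5xx = error
        stats[start] = {
            "requests": len(batch),
            "error_rate": errors / len(batch),
            "avg_latency_ms": sum(entry.latency_ms for entry in batch) / len(batch),
        }
    return stats


def anomalies(stats, max_error_rate=0.05):
    """Return window start times whose error rate exceeds a (hypothetical) threshold."""
    return [start for start, s in stats.items() if s["error_rate"] > max_error_rate]
```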
  59. Which Service Should You Use?
     Situation                               Solution
     Existing application                    Use your existing engine on RDS:
                                              • MySQL → Amazon Aurora, RDS for MySQL
                                              • PostgreSQL → RDS for PostgreSQL
                                              • Oracle, SQL Server → RDS for Oracle, RDS for SQL Server
     New application                         • If you can avoid relational features → DynamoDB
                                              • If you need relational features → Amazon Aurora
     Data warehouse & BI                     Amazon Redshift and Amazon QuickSight
     Ad hoc analysis of data in S3           Amazon Athena and Amazon QuickSight
     Spark, Hadoop, Hive, HBase              Amazon EMR
     Log analytics, operational              Amazon Elasticsearch Service
     monitoring, and search
  60. Thank you!
  61. Remember to complete your evaluations!