Architecting Data in the AWS Ecosystem

The database market is large and filled with many solutions. In this talk, Seth Luersen from MemSQL looks at what is happening within AWS, the overall data landscape, and how customers can benefit from using MemSQL within the AWS ecosystem.

Published in: Data & Analytics

  1. 1. Architecting Data in the AWS Ecosystem Seth Luersen Head of Training, MemSQL
  2. 2. 2 what is happening within AWS overall data landscape benefits of using MemSQL in EC2
  3. 3. 3 Modern Data Relational SQL Schema Structured Operational Non-Relational NoSQL Schema-less Unstructured Analytical
  4. 4. 4 How to Match Database Data-driven application
  5. 5. 5 Workloads Data Shape Size Compute
  6. 6. 6 Shape Columnstore Aggregations and table scans Document Index and store docs for query on any property Graph Persist and retrieve relationships Key-Value Query by key with fast ingest and high throughput Rowstore Operate on a row or row set Time-Series Store and process sequence Unstructured Get and put objects
  7. 7. 7 Size Limit Bounded or Unbounded to a size Working Set 30 years cold Caching Last 10 minutes of hot Result size 1 row at 100 bytes 2 million rows at 200 MB Monolith One big refrigerator Partition Natural boundaries for distribution
  8. 8. 8 Compute Aggregations Average, Count, Sum on 1 trillion rows Batch 50 million rows per batch Concurrency 10,000 requests per second Streaming Ingest 1 million rows ingest per second Latency SLAs for sub-second response Transactions Singleton operations
  9. 9. 9 Choice One Size Fits All Use Case Specific
  10. 10. 10 Navigating the Data Landscape NoSQL Database Data Warehouse Data Lake Non-relational Relational Analytical Operational
  11. 11. NoSQL Database Data Warehouse Data Lake 11 Navigate in AWS DynamoDB RDS Aurora MySQL PostgreSQL MariaDB SQL Server Oracle S3 Non-relational Relational ElastiCache Analytical Operational DAX Kinesis Analytics Redshift Athena Elasticsearch EMR Elastic MapReduce Hadoop Spark Presto HBase
  12. 12. 12 General Use Cases singletons system of record content blobs descriptive predictive prescriptive Operational Analytics
  13. 13. columnar partitions billions of rows heavy push down batch writes few updates in-memory compute 13 Analytical Dimensions billions of rows cached / in-memory partitions computed result set files unstructured schema-less relational snowflake etl batches aggregations query latency pushdown shape compute size
  14. 14. NoSQL Database Data Warehouse Data Lake 14 Navigate Analytical DynamoDB RDS Aurora MySQL PostgreSQL MariaDB SQL Server Oracle EMR Elastic MapReduce Hadoop Spark Presto HBase S3 Non-relational Relational ElastiCache Athena Analytical Operational Kinesis Analytics Redshift Elasticsearch DAX
  15. 15. 15 Amazon EMR Elastic Reliable Secure Easy Hadoop, Spark, HBase, Presto Clickstream Analytics, Real-time Analytics, Log Analysis, ETL, Predictive Analytics Big Data Framework Retries failed tasks for Hadoop Replaces poorly performing instances
  16. 16. 16 Amazon Redshift Scalable Secure Inexpensive Fast Fast, powerful, and simple data warehousing; Massively parallel, petabyte scale Scale by resizing Columnar performance $1000 per TB per year Data Warehouse
  17. 17. 17 Amazon S3 + Athena Query Instantly Pay Per Query ANSI SQL Serverless Easy No infrastructure to set up or manage SQL to query S3 files JDBC / ODBC Multiple data formats Relational Joins S3 upload latency Data Lake
  18. 18. 18 Elasticsearch Service Easy to Use Open Source API Secure Fully Managed Easy to deploy, secure, operate, and scale Elasticsearch Log analytics, full text search, & application monitoring Logstash Kibana NoSQL Full Text Search
  19. 19. 19 Analytics Summary Amazon Redshift Amazon S3 + Athena serverless ad-hoc query process, prepare, and index key-value / document low latency per query $$$ non-relational multiple enterprise data sources multiple data formats
  20. 20. 20 General Use Cases singletons system of record content blobs descriptive predictive prescriptive Operational Analytics
  21. 21. hot – caching singletons – small compute size low latency high throughput high concurrency ACID, HA, DR 21 Operational Dimensions shape size bounded unbounded monolithic partitioned rows key-values documents relational schema velocity ingest compute
  22. 22. NoSQL Database Data Warehouse Data Lake 22 Navigate AWS RDS Aurora MySQL PostgreSQL MariaDB SQL Server Oracle S3 Non-relational Relational ElastiCache Analytical Operational Kinesis Analytics Redshift DynamoDB Athena Elasticsearch DAX EMR Elastic MapReduce Hadoop Spark Presto HBase
  23. 23. 23 Amazon RDS Administer Easily Highly Scalable Available, Durable SSD Speed Managed relational database service; Six popular database engines Amazon Aurora is multi-AZ durable Database
  24. 24. 24 Amazon ElastiCache Scale Easily Secure, Hardened Available, Reliable Extreme Performance Managed, in-memory data store; Redis or Memcached Add to database to improve read latency Good hit rate if working set fits in cache Price is stale cache reads In-memory Database
  25. 25. 25 Amazon DynamoDB Fully Managed Auto Scaling AZ Replication Consistent Performance NoSQL database for document and key-store Automatic provisioning Auto-scaled tables serve millions of requests per second Millisecond latency Fault tolerant availability No relational capabilities NoSQL
  26. 26. 26 Amazon DynamoDB Accelerator (DAX) Fully Managed No Stale Cache Reads Extreme Performance Fully managed write-through cache for DynamoDB Reduces millisecond latency to microseconds Fast NoSQL
  27. 27. 27 Operational Summary Amazon RDS Amazon DynamoDB bounded unbounded key-value / document rows relational non-relational monolith partitioned velocity push-down compute fast ingest with DAX
  28. 28. 28 Strategic Planning Assumptions By 2017, as "NoSQL" ceases to distinguish DBMSs, data and analytics leaders will select multimodel and/or specific document, key-value, graph and wide-column DBMSs. Gartner Critical Capabilities for Operational Database Management Systems Published: 6 October 2016 Analyst(s): Merv Adrian, Donald Feinberg, Nick Heudecker, Terilyn Palanca, Rick Greenwald
  29. 29. 29 Navigating the Data Landscape NoSQL No Problem Database Data Warehouse Data Lake Non-relational Relational Analytical Operational
  30. 30. 30 Navigating the Data Landscape Database Data Warehouse Data Lake Non-relational Relational Analytical Operational
  31. 31. 31 Simplify the Data Landscape Converged Data Warehouse Database Data Lake (AWS S3) Non-relational Relational Analytical Operational HTAP, HOAP, Translytical
  32. 32. 32 Latency Holding Back the Enterprise Lengthy Query Execution Slow query responses Slow reports No real-time response Limited User Access Single threaded operations Challenge with mixed workloads Single box performance Slow Data Loading Batch processing Hours to load Sampled data views
  33. 33. 33 The Enterprise Requires Performance Fast Queries Scalable SQL Real-time dashboards Live data access Scalable User Access Multi-threaded processing Converged transactions and analytics Scale-out for performance Live Loading Stream data On-the-fly transformation Multiple sources
  34. 34. 34 The Database for Real-Time Applications Delivering Operational Analytics at Scale Run Anywhere Any cloud, hybrid, or multicloud On-premises Low cost standard hardware Scale Transactions and Analytics Petabyte scale In-memory and disk-based Unified mixed workload architecture Power Real-Time Applications Fast ingestion and queries Operational capabilities Multi-model and data support
  35. 35. 35 Durable Distributed Storage Highly Available Online replication ensures data consistency and protects against outages Big Data Capacity Petabyte scale with up to 10x compression and instant query retrieval Distributed and Durable Store and process on clusters of machines for performance and persistence
  36. 36. 36 MemSQL Unified Architecture Historical Data Disk-optimized tables with compression for fast analytic queries Live Data Memory optimized tables for analyzing real-time events Streaming Ingest Real-time data pipelines with exactly-once semantics
  37. 37. 37 Drive Real-Time Insights • Rich analytics with Scalable SQL • Support for JSON, Geospatial, Key-Value • Fast Query Vectorization and Compilation • User Defined Functions
  38. 38. 38 Deliver Real-Time ETL Load Guarantee message delivery with exactly-once semantics Transform Map and enrich data with user defined functions or Spark transformations Extract Ingest from Apache Kafka or Spark Change data capture or bulk load
  39. 39. 39 Simple Setup -> CREATE PIPELINE
          memsql> CREATE PIPELINE twitter_pipeline AS
              -> LOAD DATA KAFKA "public-kafka.memcompute.com:9092/tweets-json"
              -> INTO TABLE tweets
              -> (id, tweet);
          Query OK, (0.89 sec)
          memsql> START PIPELINE twitter_pipeline;
          Query OK, (0.01 sec)
  40. 40. 40 Ecosystem Overview Streaming Ingest Live Data Historical Data Real-Time Data Messaging and Transforms Historical Data BI Dashboards Kafka Spark Relational Hadoop Amazon S3 Bare Metal, Virtual Machines, Containers On-Premises, Cloud, As a Service Real-Time Applications Tableau Looker Microstrategy
  41. 41. 41 Amazon EC2 + MemSQL Size Memory Size Compute Size Storage ANSI SQL Build a cluster in minutes Pipelines for ingest Easy to deploy with MemSQL Ops High Availability ACID Data Warehouse and Database
  42. 42. 42 AWS Aurora MemSQL Dataset easily fits under 500 GB Single server compute Write-centric without reads Dataset from 100 GB to 1 PB Horizontal scale Simultaneous read and write workloads Database from AWS and MemSQL
  43. 43. 43 Redshift MemSQL No requirements for fast data ingest No requirement for concurrency Fast data ingest required Support for high concurrency Data Warehouse from AWS and MemSQL
  44. 44. Thank You!
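
The comparisons on slides 42 and 43 can be read as rough selection rules. The following is a minimal Python sketch of that reading, not part of the original deck. The `Workload` class, its field names, and both function names are hypothetical, and the tie-breaking for the 100 GB to 500 GB range where the two columns of slide 42 overlap is an assumption; only the 500 GB figure and the listed criteria come from the slides.

```python
from dataclasses import dataclass

# Threshold taken from slide 42: "Dataset easily fits under 500 GB".
AURORA_MAX_GB = 500


@dataclass
class Workload:
    """Hypothetical description of a workload; field names are illustrative."""
    dataset_gb: float
    needs_horizontal_scale: bool = False
    mixed_read_write: bool = False        # simultaneous read and write workloads
    needs_fast_ingest: bool = False
    needs_high_concurrency: bool = False


def suggest_database(w: Workload) -> str:
    """Slide 42: Aurora for small, single-server workloads; MemSQL otherwise.

    Assumption: in the 100 GB to 500 GB overlap, Aurora is suggested only
    when neither horizontal scale nor mixed read/write is required.
    """
    fits_single_server = (
        w.dataset_gb < AURORA_MAX_GB
        and not w.needs_horizontal_scale
        and not w.mixed_read_write
    )
    return "Aurora" if fits_single_server else "MemSQL"


def suggest_data_warehouse(w: Workload) -> str:
    """Slide 43: Redshift when fast ingest and concurrency are not required."""
    if w.needs_fast_ingest or w.needs_high_concurrency:
        return "MemSQL"
    return "Redshift"
```

For example, `suggest_database(Workload(dataset_gb=200))` returns `"Aurora"`, while the same dataset with `mixed_read_write=True` returns `"MemSQL"`. Real selection would weigh more factors than the slides list, so treat this as a summary of the talk's framing rather than sizing guidance.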
