The database market is large and filled with many solutions. In this talk, Seth Luersen from MemSQL will take a look at what is happening within AWS, the overall data landscape, and how customers can benefit from using MemSQL within the AWS ecosystem.
6. 6
Shape
Columnstore Aggregations and table scans
Document Index and store docs for query on any property
Graph Persist and retrieve relationships
Key-Value Query by key with fast ingest and high throughput
Rowstore Operate on a row or row set
Time-Series Store and process sequence
Unstructured Get and put objects
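To make the difference between two of these shapes concrete, here is a minimal sketch contrasting a rowstore layout (operate on a row) with a columnstore layout (aggregations and table scans). The sample data, column names, and helper functions are hypothetical and only illustrate the idea; they do not reflect how any particular database stores data.

```python
# Hypothetical sample data; table and column names are illustrative only.
rows = [
    {"id": 1, "region": "east", "amount": 120.0},
    {"id": 2, "region": "west", "amount": 75.5},
    {"id": 3, "region": "east", "amount": 30.0},
]

# Rowstore layout: each record is kept together and keyed by id,
# so reading or updating one row touches a single record.
rowstore = {row["id"]: row for row in rows}

# Columnstore layout: each column is kept together,
# so an aggregation scans only the column it needs.
columnstore = {
    "id": [row["id"] for row in rows],
    "region": [row["region"] for row in rows],
    "amount": [row["amount"] for row in rows],
}


def get_row(row_id):
    """Point lookup: the natural rowstore operation."""
    return rowstore[row_id]


def total_amount():
    """Aggregation: the natural columnstore operation (one column scanned)."""
    return sum(columnstore["amount"])


print(get_row(2))
print(total_amount())
```

The same three records live in both layouts; only the access pattern that each layout favors changes.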
7. 7
Size
Limit Bounded or Unbounded to a size
Working Set 30 years cold
Caching Last 10 minutes of hot
Result size 1 row at 100 bytes
2 million rows at 200 MB
Monolith One big refrigerator
Partition Natural boundaries for distribution
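The sizing figures above can be checked with simple arithmetic, and "natural boundaries" can be illustrated by grouping records on a key the data already has. The sketch below assumes decimal megabytes (1 MB = 1,000,000 bytes) and uses the calendar year as a hypothetical partition key; both choices are assumptions for illustration, not something the slide specifies.

```python
from collections import defaultdict

BYTES_PER_ROW = 100  # from the slide: 1 row at 100 bytes


def result_size_bytes(row_count, bytes_per_row=BYTES_PER_ROW):
    """Size of a result set in bytes."""
    return row_count * bytes_per_row


def partition_by_year(records):
    """Group (iso_date, value) records by calendar year.

    The year is an assumed 'natural boundary'; any key that the data
    already carries (region, customer, day) could play the same role.
    """
    partitions = defaultdict(list)
    for iso_date, value in records:
        year = int(iso_date[:4])
        partitions[year].append((iso_date, value))
    return dict(partitions)


# 2 million rows at 100 bytes each, expressed in decimal megabytes.
print(result_size_bytes(2_000_000) / 1_000_000, "MB")

# Hypothetical records spanning two years.
sample = [("1995-03-01", 10), ("1995-07-15", 20), ("2017-01-02", 30)]
print(partition_by_year(sample))
```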
8. 8
Compute
Aggregations Average, Count, Sum on 1 trillion rows
Batch 50 million rows per batch
Concurrency 10,000 requests per second
Streaming Ingest 1 million rows ingest per second
Latency SLAs for sub-second response
Transactions Singleton operations
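The compute figures on this slide relate to each other through simple rate arithmetic. The sketch below uses the slide's numbers; the 0.5 second latency is an assumed value chosen only to illustrate a sub-second SLA, and the in-flight estimate uses Little's law (average requests in flight = arrival rate × average latency).

```python
INGEST_ROWS_PER_SECOND = 1_000_000   # from the slide
BATCH_ROWS = 50_000_000              # from the slide
REQUESTS_PER_SECOND = 10_000         # from the slide
AGGREGATION_ROWS = 1_000_000_000_000  # from the slide: 1 trillion rows
ASSUMED_LATENCY_SECONDS = 0.5        # assumption: one possible sub-second SLA


def seconds_to_ingest(rows, rate=INGEST_ROWS_PER_SECOND):
    """Time needed to ingest `rows` at a steady `rate` in rows per second."""
    return rows / rate


def in_flight_requests(rate, latency_seconds):
    """Little's law: average number of requests being served at once."""
    return rate * latency_seconds


# One 50-million-row batch arriving at the streaming rate.
print(seconds_to_ingest(BATCH_ROWS), "seconds per batch")

# Days of continuous streaming needed to accumulate 1 trillion rows.
print(seconds_to_ingest(AGGREGATION_ROWS) / 86_400, "days to reach 1 trillion rows")

# Concurrent requests implied by 10,000 requests/second at the assumed latency.
print(in_flight_requests(REQUESTS_PER_SECOND, ASSUMED_LATENCY_SECONDS), "requests in flight")
```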
16. 16
Amazon Redshift
Scalable Secure Inexpensive Fast
Fast, powerful, and simple data warehousing
Massively parallel, petabyte scale
Scale by resizing
Columnar performance
$1000 per TB per year
Data Warehouse
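As a rough worked example of the price point quoted on this slide, the sketch below scales $1000 per TB per year to a given data volume. The 50 TB warehouse is a hypothetical size, and the calculation ignores anything the slide does not mention (node types, reserved pricing, data transfer).

```python
PRICE_PER_TB_YEAR_USD = 1000  # from the slide


def yearly_cost_usd(terabytes):
    """Yearly cost at the slide's quoted price per TB."""
    return terabytes * PRICE_PER_TB_YEAR_USD


def monthly_cost_usd(terabytes):
    """Same cost spread evenly over 12 months."""
    return yearly_cost_usd(terabytes) / 12


# Hypothetical 50 TB warehouse.
print(yearly_cost_usd(50), "USD per year")
print(round(monthly_cost_usd(50), 2), "USD per month")
```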
17. 17
Amazon S3 + Athena
Query Instantly
Pay Per Query
ANSI SQL
Serverless
Easy
No infrastructure to set up or manage
SQL to query S3 files
JDBC / ODBC
Multiple data formats
Relational Joins
S3 upload latency
Data Lake
18. 18
Elasticsearch Service
Easy to Use
Open Source API
Secure
Fully Managed
Easy to deploy, secure, operate, and scale Elasticsearch
Log analytics, full text search, & application monitoring
Logstash
Kibana
NoSQL Full Text Search
19. 19
Analytics Summary
Amazon Redshift Amazon S3 + Athena
serverless ad-hoc query
process, prepare, and index key-value / document
low latency
per query $$$
non-relational
multiple enterprise data sources
multiple data formats
28. 28
Strategic Planning Assumptions
By 2017, as "NoSQL" ceases to distinguish DBMSs, data and analytics leaders will select multimodel and/or specific document, key-value, graph and wide-column DBMSs.
Gartner Critical Capabilities for Operational Database Management Systems
Published: 6 October 2016
Analyst(s): Merv Adrian, Donald Feinberg, Nick Heudecker, Terilyn Palanca, Rick Greenwald
29. 29
Navigating the Data Landscape
NoSQL
No
Problem
Database
Data
Warehouse
Data Lake
Non-relational
Relational
Analytical Operational
30. 30
Navigating the Data Landscape
Database
Data
Warehouse
Data Lake
Non-relational
Relational
Analytical Operational
31. 31
Simplify the Data Landscape
Converged Data Warehouse Database
Data Lake (AWS S3)
Non-relational
Relational
Analytical Operational
HTAP, HOAP, Translytical
32. 32
Latency Holding Back the Enterprise
Lengthy Query Execution
Slow query responses
Slow reports
No real-time response
Limited User Access
Single threaded operations
Challenge with mixed workloads
Single box performance
Slow Data Loading
Batch processing
Hours to load
Sampled data views
33. 33
The Enterprise Requires Performance
Fast Queries
Scalable SQL
Real-time dashboards
Live data access
Scalable User Access
Multi-threaded processing
Converged transactions and analytics
Scale-out for performance
Live Loading
Stream data
On-the-fly transformation
Multiple sources
34. 34
The Database for Real-Time Applications
Delivering Operational Analytics at Scale
Run Anywhere
Any cloud, hybrid, or multicloud
On-premises
Low cost standard hardware
Scale Transactions and Analytics
Petabyte scale
In-memory and disk-based
Unified mixed workload architecture
Power Real-Time Applications
Fast ingestion and queries
Operational capabilities
Multi-model and data support
35. 35
Durable Distributed Storage
Highly Available
Online replication ensures data consistency and protects against outages
Big Data Capacity
Petabyte scale with up to 10x compression and instant query retrieval
Distributed and Durable
Store and process on clusters of machines for performance and persistence
36. 36
MemSQL Unified Architecture
Historical Data
Disk-optimized tables with compression for fast analytic queries
Live Data
Memory-optimized tables for analyzing real-time events
Streaming Ingest
Real-time data pipelines with exactly-once semantics
37. 37
Drive Real-Time Insights
• Rich analytics with Scalable SQL
• Support for JSON, Geospatial, Key-Value
• Fast Query Vectorization and Compilation
• User Defined Functions
38. 38
Deliver Real-Time ETL
Load
Guarantee message delivery with exactly-once semantics
Transform
Map and enrich data with user defined functions or Spark transformations
Extract
Ingest from Apache Kafka or Spark
Change data capture or bulk load
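One common way to get exactly-once loading is to commit the loaded rows and the consumer's position in the same transaction, so that a redelivered batch is recognized and skipped. The sketch below illustrates that general technique with Python's built-in sqlite3 module standing in for the target database. It is not MemSQL's implementation; the table names, the `transform` function, and the `(offset, payload)` message shape are all hypothetical.

```python
import sqlite3


def open_store():
    """Create an in-memory target store with an events table and a checkpoint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (msg_offset INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.execute(
        "CREATE TABLE checkpoint ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), next_offset INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO checkpoint (id, next_offset) VALUES (1, 0)")
    conn.commit()
    return conn


def transform(payload):
    """Hypothetical transform step: normalize the payload text."""
    return payload.strip().upper()


def load_batch(conn, batch):
    """Load a batch of (offset, payload) messages at most once each.

    Rows and the checkpoint are written in one transaction, so either both
    are committed or neither is. Returns the number of rows loaded.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        (next_offset,) = conn.execute(
            "SELECT next_offset FROM checkpoint WHERE id = 1"
        ).fetchone()
        # Skip messages that an earlier, already committed batch covered.
        fresh = [(off, transform(p)) for off, p in batch if off >= next_offset]
        if not fresh:
            return 0
        conn.executemany(
            "INSERT INTO events (msg_offset, payload) VALUES (?, ?)", fresh
        )
        conn.execute(
            "UPDATE checkpoint SET next_offset = ? WHERE id = 1",
            (max(off for off, _ in fresh) + 1,),
        )
        return len(fresh)


conn = open_store()
batch = [(0, " click "), (1, "view")]
print(load_batch(conn, batch))  # first delivery of the batch
print(load_batch(conn, batch))  # redelivery of the same batch
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```

Because the checkpoint moves only when the rows are committed, replaying a batch after a failure or retry does not create duplicates in the `events` table.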
40. 40
Ecosystem Overview
Streaming Ingest Live Data Historical Data
Real-Time Data
Messaging and
Transforms
Historical Data BI Dashboards
Kafka Spark
Relational Hadoop Amazon S3
Bare Metal, Virtual Machines, Containers On-Premises, Cloud, As a Service
Real-Time Applications
Tableau Looker MicroStrategy
41. 41
Amazon EC2 + MemSQL
Size Memory
Size Compute
Size Storage
ANSI SQL
Build a cluster in minutes
Pipelines for ingest
Easy to deploy with MemSQL Ops
High Availability
ACID
Data Warehouse
and Database
42. 42
AWS Aurora
Dataset easily fits under 500 GB
Single server compute
Write-centric without reads
MemSQL
Dataset from 100 GB to 1 PB
Horizontal scale
Simultaneous read and write workloads
Database from AWS and MemSQL
43. 43
Redshift
No requirement for fast data ingest
No requirement for concurrency
MemSQL
Fast data ingest required
Support for high concurrency
Data Warehouse from AWS and MemSQL