© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Accelerate Your Analytic Queries
with Amazon Aurora Parallel Query
Aakash Shah
Sr. Software Engineer
Amazon Aurora, AWS
DAT362
Kamal Gupta
Sr. Software Manager
Amazon Aurora, AWS
Agenda
1. Amazon Aurora overview
2. Deep dive
3. Performance
4. Customer experience
5. Global databases
Amazon Aurora
A relational database reimagined for the cloud
• Speed and availability of high-end commercial databases
• Simplicity and cost-effectiveness of open source databases
• Drop-in compatibility with MySQL and PostgreSQL
• Simple pay-as-you-go pricing
Delivered as a managed service
Scale-out, distributed architecture
• Logging pushed down to a purpose-built log-structured distributed storage system
• Storage volume is striped across hundreds of storage nodes distributed across 3 Availability Zones (AZs)
• Six copies of data, two copies in each AZ
[Diagram: master and replica instances (SQL, transactions, caching) in Availability Zones 1 to 3, over a shared storage volume on storage nodes with SSDs]
The journey so far
• AZ+1 tolerance
• Continuous data backup
• Backtrack
• Instant redo recovery
• Read replicas with failover order
• Continuous availability with Multi-Master
• Global databases
• ZDP
• Serverless
• Auto volume growth
• Performance insights
• Read replica auto scaling
QP improvements
Feature      Workload type                Operation type
LRA          Out of memory                Scan
Batch scan   In memory                    Scan
AKP          Out of memory                Non-equi-joins
Hash joins   In memory and out of memory  Equi-joins
But what about read latencies of long-running queries? Can we do better?
Netflix
Netflix is the world's leading internet entertainment service with over 130
million memberships in over 190 countries enjoying TV series,
documentaries and feature films across a wide variety of genres and
languages.
“We were able to test Aurora’s Parallel Query feature and the
performance gains were very good. To be specific, for queries doing full
table scan or fetching fat indexes with billions of rows, we noticed the
query time reduced from 32 minutes to 3 minutes. We were able to
reduce the instance type from r3.8xlarge to r3.2xlarge. For this use-case,
Parallel Query was a great win for us.” —Jyoti Shandil, Cloud Data
Architect.
TransNexus
TransNexus is a VoIP software development firm providing fraud
detection, intelligent routing, and analytics solutions for major carriers
worldwide.
“We tested Aurora’s Parallel Query feature with analytics applications
within our ClearIP software product hosted in AWS. We’ve been excited to
find that larger, more intensive queries perform up to 20x faster with
Parallel Query turned on.” —Alec Fenichel, Software Developer.
Amazon Aurora PQ benefits
• Fully managed
• Scales with storage
• No special hardware required
• No pre-provisioning required
• No setup and tuning required
Amazon Aurora PQ
Amazon Aurora storage has thousands of CPUs
• This presents an opportunity to push down and parallelize query processing using the storage fleet.
• Moving processing close to the data reduces network traffic and latency.
However, there are significant challenges
• Data stored in a storage node is not range partitioned, so full scans are required.
• Data may be in flight.
• Read views may not allow viewing the most recent data.
• Not all functions can be pushed down to storage nodes.
[Diagram: database node and storage nodes, with arrows labeled "push down predicates" and "aggregate results"]
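The push-down idea can be illustrated with a small Python sketch. It is a toy model only: the in-memory STORAGE_NODES list, the row layout, and the function names are assumptions made for illustration, not Aurora's implementation. Each "storage node" filters and projects its own rows, and the "head node" only combines the much smaller partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for storage nodes: each holds its own slice of rows.
STORAGE_NODES = [
    [{"id": 1, "price": 10.0}, {"id": 2, "price": 55.0}],
    [{"id": 3, "price": 70.0}, {"id": 4, "price": 5.0}],
    [{"id": 5, "price": 99.0}],
]

def scan_on_storage_node(rows, predicate, columns):
    """Runs 'near the data': filter rows and project columns before returning them."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]

def parallel_scan(nodes, predicate, columns):
    """Head-node side: fan the scan out to every node, then combine partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(
            lambda rows: scan_on_storage_node(rows, predicate, columns), nodes
        )
        # pool.map yields results in node order, so the output is deterministic.
        return [row for partial in partials for row in partial]

result = parallel_scan(STORAGE_NODES, lambda r: r["price"] > 50, ["id"])
print(result)
```

Only the matching, projected rows cross the "network" in this sketch, which is the source of the reduced traffic described above.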
Database node processing
The query optimizer produces a PQ plan and creates a PQ context based on leaf page discovery.
The PQ request is sent to the storage node along with the PQ context.
The storage node produces:
• A partial results stream of processed, stable rows.
• A raw stream of unprocessed rows with pending undos.
The head node aggregates these data streams to produce the final results.
[Diagram: application, optimizer, executor, aggregator, InnoDB, and network storage driver on the database node; PQ plan and PQ context; partial results stream and in-flight data from the storage nodes]
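A rough Python sketch of how a head node might combine the two streams is shown below. The data model is an assumption made for illustration: each raw row is paired with a hypothetical "undo" dict holding the column values visible to the query's read view. The real undo handling in InnoDB is considerably more involved.

```python
def aggregate_on_head_node(partial_results, raw_rows, predicate):
    """Combine both streams into the final result set.

    partial_results: stable rows the storage node has already filtered.
    raw_rows: (row, undo) pairs; `undo` is a hypothetical dict of column values
              to restore so the row matches this query's read view.
    """
    final = list(partial_results)
    for row, undo in raw_rows:
        visible = {**row, **undo}   # roll the row back to the visible version
        if predicate(visible):      # apply the filter that storage could not apply
            final.append(visible)
    return final

partial = [{"id": 1, "qty": 12}]
raw = [
    ({"id": 2, "qty": 3}, {"qty": 20}),  # in-flight change: visible qty is 20
    ({"id": 3, "qty": 40}, {"qty": 1}),  # visible qty is 1, so it is filtered out
]
final_rows = aggregate_on_head_node(partial, raw, lambda r: r["qty"] >= 10)
print(final_rows)
```

The design point is that rows with pending undos are not filtered in storage; they are returned raw so the head node can first reconstruct the version the query is allowed to see.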
Storage node processing
Each storage node runs up to 16 PQ processes, each associated with a parallel query.
A PQ process receives the PQ context:
• List of pages to scan.
• Read view and projections.
• Expression evaluation byte code.
A PQ process makes two passes through the page list:
• Pass 1: Filter evaluation on InnoDB-formatted raw data.
• Pass 2: Expression evaluation on MySQL-formatted data.
[Diagram: storage node process, page lists, and up to 16 PQ processes communicating to and from the head node]
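The two-pass structure can be sketched in Python as follows. The page and row encoding (tuples of bytes or None) and the helper names are assumptions made for illustration; they stand in for the InnoDB and MySQL row formats, which are not shown in the slides. The predicates mirror the sample query used later in the deck: one plain filter and two expression checks.

```python
# Hypothetical raw pages: each row is a tuple of bytes-or-None (brand, type, retail price).
PAGES = [
    [(b"Brand#13", b"promo brass", b"901.40"), (None, b"small tin", b"12.00")],
    [(b"Brand#42", None, b"33.70"), (b"Brand#11", b"large steel", b"1500.55")],
]

def pass1_filter(pages):
    """Pass 1: cheap filter on the raw representation (brand IS NOT NULL)."""
    return [row for page in pages for row in page if row[0] is not None]

def decode(raw_row):
    """Convert a raw row into typed values (stand-in for the MySQL row format)."""
    brand, ptype, price = raw_row
    return {
        "brand": brand.decode(),
        "type": ptype.decode() if ptype is not None else None,
        "price": float(price) if price is not None else None,
    }

def pass2_expressions(raw_rows):
    """Pass 2: evaluate expressions on decoded rows and keep rows where both are non-NULL."""
    out = []
    for row in map(decode, raw_rows):
        upper_type = row["type"].upper() if row["type"] is not None else None
        rounded = round(row["price"]) if row["price"] is not None else None
        if upper_type is not None and rounded is not None:
            out.append(row)
    return out

survivors = pass2_expressions(pass1_filter(PAGES))
print([row["brand"] for row in survivors])
```

Running the cheap raw-format filter first means only surviving rows pay the cost of decoding and expression evaluation.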
Amazon Aurora PQ summary
Performance
Up to 120x lower latencies on TPC-H-like benchmarks, with improved I/O performance and reduced CPU usage on the head node.
High concurrency
Run both OLTP and light OLAP workloads simultaneously and efficiently.
Cost effective
PQ comes at no extra cost and runs on your live data, potentially reducing effort and data duplication in your ETL pipeline.
Quiet tenant
Reduced chance of evicting frequently used buffer pool pages that the OLTP workload relies on.
Ecosystem
Get Aurora features such as PITR, continuous backup, and fast cloning with PQ.
Performance: QP latency gains
Well-known decision support benchmark
[Chart: query response time reduction per query, queries 1 to 22, y-axis 0x to 20x]
• Peak speedup ~18x
• >2x speedup: 10 of 22 queries
Performance: PQ latency gains
Well-known decision support benchmark
[Chart: query response time reduction per query, queries 1 to 22, y-axis 0x to 120x]
• Peak speedup ~120x
• >10x speedup: 8 of 22 queries
Performance: Combined latency gains
Well-known decision support benchmark
[Chart: query response time reduction per query, queries 1 to 22, y-axis 0x to 200x]
• Peak speedup ~187x
• >10x speedup: 10 of 22 queries
Parallel Query: Performance results
How to get started with PQ?
• Create new clusters or restore existing Aurora MySQL 5.6 clusters.
• Verify that the PQ feature is available using select @@aurora_pq_supported;
• PQ can be statically enabled or disabled for the cluster by using aurora_pq in the cluster parameter group.
• PQ can be dynamically enabled or disabled per session using set session aurora_pq = {'ON'/'OFF'}.
• The smart optimizer automatically selects PQ.
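As a minimal sketch, the check-then-enable steps above could be wrapped in a small Python helper. It assumes a DB-API 2.0 style cursor connected to an Aurora MySQL cluster and that @@aurora_pq_supported comes back as a truthy or falsy value. The function name enable_parallel_query and the FakeCursor class are hypothetical; FakeCursor exists only so the example runs without a database.

```python
def enable_parallel_query(cursor):
    """Check PQ support, then turn PQ on for the current session.

    `cursor` is assumed to be a DB-API 2.0 cursor on an Aurora MySQL cluster.
    Returns True if PQ was enabled, False if the cluster reports no support.
    """
    cursor.execute("SELECT @@aurora_pq_supported")
    (supported,) = cursor.fetchone()
    if not supported:
        return False
    cursor.execute("SET SESSION aurora_pq = 'ON'")
    return True

class FakeCursor:
    """Offline stand-in for a real cursor, used only to demonstrate the call flow."""

    def __init__(self, supported):
        self.supported = supported
        self.statements = []
        self._last = None

    def execute(self, sql):
        self.statements.append(sql)
        self._last = sql

    def fetchone(self):
        if self._last == "SELECT @@aurora_pq_supported":
            return (self.supported,)
        return None

cursor = FakeCursor(supported=1)
enabled = enable_parallel_query(cursor)
print(enabled, cursor.statements)
```

With a real driver, the same function would be called with a cursor from an open connection. The session-level setting affects only that connection, while the cluster parameter group sets the cluster-wide value.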
Verifying PQ
mysql> explain select p_name, p_mfgr from part
    -> where p_brand is not null
    -> and upper(p_type) is not null
    -> and round(p_retailprice) is not null;
+----+-------------+-------+...+----------+----------------------------------------------------------------------------+
| id | select_type | table |...| rows     | Extra                                                                      |
+----+-------------+-------+...+----------+----------------------------------------------------------------------------+
|  1 | SIMPLE      | part  |...| 20427936 | Using where; Using parallel query (5 columns, 1 filters, 2 exprs; 0 extra) |
+----+-------------+-------+...+----------+----------------------------------------------------------------------------+
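To check for PQ programmatically, the Extra column of the EXPLAIN output can be parsed. The Python sketch below assumes the Extra text has exactly the shape shown in the slide. The pattern and the function name parse_pq_extra are illustrative, and other Aurora versions may word this text differently.

```python
import re

# Pattern modeled on the Extra value shown in the slide; treat it as an assumption.
PQ_PATTERN = re.compile(
    r"Using parallel query \((\d+) columns, (\d+) filters, (\d+) exprs; (\d+) extra\)"
)

def parse_pq_extra(extra):
    """Return PQ counters from an EXPLAIN 'Extra' value, or None if PQ is not used."""
    match = PQ_PATTERN.search(extra)
    if match is None:
        return None
    columns, filters, exprs, extra_count = map(int, match.groups())
    return {"columns": columns, "filters": filters, "exprs": exprs, "extra": extra_count}

info = parse_pq_extra(
    "Using where; Using parallel query (5 columns, 1 filters, 2 exprs; 0 extra)"
)
print(info)
```

A None result means the plan does not mention parallel query, so the statement would run through the regular execution path.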
PQ status variables
• Aurora_pq_request_attempted
• Aurora_pq_request_executed
• Aurora_pq_request_failed
• Aurora_pq_pages_pushed_down
• Aurora_pq_bytes_returned
• Aurora_pq_request_not_chosen
• Aurora_pq_request_not_chosen_below_min_rows
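One way to use these counters is to summarize how often PQ requests succeed. The Python sketch below assumes the counters have already been fetched (for example with SHOW GLOBAL STATUS) into a dict of name to value. The sample numbers are made up, and reading "executed divided by attempted" as a success rate is an assumption about the counters' semantics.

```python
def pq_usage_summary(status):
    """Summarize PQ counters from a {variable_name: value} mapping."""
    attempted = int(status.get("Aurora_pq_request_attempted", 0))
    executed = int(status.get("Aurora_pq_request_executed", 0))
    failed = int(status.get("Aurora_pq_request_failed", 0))
    not_chosen = int(status.get("Aurora_pq_request_not_chosen", 0))
    return {
        "attempted": attempted,
        "executed": executed,
        "failed": failed,
        "not_chosen": not_chosen,
        # Guard against division by zero on a cluster that has not run PQ yet.
        "success_rate": executed / attempted if attempted else 0.0,
    }

# Hypothetical sample values; status values typically arrive as strings.
sample_status = {
    "Aurora_pq_request_attempted": "8",
    "Aurora_pq_request_executed": "6",
    "Aurora_pq_request_failed": "2",
    "Aurora_pq_request_not_chosen": "5",
}
summary = pq_usage_summary(sample_status)
print(summary)
```

A high not_chosen count relative to attempted would suggest that the optimizer rarely selects PQ for the workload.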
Current limitations
• Amazon Aurora PQ is currently available only with Aurora MySQL 5.6. Integration with 5.7 and PostgreSQL will follow.
• Incompatible with db.t2 instance types.
• Available in 5 regions: US East (Virginia, Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo). More regions to follow.
• Integration with Performance Insights and Backtrack will follow.
Global replication
Faster disaster recovery and enhanced data locality
Global replication: Aurora physical
High throughput: Up to 200K writes/sec, negligible performance impact
Low replica lag: < 1 sec cross-country replica lag under heavy load
Fast recovery: < 1 min to accept full read-write workloads after region failure
[Diagram: instances (M = master, R = replica) in Region 1 and Region 2, each region with shared storage spanning AZ 1 to AZ 3 and a replication fleet]
Global replication performance
Logical vs. physical replication
[Two charts: "Logical replication with MTS" (seconds axis 0 to 600, QPS axis 0 to 250,000) and "Physical replication" (seconds axis 0.00 to 5.00, QPS axis 0 to 250,000)]
SysBench OLTP (write-only) stepped every 600 seconds on R4.16xlarge
Thank you!
Aakash Shah
aakashah@amazon.com
Kamal Gupta
kamalg@amazon.com

Accelerate Your Analytic Queries with Amazon Aurora Parallel Query (DAT362) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Accelerate Your Analytic Queries with Amazon Aurora Parallel Query Aakash Shah Sr. Software Engineer Amazon Aurora, AWS D A T 3 6 2 Kamal Gupta Sr. Software Manager Amazon Aurora, AWS
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda 1. Amazon Aurora overview 2. Deep dive 3. Performance 4. Customer experience 5. Global databases
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Aurora A relational database reimagined for the cloud  Speed and availability of high-end commercial databases  Simplicity and cost-effectiveness of open source databases  Drop-in compatibility with MySQL and PostgreSQL  Simple pay as you go pricing Delivered as a managed service Amazon Aurora
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Scale-out, distributed architecture Master Replica Replica Replica AVAILABILITY ZONE 1 SHARED STORAGE VOLUME AVAILABILITY ZONE 2 AVAILABILITY ZONE 3 STORAGE NODES WITH SSDS • Logging pushed down to a purpose- built log-structured distributed storage system • Storage volume is striped across hundreds of storage nodes distributed across 3 availability zones (AZ) • Six copies of data, two copies in each AZ SQL TRANSACTIONS CACHING SQL TRANSACTIONS CACHING SQL TRANSACTIONS CACHING
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. The journey so far • AZ+1 tolerance • Continuous data backup • Backtrack • Instant redo recovery • Read-replicas with failover order • Continuous availability with Multi-Master • Global databases • ZDP • Serverless • Auto volume growth • Performance insights • Read replica auto scaling
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. QP improvements Feature Workload type Operation type LRA Out of memory Scan Batch scan In memory Scan AKP Out of memory Non equi-joins Hash joins In memory & out of memory Equi-joins
  • 9. But what about read latencies of long running queries? Can we do better?
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Netflix Netflix is the world's leading internet entertainment service with over 130 million memberships in over 190 countries enjoying TV series, documentaries and feature films across a wide variety of genres and languages. “We were able to test Aurora’s Parallel Query feature and the performance gains were very good. To be specific, for queries doing full table scan or fetching fat indexes with billions of rows, we noticed the query time reduced from 32 minutes to 3 minutes. We were able to reduce the instance type from r3.8xlarge to r3.2xlarge. For this use-case, Parallel Query was a great win for us.” —Jyoti Shandil, Cloud Data Architect.
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. TransNexus TransNexus is a VOIP software development firm providing fraud detection, intelligent routing, and analytics solutions for major carriers worldwide. “We tested Aurora’s Parallel Query feature with analytics applications within our ClearIP software product hosted in AWS. We’ve been excited to find that larger, more intensive queries perform up to 20x faster with Parallel Query turned on.” —Alec Fenichel, Software Developer.
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Aurora PQ benefits • Fully managed • Scales with storage • No special hardware required • No pre-provisioning required • No setup and tuning required
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Aurora PQ Amazon Aurora Storage has thousands of CPUs • Presents opportunity to push down and parallelize query processing using the storage fleet. • Moving processing close to data reduces network traffic and latency. However there are significant challenges • Data stored in storage node is not range partitioned – require full scans. • Data may be in-flight. • Read views may not allow viewing most recent data. • Not all functions can be pushed down to storage nodes. DATABASE NODE STORAGE NODES PUSH DOWN PREDICATES AGGREGATE RESULTS
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Database node processing Query Optimizer produces PQ Plan and creates PQ context based on leaf page discovery. PQ request is sent to storage node along with PQ context. Storage node produces: • Partial results streams with processed stable rows. • Raw stream of unprocessed rows with pending undos. Head node aggregates these data streams to produce final results. STORAGE NODES OPTIMIZER EXECUTOR INNODB NETWORK STORAGE DRIVER AGGREGATOR APPLICATION PARTIAL RESULTS STREAM RESULTS IN-FLIGHT DATA PQ CONTEXT PQ PLAN
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Storage node processing Each storage node runs up to 16 PQ processes, each associated with a parallel query. PQ process receives PQ context • List of pages to scan. • Read view and projections. • Expression evaluation byte code. PQ process makes two passes through the page list • Pass 1: Filter evaluation on InnoDB formatted raw data. • Pass 2: Expression evaluation on MySQL formatted data. PQ PROCESS PQ PROCESS Up to 16 STORAGE NODE PROCESS PAGE LISTS TO/FROM HEAD NODE
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Aurora PQ summary Performance 120x lower latencies on TPCH-like benchmarks with Improved I/O performance and reduced CPU usage on the head node. High Concurrency Run both OLTP and light OLAP workloads simultaneously and efficiently. Cost Effective PQ comes at no extra cost. Can run on your live data. Potentially reduced effort and data duplication in your ETL pipeline. Quiet Tenant Reduced chance of evicting frequently used pages from the buffer pool that are used by OLTP workload. Ecosystem Get Aurora goodies such as PiTR, Continuous backup, Fast Cloning with PQ.
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Well-known decision support benchmark 0x 2x 4x 6x 8x 10x 12x 14x 16x 18x 20x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Query response time reduction  Peak speed up ~18x  >2x speedup: 10 of 22 queries Performance: QP latency gains
  • 20. Performance: PQ latency gains (well-known decision support benchmark). [Bar chart: query response time reduction, 0x to 120x, for queries 1 to 22.] Peak speedup ~120x; >10x speedup on 8 of 22 queries.
  • 21. Performance: Combined latency gains (well-known decision support benchmark). [Bar chart: query response time reduction, 0x to 200x, for queries 1 to 22.] Peak speedup ~187x; >10x speedup on 10 of 22 queries.
  • 22. Parallel Query: Performance results
  • 24. How to get started with PQ?
      • Create new clusters or restore existing 5.6 clusters.
      • Verify that the PQ feature is available with: select @@aurora_pq_supported;
      • Enable or disable PQ statically for the cluster with the aurora_pq parameter in the cluster parameter group.
      • Enable or disable PQ dynamically per session with: set session aurora_pq = {'ON'/'OFF'};
      • The smart optimizer automatically selects PQ.
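The check-then-enable sequence from this slide could be wrapped in a small helper. This is a minimal sketch, assuming a Python DB-API (PEP 249) style cursor from whichever MySQL driver is in use; `enable_parallel_query` is a hypothetical name, and the set of values treated as "supported" is an assumption about how the driver renders the variable.

```python
def enable_parallel_query(cursor) -> bool:
    """Turn PQ on for the current session if the cluster reports support.

    `cursor` is any DB-API style cursor offering execute() and fetchone().
    Returns True when the session setting was issued, False otherwise.
    """
    cursor.execute("SELECT @@aurora_pq_supported")
    row = cursor.fetchone()
    # Assumption: support is reported as 1, 'ON', or 'TRUE' depending on driver.
    supported = bool(row) and str(row[0]).upper() in ("1", "ON", "TRUE")
    if not supported:
        return False
    cursor.execute("SET SESSION aurora_pq = 'ON'")
    return True
```

Even with the session setting on, the optimizer still decides per query whether PQ is used, so the EXPLAIN output remains the way to confirm it.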
  • 25. Verifying PQ
      mysql> explain select p_name, p_mfgr from part
          -> where p_brand is not null
          -> and upper(p_type) is not null
          -> and round(p_retailprice) is not null;
      +----+-------------+-------+...+----------+----------------------------------------------------------------------------+
      | id | select_type | table |...| rows     | Extra                                                                      |
      +----+-------------+-------+...+----------+----------------------------------------------------------------------------+
      |  1 | SIMPLE      | part  |...| 20427936 | Using where; Using parallel query (5 columns, 1 filters, 2 exprs; 0 extra) |
      +----+-------------+-------+...+----------+----------------------------------------------------------------------------+
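When checking many queries, the Extra column of the EXPLAIN output can be inspected programmatically. The sketch below parses the "Using parallel query (...)" fragment exactly as it appears on this slide; `parse_pq_extra` is a hypothetical helper, and the pattern assumes the wording does not differ across Aurora versions.

```python
import re
from typing import Dict, Optional

# Matches the fragment shown on the slide, e.g.
# "Using parallel query (5 columns, 1 filters, 2 exprs; 0 extra)"
_PQ_EXTRA = re.compile(
    r"Using parallel query \((\d+) columns, (\d+) filters, (\d+) exprs; (\d+) extra\)"
)


def parse_pq_extra(extra: str) -> Optional[Dict[str, int]]:
    """Return the PQ counts from an EXPLAIN Extra value, or None if PQ is absent."""
    match = _PQ_EXTRA.search(extra)
    if match is None:
        return None
    keys = ("columns", "filters", "exprs", "extra")
    return dict(zip(keys, map(int, match.groups())))
```

A return value of `None` means the plan did not choose parallel query for that table.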
  • 26. PQ status variables
      • Aurora_pq_request_attempted
      • Aurora_pq_request_executed
      • Aurora_pq_request_failed
      • Aurora_pq_pages_pushed_down
      • Aurora_pq_bytes_returned
      • Aurora_pq_request_not_chosen
      • Aurora_pq_request_not_chosen_below_min_rows
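These counters are most useful as differences between two readings taken around a workload. The sketch below assumes the values were fetched into dictionaries beforehand (for example with a statement such as `SHOW GLOBAL STATUS LIKE 'Aurora_pq%'`, which is an assumption about how they are read); `pq_counter_delta` and `executed_fraction` are hypothetical helpers.

```python
from typing import Dict, Mapping, Optional

PQ_PREFIX = "Aurora_pq_"


def pq_counter_delta(
    before: Mapping[str, object], after: Mapping[str, object]
) -> Dict[str, int]:
    """Difference of every Aurora_pq_* counter between two snapshots."""
    return {
        name: int(after[name]) - int(before.get(name, 0))
        for name in after
        if name.startswith(PQ_PREFIX)
    }


def executed_fraction(delta: Mapping[str, int]) -> Optional[float]:
    """Share of attempted PQ requests that were executed, or None if none were attempted."""
    attempted = delta.get("Aurora_pq_request_attempted", 0)
    if attempted == 0:
        return None
    return delta.get("Aurora_pq_request_executed", 0) / attempted
```

Status values often arrive as strings from the driver, which is why the helper converts them with `int()` before subtracting.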
  • 27. Current limitations
      • Amazon Aurora PQ is currently available only with Aurora MySQL 5.6. Integration with 5.7 and PostgreSQL will follow.
      • Incompatible with db.t2 instance types.
      • Available in 5 regions: US East (N. Virginia, Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo). More regions to follow.
      • Integration with Performance Insights and Backtrack will follow.
  • 29. Global replication: Faster disaster recovery and enhanced data locality
  • 30. Global replication: Aurora physical
      • High throughput: up to 200K writes/sec with negligible performance impact.
      • Low replica lag: < 1 sec cross-country replica lag under heavy load.
      • Fast recovery: < 1 min to accept full read-write workloads after region failure.
      [Diagram: Region 1 and Region 2, each with AZ 1, AZ 2, AZ 3, shared storage, and a replication fleet]
  • 31. Global replication performance: Logical vs. physical replication. [Two charts with axes in seconds and QPS (0 to 250,000): logical replication with MTS, seconds axis 0 to 600; physical replication, seconds axis 0.00 to 5.00.] SysBench OLTP (write-only), stepped every 600 seconds on R4.16xlarge.
  • 33. Thank you! Aakash Shah (aakashah@amazon.com), Kamal Gupta (kamalg@amazon.com)