Building a high-performance data lake analytics engine at Alibaba Cloud with Presto+Alluxio

Zhenlin Ma

Contents
01 Introduction to DLA
02 DLA Presto Architecture
03 Optimizations on OSS Data Source

Introduction to DLA
Data Lake Analytics (DLA) is a large scale serverless data federation service on Alibaba Cloud.
Serverless Data Federation
Database-like User Experience
High Performance

[Figure: DLA overview diagram. Components shown: data inputs (DB data, streaming data, LogService application logs, archived transactional data); One-Click Data Lake; Data Lake Storage (OSS) with columnar tables; Metadata Management with Auto Discovery; Data Lake Engine (DLA) with Serverless Spark (Streaming, ETL & ML) and Serverless Presto; consumers (DW, DMS, APP, QuickBI).]

Introduction to DLA

DLA Presto
Multi-Coordinators
Lake Formation: One-Click Data Warehouse, Metadata Discovery
Enterprise-level Access Control
Cost: Billing is based on the volume of scanned data or on the number of compute units used.
MySQL protocol support
Caching
Data sources: More than 15 types of data sources are supported, including Alibaba Cloud OSS, ADB, Table Store, etc.

[Figure: DLA Presto architecture. Components shown: FrontNode (MySQL protocol; SQL dialect transformation / submit query / fetch result); Unified Meta Service (meta operations); Presto Clusters (Coordinators and Workers in a Default Cluster and a CU Cluster; TableScan/Pushdown); data sources (OSS, MySQL, SQLServer, Oracle, PostgreSQL, TableStore, MaxCompute, ElasticSearch, Druid, …).]
MySQL Protocol
Multiple Charging Model
Unified Meta & Access Control

About Presto
Presto is an open source distributed SQL query engine for running interactive analytic queries
against data sources of all sizes ranging from gigabytes to petabytes.
Full Memory Processing: Blazing fast; suitable for ad hoc queries, data exploration, and lightweight ETL.
Full SQL Semantics: Compliant with ANSI SQL, so there is no need to worry about unsupported SQL syntax.
Pluggable Connectors
Great Community

Challenges to DLA Presto
[Figure: the DLA Presto architecture diagram annotated with the challenges listed below.]
Request costs
Bandwidth limit
Performance of pulling large data
Latency of getting metadata/partitions
Pressure on the data source

Challenges to DLA Presto - Analysis

[Figure: data sources placed on two axes, data size (small data to big data) and update frequency (updated frequently to updated infrequently). Data sources shown: OTS, OSS, ODPS, ?, MySQL, Redis, MongoDB, PostgreSQL, …]

Big Data/NoSQL: performance of pulling large data
Online System: pressure on the data source
Big Data/Offline: performance of pulling large data

Measures:
Concurrency limitation
Avoid reading the master
Caching
Pushdown

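Solutions
[Figure: the DLA Presto architecture diagram (FrontNode: SQL rewriting / query submission / fetching query results; Unified Metadata Management: metadata operations; Presto Clusters with a Default Cluster and a CU Cluster: TableScan/Pushdown; data sources including Oracle, TableStore, and PostgreSQL) annotated with the solutions listed below.]
Decrease request count
Alluxio Data Cache
Data cache
Partition meta cache
Splits cache
Impact on the source database
Limit concurrency
Read from replicas
One-Click Data Lake
Pushdown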

DLA Presto Optimizations on OSS Data Source
Decreasing OSS API request count
Alluxio Data Cache

Background

Users report that the OSS API call fees are high, even higher than the DLA fees.

OSS call fees = actual number of calls × unit price per 10,000 calls / 10,000
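
As a worked example of the fee formula above, the sketch below computes the call fee for a given number of requests. The function name and the unit price are placeholders chosen for illustration; they are not actual OSS pricing.

```python
def oss_call_fee(actual_calls: int, unit_price_per_10k: float) -> float:
    """Fee = actual calls x unit price per 10,000 calls / 10,000."""
    return actual_calls * unit_price_per_10k / 10_000


# Hypothetical unit price of 0.01 per 10,000 calls (not real OSS pricing).
fee_before = oss_call_fee(50_000_000, 0.01)
# Cutting the call count to one tenth cuts the fee by the same factor.
fee_after = oss_call_fee(5_000_000, 0.01)
```

Because the fee is linear in the number of calls, any reduction in the request count translates directly into the same relative reduction in call fees.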

Hadoop FileSystem API invocations: read, read, …, seek(100), read, seek(128MB), read
Alibaba Cloud OSS API invocations:
#1 read: read as much data as possible with 1 request; later reads continue reading
small seek (e.g. seek(100)): continue reading
big seek (e.g. seek(128MB)): start a new request
#2 read: continue reading
…
1. Reduced the API call count to 1/10 for data stored in Text format.
2. Reduced the API call count to 1/3 for data stored in ORC/Parquet format.
3. Reduced cost by about 60% to 90% on average.
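
The request-saving idea above can be sketched with a toy reader that counts how many ranged requests it would issue. This is a minimal model under stated assumptions, not the DLA implementation: the class name, the in-memory `bytes` object standing in for the remote object, and the `small_seek_threshold` cut-off are all hypothetical.

```python
class CoalescingReader:
    """Toy model of a reader over a remote object that counts requests.

    One request streams from the current position onward. A small forward
    seek keeps the stream open and skips ahead on it; a big (or backward)
    seek drops the stream, so the next read starts a new request.
    """

    def __init__(self, data: bytes, small_seek_threshold: int = 1024 * 1024):
        self._data = data                       # stands in for the remote object
        self._threshold = small_seek_threshold  # hypothetical cut-off, in bytes
        self._pos = 0                           # logical read position
        self._stream_pos = None                 # position of the open stream, if any
        self.request_count = 0

    def seek(self, target: int) -> None:
        if self._stream_pos is not None:
            gap = target - self._stream_pos
            if 0 <= gap <= self._threshold:
                # Small forward seek: skip bytes on the already open stream.
                self._stream_pos = target
            else:
                # Big or backward seek: abandon the stream.
                self._stream_pos = None
        self._pos = target

    def read(self, n: int) -> bytes:
        if self._stream_pos is None:
            # Open a new request starting at the current position.
            self.request_count += 1
            self._stream_pos = self._pos
        chunk = self._data[self._stream_pos:self._stream_pos + n]
        self._stream_pos += len(chunk)
        self._pos = self._stream_pos
        return chunk


reader = CoalescingReader(bytes(1000), small_seek_threshold=200)
reader.read(10)    # request #1
reader.read(10)    # continue reading
reader.seek(100)   # small seek, continue reading
reader.read(10)
reader.seek(800)   # big seek, next read starts a new request
reader.read(10)    # request #2
```

A naive reader that issues one request per `read` call would have made four requests for the same access pattern; the model above makes two.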

Alluxio Data Cache - Why Alluxio
Proven Solution: Fully tested in Facebook/NetEase/JD production environments
Efficiency: High concurrency; asynchronous cache writes
Monitoring: Easy to monitor and diagnose

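Alluxio Data Cache - Local Cache vs. Cluster
[Figure: architecture comparison. Labels shown: OSS; a Presto Cluster with a Coordinator; an Alluxio Cluster with a Master; read alluxio; on cache miss, cache to alluxio; return data; a second Presto Cluster for the local cache case.]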

Alluxio Data Cache - Local Cache
The Alluxio data cache is a library residing in the Presto worker.
Cache data is stored on local disk.
SOFT_AFFINITY
Makes a best-effort attempt to assign the same split to the same worker during scheduling.
Preferred(0) -> Preferred(1) -> LeastBusy
[Figure: splits assigned to workers labeled Preferred(0), Preferred(1), and LeastBusy.]
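
The SOFT_AFFINITY order above (Preferred(0), then Preferred(1), then the least busy worker) can be illustrated with a small simulation. This is a simplified sketch, not Presto's scheduler: the hash function, the way the second preferred worker is derived, and the `max_splits_per_node` parameter are assumptions made for illustration, and splits never finish in this model.

```python
import zlib


def pick_worker(split_key: str, assigned: list[int], max_splits_per_node: int) -> int:
    """Return the index of the worker that should run the split.

    Order tried: Preferred(0) -> Preferred(1) -> least busy worker.
    """
    num_workers = len(assigned)
    # Stable hash as a stand-in for whatever key the scheduler hashes.
    h = zlib.crc32(split_key.encode("utf-8"))
    preferred = [h % num_workers, (h + 1) % num_workers]  # assumed derivation
    for node in preferred:
        if assigned[node] < max_splits_per_node:
            return node
    # Both preferred workers are full: fall back to the least busy one.
    return min(range(num_workers), key=lambda i: assigned[i])


def schedule(split_keys: list[str], num_workers: int, max_splits_per_node: int) -> list[int]:
    """Assign every split in order and return the chosen worker per split."""
    assigned = [0] * num_workers
    placement = []
    for key in split_keys:
        node = pick_worker(key, assigned, max_splits_per_node)
        assigned[node] += 1
        placement.append(node)
    return placement


# Four splits with the same key (hypothetical path) on three workers.
keys = ["oss://bucket/table/part-0.orc"] * 4
placement = schedule(keys, num_workers=3, max_splits_per_node=2)
```

With a limit of two splits per worker, the first two splits land on Preferred(0) and the next two on Preferred(1), so a repeated query would find its data in the same workers' local caches.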

Alluxio Data Cache - Cluster
[Figure: cluster mode. Labels shown: OSS; a Presto Cluster with a Coordinator; an Alluxio Cluster with a Master; read alluxio; return data.]
Alluxio is a distributed caching service for Presto.
Short-circuit reads are supported.

Alluxio Data Cache - Local Cache vs. Cluster
[Figure: the cluster-mode diagram (OSS, Presto Cluster with Coordinator, Alluxio Cluster with Master; read alluxio; return data) next to a Presto Cluster using the local cache.]
Local Cache vs. Cluster
Data is closer to the compute node
No extra nodes or resources needed
Local Cache vs. Collocated Cluster
Easy to maintain
No resources are wasted if the user has no OSS data source

Alluxio Data Cache - Improvements in DLA
Scenarios: Community Solution vs. DLA
Data sources: queries mainly on Hive data sources vs. cannot be assumed for a specific user
Disks: SSD vs. ultra cloud disk
Challenges
A performance improvement in the statistical sense may not be perceivable by users, so it is necessary to increase the cache hit ratio for every single query.
Low disk throughput limits the acceleration effect.
Goals: increase the cache hit ratio for every single query; increase disk throughput

Alluxio Data Cache - Increase cache hit ratio
Analysis
SOFT_AFFINITY: Preferred(0) -> Preferred(1) -> LeastBusy
The key is to submit more splits to the preferred nodes, which is bounded by node-scheduler.max-splits-per-node.
Increase node-scheduler.max-splits-per-node
Effect: the cache hit ratio increases
Side effect: the load across workers becomes unbalanced
[Figure: three workers holding 4 splits (split1 to split4), 1 split (split5), and 1 split (split6).]

Alluxio Data Cache - Increase cache hit ratio
Unbalanced load
HiveSplit preferred nodes: path.hashCode() % numWorkers
A big file generates more splits, so the corresponding worker gets more load.
The splits of a big file need to be submitted to different nodes:
(path.hashCode() + start / (fileSize / numWorkers)) % numWorkers
[Figure: three workers holding 2 splits each.]
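
The two preferred-node formulas above can be compared on a single large file. This is a minimal sketch: `zlib.crc32` is a stand-in for Java's `path.hashCode()`, the path and sizes are made up, and the guard against a zero-sized region for tiny files is an added assumption.

```python
import zlib
from collections import Counter


def stable_hash(path: str) -> int:
    # Stand-in for Java's path.hashCode(); always non-negative.
    return zlib.crc32(path.encode("utf-8"))


def preferred_by_path(path: str, num_workers: int) -> int:
    """Original scheme: every split of a file maps to the same worker."""
    return stable_hash(path) % num_workers


def preferred_by_path_and_offset(path: str, start: int, file_size: int, num_workers: int) -> int:
    """Revised scheme: shift the worker index by the split's position in the file."""
    # Guard against a zero-sized region for tiny files (an added assumption).
    region_size = max(1, file_size // num_workers)
    return (stable_hash(path) + start // region_size) % num_workers


path = "oss://bucket/table/big_file.orc"  # hypothetical path
file_size, split_size, num_workers = 600, 100, 3
starts = range(0, file_size, split_size)  # six splits

# Number of splits each worker receives under each scheme.
before = Counter(preferred_by_path(path, num_workers) for _ in starts)
after = Counter(
    preferred_by_path_and_offset(path, s, file_size, num_workers) for s in starts
)
```

Under the original formula all six splits share one preferred worker; under the revised formula each of the three workers is preferred for two splits, while a given split still maps to the same worker on every run.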

Alluxio Data Cache - Improve disk throughput
Throughput of a 20 GB ultra disk: write 109 MB/s, read 108 MB/s
Multiple disks
Performance of 6 ultra disks: 600 MB/s read/write
Implementation
page.path = $root/$page_path
=>
page.path = $roots[page.hash % roots.size]/$page_path
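
The multi-root page placement above can be sketched as follows. The mount points, the page naming, and the use of a plain integer as the page hash are hypothetical; only the `roots[hash % len(roots)]` selection mirrors the rule shown.

```python
import posixpath
from collections import Counter


def page_location(roots: list[str], page_hash: int, page_path: str) -> str:
    """Pick a cache root by page hash, then append the page's relative path."""
    root = roots[page_hash % len(roots)]
    return posixpath.join(root, page_path)


# Hypothetical mount points, one per ultra disk.
roots = [f"/mnt/disk{i}/cache" for i in range(6)]
locations = [page_location(roots, h, f"page_{h}") for h in range(12)]

# Count pages per disk: "/mnt/disk3/cache/page_3" -> "disk3".
per_disk = Counter(loc.split("/")[2] for loc in locations)
```

Because the root is chosen from the page hash, the same page always resolves to the same disk, and pages spread across all disks so that reads and writes can use their combined throughput.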

Alluxio Data Cache - Performance
Environment:
Cluster: 16 CPU / 64 GB × 16 nodes
Disk: 20 GB ultra disk × 6
Data: TPC-H 1 TB / ORC / stored in OSS
Queries chosen from TPC-H:
Include a scan of the lineitem table (the biggest table)
No joins between three or more tables
Test Result

Future Plan
Alluxio Cluster
Shared by multiple users
Suitable when Presto auto-scales
Improvements for OSS Data Source
Fragment Result Cache
Query Result Cache
Improve the performance of querying small files

More Information about DLA
• DLA Homepage: https://www.aliyun.com/product/datalakeanalytics
• DLA SQL Introduction: https://developer.aliyun.com/article/770819
We are hiring :)
