Improving Presto performance with Alluxio at TikTok
This document discusses improving the performance of Presto queries on Hive data stored in HDFS by caching with Alluxio. It describes how TikTok integrated Presto with Alluxio to cache the most frequently accessed data partitions, reducing P95 query latency by 41.2% and cutting latency by more than 20% for 91.1% of cache-hit queries. A custom caching strategy identifies and prioritizes the partitions that consume the most IO time, to maximize resource utilization and minimize cache space requirements.
Presto Use Case
● Workload
○ 600K+ read-only, interactive SQLs daily
● Cluster Size
○ 40K+ vcores
○ 400TB+ memory
● Data Source
○ Hive tables on HDFS
○ Hive Metastore (HMS) shared with other engines/databases such as Spark, ClickHouse, etc.
Why Caching?
● IO is the #1 time-consuming part of SQL execution
● HDFS DataNodes slow down when highly concurrent reads repeatedly land on the same batch of blocks
● Save network bandwidth for other operations like shuffle
Problems with Cache
● Consistency
● Data Locality
● Pluggable Integration
● Resource Utilization
● Cold Start
● Caching Policy
● Multi-Tier Support
● ...
BEST Cache is NO Cache?
Open Source Integrations?
Solution 1: Hardcoded URL Swap
● Change the path in the Location property of the HMS table/partition from hdfs:// to alluxio://
Problem
● Prerequisite: every query engine sharing the metadata in HMS must read from and write to Alluxio
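The URL swap in Solution 1 can be illustrated with a minimal sketch. It assumes the HDFS root is mounted at the Alluxio root so that paths map one to one; the function name and the Alluxio authority are hypothetical and not taken from the slides.

```python
from urllib.parse import urlsplit


def swap_to_alluxio(location: str, alluxio_authority: str) -> str:
    """Rewrite an hdfs:// Location into an alluxio:// one, keeping the path.

    Assumption: HDFS "/" is mounted at Alluxio "/", so the path is unchanged.
    Locations with any other scheme are returned untouched.
    """
    parts = urlsplit(location)
    if parts.scheme != "hdfs":
        return location
    return f"alluxio://{alluxio_authority}{parts.path}"
```

Because the rewritten Location is what every engine sees in HMS, the swap only works if all of those engines can talk to Alluxio, which is the prerequisite listed above.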
Open Source Integrations?
Alluxio Catalog Service
Problems
● High QPS on the Alluxio Master: every HMS lookup goes through the catalog service, regardless of whether the table is cached
● Manual synchronization is needed to keep metadata in sync between the Hive Metastore and the Alluxio catalog service
Inhouse Presto-on-Alluxio Integration
● Store the Alluxio path in a separate table/partition parameter, cachePath, in HMS
● Presto loads the HDFS path and the optional Alluxio path, and prefers to read from Alluxio if the cachePath parameter is present
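The path selection described above can be sketched as follows. This is only an illustration of the rule "prefer Alluxio when cachePath is present"; the function name and the dictionary representation of HMS parameters are assumptions, not Presto's actual code.

```python
from typing import Mapping

# Parameter name taken from the slide; everything else is hypothetical.
CACHE_PATH_KEY = "cachePath"


def resolve_read_path(hdfs_location: str, parameters: Mapping[str, str]) -> str:
    """Return the Alluxio path if the partition carries cachePath, else HDFS."""
    cache_path = parameters.get(CACHE_PATH_KEY)
    if cache_path:
        return cache_path
    return hdfs_location
```

Since the Location property itself is never changed, engines that do not know about cachePath keep reading from HDFS as before.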
Inhouse Presto-on-Alluxio Integration
● Extend CachingFileSystem in Presto to construct two FileSystems (HDFS & Alluxio)
● Fall back to reading from HDFS whenever a read from Alluxio fails or times out
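The fallback behaviour can be sketched in a language-neutral way. The real integration lives in Presto's Java CachingFileSystem; the Python below is only a minimal illustration, with the two read callables and the timeout value standing in as hypothetical placeholders for the Alluxio and HDFS file systems.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Shared worker pool for cache reads (size is an arbitrary assumption).
_pool = ThreadPoolExecutor(max_workers=4)


def read_with_fallback(
    read_cache: Callable[[], bytes],
    read_hdfs: Callable[[], bytes],
    timeout_s: float,
) -> bytes:
    """Try the cache first; on any error or timeout, read from HDFS instead."""
    future = _pool.submit(read_cache)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both a failed cache read and a timeout while waiting for it.
        # cancel() only succeeds if the task has not started running yet.
        future.cancel()
        return read_hdfs()
```

With this shape, a cache outage degrades to plain HDFS latency instead of failing the query.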
Caching is insufficient
Benchmark
● 30% latency reduction on sample SQLs in production
● On TPC-DS, the average latency reduction falls to 17%
Learning
● Need to identify the IO-intensive
SQLs to maximize the resource
utilization
Customized Cache Strategy
● Collect the time spent in TableScanOperator & ScanFilterAndProjectOperator
● Aggregate the top N most time-consuming partitions over the past M days
● Knapsack problem: given a fixed amount of Alluxio space, find the best set of partitions (and TTL)
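The knapsack framing above can be made concrete with a small sketch. The slides do not say how the problem is actually solved, so this uses a textbook 0/1 knapsack dynamic program as an assumption: each partition has a size and an aggregated scan time, and the goal is to maximize total scan time covered within the cache capacity. TTL selection is left out, and all names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Partition(NamedTuple):
    name: str
    size_gb: int          # space the partition would occupy in Alluxio
    scan_seconds: float   # scan-operator time aggregated over the past M days


def select_partitions(parts: List[Partition], capacity_gb: int) -> List[Partition]:
    """Pick the partitions with the largest total scan time that fit the cache."""
    # best[c] = (total scan seconds, chosen indices) using at most c GB.
    best: List[Tuple[float, Tuple[int, ...]]] = [(0.0, ())] * (capacity_gb + 1)
    for i, p in enumerate(parts):
        # Iterate capacity downwards so each partition is used at most once.
        for c in range(capacity_gb, p.size_gb - 1, -1):
            candidate = best[c - p.size_gb][0] + p.scan_seconds
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - p.size_gb][1] + (i,))
    return [parts[i] for i in best[capacity_gb][1]]
```

For example, with a 6 GB budget and partitions of (4 GB, 40 s), (3 GB, 35 s) and (3 GB, 30 s), the sketch picks the two 3 GB partitions, since together they cover more scan time than the single 4 GB one.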
Cache Scheduler
Trigger
● Subscribe to the HMS changelog for AddPartition, AlterPartition, and DropPartition events
● Compare with Cache Strategy to
determine whether the changed
partition is cacheable
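The trigger logic can be sketched as a small decision function. The event names come from the slide; the action names ("cache", "refresh", "evict", "ignore") and the treatment of each event are assumptions made for illustration, not the scheduler's documented behaviour.

```python
from typing import Set

# Event types named on the slide.
WATCHED_EVENTS = {"AddPartition", "AlterPartition", "DropPartition"}


def plan_action(event_type: str, partition: str, cacheable: Set[str]) -> str:
    """Decide what the scheduler should do for one HMS changelog event.

    `cacheable` stands in for the output of the cache strategy.
    """
    if event_type not in WATCHED_EVENTS or partition not in cacheable:
        return "ignore"
    if event_type == "AddPartition":
        return "cache"
    if event_type == "AlterPartition":
        return "refresh"   # assumed: altered data must be reloaded
    return "evict"         # DropPartition: assumed to remove the cached copy
```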
Mount & Cleanup
● Cacheable partitions are mounted in Alluxio before cachePath is added to HMS
● A cron job removes cachePath from HMS and unmounts the partition from Alluxio based on the TTL defined in the cache strategy
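The TTL-based cleanup can be sketched as below. The ordering mirrors the bullets: cachePath is removed from HMS before the unmount, so that no new reads are routed to a path that is about to disappear. The data structures and the two injected callables are hypothetical stand-ins for the HMS and Alluxio clients.

```python
from typing import Callable, Dict, List


def expired_partitions(
    mounted_at: Dict[str, float],
    ttl_seconds: Dict[str, float],
    now: float,
) -> List[str]:
    """Return partitions whose age has reached the TTL from the cache strategy."""
    return sorted(p for p, t in mounted_at.items() if now - t >= ttl_seconds[p])


def cleanup(
    expired: List[str],
    remove_cache_path: Callable[[str], None],
    unmount: Callable[[str], None],
) -> None:
    """Remove cachePath in HMS first, then unmount the partition from Alluxio."""
    for partition in expired:
        remove_cache_path(partition)
        unmount(partition)
```

A cron job would call `expired_partitions` with the current time and pass the result to `cleanup`.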
Overall Results
● P95 query latency reduced by 41.2%
● With cache disk space below 1% of the daily HDFS increment, 32% cache coverage on a weekly basis
● 91.1% of cache-hit SQLs see latency reduced by 20%+
Next Steps
● Experiment with "alluxio-as-lib"
a. Cache Consistency issue-13700
b. Optimized Presto scheduling hash algorithm
c. Adopt Alluxio Structured Data
● Enable write caches on Alluxio to chain ETL jobs