Alluxio's Use and Practice in Didi

Presto&Alluxio在滴滴的探索和实践

自我介绍
杜若飞
滴滴出行大数据研发工程师，目前主要负责滴滴出行的Presto生态建设。
杨华峰
滴滴出行大数据研发工程师，长期关注分布式系统、大数据高性能处理的
应用和开发，目前负责Alluxio的应用探索和实践落地。

目录
常见问题和解决方法02
Presto未来规划和思考03
Presto架构和特点01

Presto架构和特点-Presto简介和历史
Presto是由Facebook开发的一个分布式SQL查询引擎，它被设计为用来
专门进行高速、实时的数据分析。它的产生是为了解决Hive的
MapReduce模型太慢以及不能通过BI工具等直接展现HDFS数据等问题。
2012年秋季
2018年01
月
现况启动开源
2013年冬
季

Presto架构和特点-功能和特点
• Ad-hoc，期望查询时间秒级或几分钟
• 比Hive快10倍
• 支持多数据源，如Hive、Kafka、MySQL、MongoDB、Redis、JMX等，
也可自己实现Connector
• Client Protocol: HTTP+JSON, support various languages(Python, Ruby,
PHP, Node.js,Go,Java)
• 支持JDBC/ODBC连接
• ANSI SQL

Presto架构和特点-执行流程

Presto架构和特点-查询计划

Presto架构和特点-性能
01
03
05
02
04
06
细粒度的任务拆分
流水线，无磁盘io
动态编译执行计划
基于内存的并行计算.
本地化计算
类BlinkDB的近似查询

Presto架构和特点-与MR比较

常见问题和解决办法-精度
Query:
select round(cross_city_order_cnt/all_type_order_cnt,4) cross_city_order_rate from table;
Result:
cross_city_order_rate
-----------------------
0
修改为：
select round(payed_order_cnt/cast(complete_order_cnt as double),4)
cross_city_order_rate from table;
Result:
cross_city_order_rate
-----------------------
0.8174

常见问题和解决办法-变长参数
Query:
select product_id from table where city_id=1 and product_id in (1,2,3,4) and
concat_ws('-',year,month,day)='2017-12-01' and channel<>1 and
channel<>1010000001 limit 100;
修改为：
select product_id from table where city_id=1 and product_id in (1,2,3,4)
and array_join(ARRAY[year,month,day],‘-’,‘’)=‘2017-12-01’ and channel<>1
and channel<>1010000001 limit 100;

常见问题和解决办法-消费问题
问题：
Query XXX has not been accessed since
原因：
数据分页输出，5分钟不消费抛弃查询

常见问题和解决办法-INTERNAL ERROR
问题：
INTERNAL ERROR
解决方法：
内核问题；Presto配置问题

常见问题和解决办法-EXTERNAL ERROR
问题：
EXTERNAL ERROR
原因：
数据源；用户权限问题

常见问题和解决办法-高并发
问题：
Encountered too many errors talking to a worker node. The node may
have crashed or be under too much load. This is probably a transient
issue, so please retry your query in a few minutes.
原因：
task.max-worker-threads=cores*4;
解决办法：
config.properties配置合适的值

常见问题和解决办法-监控
• Web UI
• basic query status check
• JMX HTTP API
• GET /v1/jmx/mbean[/{objectName}]
• com.facebook.presto.execution:name=TaskManager
• com.facebook.presto.execution:name=QueryManager
• com.facebook.presto.execution:name=NodeScheduler

未来规划
05
04
03
02
01
配置动态加载
大集群
查询有不稳定现象
查询无索引
Presto On Alluxio

内容
Alluxio架构原理01
Alluxio主要特征02
Alluxio使用场景03
Presto On Alluxio04
Alluxio未来规划05

Alluxio主要特征
01
03
02
04
层次化
存储
分层布
局性
统一
命名
空间
世系
关系

Alluxio使用场景
跨机房或云端数据访问01
多个独立的持久化数据源02
独立job之间共享数据03

Presto On Alluxio（一）
• Ad-hoc查询利用Alluxio加速
• 跨机房访问数据
• Alluxio对Presto的友好支持
• 对查询性能要求较高（速度/稳定性）

Presto On Alluxio（二）
DataNode DataNode DataNode
Alluxio
Worker
Presto
Worker
Alluxio
Worker
Presto
Worker
Alluxio
Worker
Presto
Worker
DataNode
Presto CLI Presto JDBC
HDFS

Presto On Alluxio（三）
测试数据集表A 表B
行数 15 millions 15 millions
大小 17 GB 17 GB
Alluxio v1.6.1，5台worker，每台worker10g 24c
Presto v0.187，5台worker，每台worker48g 24c
Query select count(a.pag_id) from a inner join b on a.pag_id =
b.pag_id group by a.pt limit 100;

Presto On Alluxio（四）
执行时间(s)
序号 Presto On Alluxio Presto On HDFS
1 15.72 41.79
2 15.24 29.37
3 14.00 31.15
4 15.72 29.72
5 14.91 26.13
6 14.11 26.93
7 15.14 24.23
8 15.05 42.43
9 15.73 26.27
10 15.44 29.75

Presto On Alluxio（六）
• alluxio.master.hostname
• alluxio.worker.hostname
• alluxio.user.hostname
• alluxio.user.short.circuit.enabled
• alluxio.user.file.passive.cache.enabled
• hive.max-split-size
• hive.force-local-scheduling

Alluxio未来规划
跨机房数据访问热点数据缓存

Alluxio未来规划
分位数
SQL 编
辑执行
时长
（秒）
模板执行
时长（秒）
5 35 54
10 49 78
15 61 102
20 72 128
25 85 161
30 99 206
35 113 264
40 130 332
45 152 412
50 181 499
55 221 590
60 275 691
65 350 809
70 453 955
75 584 1129
80 768 1328
85 1058 1593
90 1347 1993
95 2044 2753

THANK
YOU
北京嘀嘀无限科技发展有限公司

Alluxio's Use and Practice in Didi

Recommended

Recommended

More Related Content

Similar to Alluxio's Use and Practice in Didi

Similar to Alluxio's Use and Practice in Didi (20)

More from Alluxio, Inc.

More from Alluxio, Inc. (20)

Alluxio's Use and Practice in Didi