The Hive SQL Compilation Process

Published in: Technology, Business

  1. 1. The Hive SQL Compilation Process chenchun@meituan.com Monday, 30 December
  2. 2. Contents: 1. How MapReduce implements Join, Group By, and Distinct; 2. How SQL is translated into MapReduce: (1) Antlr and the AST, (2) QueryBlock, the basic building block of SQL, (3) logical operators (Operator), (4) logical optimizer, (5) converting the OperatorTree into MapReduce Jobs, (6) physical optimizer and how MapJoin works; 3. How to read a Hive execution plan
  3. 3. Join
      select u.name, o.orderid from order o join user u on o.uid = u.uid;
      user (uid, name): (1, apple), (2, orange)
      order (uid, orderid): (1, 1001), (1, 1002), (2, 1003)
  4. 4. Join: Map
      Each row is emitted as key = uid, value = <table tag, column value>.
      user: 1 -> <1, apple>, 2 -> <1, orange>
      order: 1 -> <2, 1001>, 1 -> <2, 1002>, 2 -> <2, 1003>
  5. 5. Join: Shuffle Sort
      key 1: <1, apple>, <2, 1001>, <2, 1002>
      key 2: <1, orange>, <2, 1003>
  6. 6. Join: Reduce
      Output (name, orderid): (apple, 1001), (apple, 1002), (orange, 1003)
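The three Join phases can be imitated in a few lines of plain Python. This is only a sketch under simplifying assumptions: everything runs in one process, and the function names (`map_phase`, `shuffle_sort`, `reduce_phase`) are made up for illustration and are not Hive or Hadoop APIs.

```python
from collections import defaultdict

# Sample tables from the slides.
users = [(1, "apple"), (2, "orange")]          # (uid, name)
orders = [(1, 1001), (1, 1002), (2, 1003)]     # (uid, orderid)

def map_phase(users, orders):
    """Emit (join key, (table tag, payload)); tag 1 = user, tag 2 = order."""
    for uid, name in users:
        yield uid, (1, name)
    for uid, orderid in orders:
        yield uid, (2, orderid)

def shuffle_sort(pairs):
    """Group values by key and sort them so user rows (tag 1) come first."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(values) for key, values in sorted(groups.items())}

def reduce_phase(groups):
    """For each key, pair every user name with every order id."""
    result = []
    for values in groups.values():
        names = [payload for tag, payload in values if tag == 1]
        order_ids = [payload for tag, payload in values if tag == 2]
        result.extend((name, oid) for name in names for oid in order_ids)
    return result

joined = reduce_phase(shuffle_sort(map_phase(users, orders)))
print(joined)
```

The table tag in the value is what lets the reducer tell which side of the join a record came from once both tables share the same key.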
  7. 7. Group By
      select rank, isonline, count(*) from city group by rank, isonline;
      city, split 1 (rank, isonline): (A, 1), (A, 1)
      city, split 2 (rank, isonline): (A, 1), (B, 0)
  8. 8. Group By: Map
      key = <rank, isonline>, value = partial count computed on the map side.
      split 1: <A, 1> -> 2
      split 2: <A, 1> -> 1, <B, 0> -> 1
  9. 9. Group By: Shuffle Sort
      <A, 1>: 2, 1
      <B, 0>: 1
  10. 10. Group By: Reduce
      Output (rank, isonline, count): (A, 1, 3), (B, 0, 1)
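A minimal sketch of the same Group By flow, assuming in-memory lists stand in for the two input splits. `map_with_partial_count` and `reduce_counts` are illustrative names, not Hive functions.

```python
from collections import Counter

# Two input splits of the city table: (rank, isonline) per row.
split_1 = [("A", 1), ("A", 1)]
split_2 = [("A", 1), ("B", 0)]

def map_with_partial_count(rows):
    """Map-side aggregation: count each (rank, isonline) key within one split."""
    return Counter(rows)

def reduce_counts(partials):
    """Sum the partial counts that arrive for the same key."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(sorted(total.items()))

partials = [map_with_partial_count(split_1), map_with_partial_count(split_2)]
group_counts = reduce_counts(partials)
print(group_counts)
```

Counting inside each split first means only one record per key and split has to cross the network during the shuffle.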
  11. 11. Distinct
      select dealid, count(distinct uid) num from order group by dealid;
      order, split 1 (uid, dealid): (1, 1001), (2, 1002), (2, 1001)
      order, split 2 (uid, dealid): (1, 1002), (1, 1002), (2, 1001)
  12. 12. Distinct: Map
      key = <dealid, uid>, partition key = dealid.
      split 1: <1001, 1>, <1002, 2>, <1001, 2>
      split 2: <1002, 1>, <1001, 2>
  13. 13. Distinct: Shuffle Sort
      Records are sorted by the full key and partitioned by dealid.
      dealid 1001: <1001, 1>, <1001, 2>, <1001, 2>
      dealid 1002: <1002, 1>, <1002, 2>
  14. 14. Distinct: Reduce
      Output (dealid, num): (1001, 2), (1002, 2)
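The sketch below shows why sorting on the composite key makes the distinct count cheap: the reducer only has to compare each uid with the previous one. It is an illustration in plain Python with made-up function names, not the Hive implementation.

```python
from itertools import groupby

# (uid, dealid) rows from both splits on the slides.
rows = [(1, 1001), (2, 1002), (2, 1001), (1, 1002), (1, 1002), (2, 1001)]

def map_phase(rows):
    """Emit the composite key (dealid, uid); dealid alone is the partition key."""
    return [(dealid, uid) for uid, dealid in rows]

def reduce_phase(sorted_keys):
    """Count uid changes within each dealid, remembering only the last uid seen."""
    result = {}
    for dealid, group in groupby(sorted_keys, key=lambda k: k[0]):
        last_uid, distinct = None, 0
        for _, uid in group:
            if uid != last_uid:
                distinct += 1
                last_uid = uid
        result[dealid] = distinct
    return result

num_by_deal = reduce_phase(sorted(map_phase(rows)))
print(num_by_deal)
```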
  15. 15. Distinct
      select dealid, count(distinct uid), count(distinct date) from order group by dealid;
      order (uid, dealid, date): (1, 1001, 1101), (2, 1001, 1101), (2, 1001, 1102)
  16. 16. Distinct: Map
      key = <dealid, uid, date>, partition key = dealid, value = 1.
      <1001, 1, 1101>, <1001, 2, 1101>, <1001, 2, 1102>
  17. 17. Distinct: Map
      Same Map output as on slide 16.
      With this key, the Reduce phase has to deduplicate uid and date separately in memory.
  18. 18. Distinct
      select dealid, count(distinct uid), count(distinct date) from order group by dealid;
      order (uid, dealid, date): (1, 1001, 1101), (2, 1001, 1101), (2, 1001, 1102)
  19. 19. Distinct: Map
      Each input row is expanded into one record per distinct column.
      key = <dealid, tag, value> with tag 0 for uid and tag 1 for date, partition key = dealid, value = 1.
      <1001, 0, 1>, <1001, 1, 1101>, <1001, 0, 2>, <1001, 1, 1101>, <1001, 0, 2>, <1001, 1, 1102>
  20. 20. Distinct: Map
      Same Map output as on slide 19.
      With this key, the Reduce phase only needs to remember lastDealid, lastTag, lastuid, and lastDate.
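The tagged-key variant can be sketched as follows. The tag values (0 for uid, 1 for date) follow the slides; the function names and the in-memory sort are assumptions made for the example.

```python
rows = [(1, 1001, 1101), (2, 1001, 1101), (2, 1001, 1102)]  # (uid, dealid, date)

def map_phase(rows):
    """Expand each row into one record per distinct column: tag 0 = uid, tag 1 = date."""
    for uid, dealid, date in rows:
        yield (dealid, 0, uid)
        yield (dealid, 1, date)

def reduce_phase(sorted_keys):
    """Count distinct values per (dealid, tag) by comparing with the previous key only."""
    counts = {}
    last_key = None
    for key in sorted_keys:
        dealid, tag, _ = key
        per_deal = counts.setdefault(dealid, [0, 0])
        if key != last_key:          # a new (dealid, tag, value) combination
            per_deal[tag] += 1
        last_key = key
    return counts

# {dealid: [count(distinct uid), count(distinct date)]}
distinct_counts = reduce_phase(sorted(map_phase(rows)))
print(distinct_counts)
```

Because equal keys arrive next to each other after the sort, the reducer needs constant memory regardless of how many distinct values a dealid has.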
  21. 21. Contents: 1. How MapReduce implements Join, Group By, and Distinct; 2. How SQL is translated into MapReduce: (1) Antlr and the AST, (2) QueryBlock, the basic building block of SQL, (3) logical operators (Operator), (4) logical optimizer, (5) converting the OperatorTree into MapReduce Jobs, (6) physical optimizer and how MapJoin works; 3. Hive execution plans
  22. 22. Compile Workflow
      Parser -> Semantic Analyzer -> Logical Plan Gen -> Logical Optimizer -> Physical Plan Gen -> Physical Optimizer
  23. 23. Compile Workflow
      Hive QL -> Parser -> AST Tree -> Semantic Analyzer -> QB -> Logical Plan Gen -> Operator Tree -> Logical Optimizer -> Operator Tree -> Physical Plan Gen -> Task Tree -> Physical Optimizer -> Task Tree
  24. 24. Contents: 1. How MapReduce implements Join, Group By, and Distinct; 2. How SQL is translated into MapReduce: (1) Antlr and the AST, (2) QueryBlock, the basic building block of SQL, (3) logical operators (Operator), (4) logical optimizer, (5) converting the OperatorTree into MapReduce Jobs, (6) physical optimizer and how MapJoin works; 3. Hive execution plans
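  25. 25. Antlr
      • Antlr is a language recognition tool.
      • It can be used to build domain-specific languages.
      • You only write a grammar file that defines the lexical and syntactic rewrite rules; Antlr then takes care of lexical analysis and parsing, and supports semantic analysis and intermediate code generation.
  26. 26. AST Tree
      To process an expression further, for example to compute its value, Antlr offers two options. The first is to embed actions (code fragments) directly in the grammar file. The second is to use Antlr's abstract syntax tree grammar: during parsing, the user input is converted into an intermediate representation, the abstract syntax tree, and the computation is done later while traversing that tree.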
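To make the second option concrete, here is a hand-built tree for the expression 1 + 2 * 3 and a traversal that evaluates it. The tuple layout is an assumption chosen for brevity; it is not the node class that Antlr generates.

```python
# A tiny AST written by hand as nested tuples:
# (operator, left, right) for inner nodes, plain numbers for leaves.
ast = ("+", 1, ("*", 2, 3))

def evaluate(node):
    """Walk the tree and compute the value bottom-up."""
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    lhs, rhs = evaluate(left), evaluate(right)
    if op == "+":
        return lhs + rhs
    if op == "*":
        return lhs * rhs
    raise ValueError(f"unsupported operator: {op}")

value = evaluate(ast)
print(value)
```

Separating parsing from evaluation in this way is what lets later compiler stages walk the same tree several times for different purposes.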
  27. 27. Example SQL
  28. 28. Sub Query
  29. 29. Sub Query (figure, labels: 1)
  30. 30. Sub Query (figure, labels: 1, 2)
  31. 31. From => AST (1.1)
  32. 32. From => AST (1.1)
  33. 33. Select => AST (1.2)
  34. 34. Select => AST (1.2)
  35. 35. Where (1.3)
  36. 36. Where => AST (1.3)
  37. 37. Contents: 1. How MapReduce implements Join, Group By, and Distinct; 2. How SQL is translated into MapReduce: (1) Antlr and the AST, (2) QueryBlock, the basic building block of SQL, (3) logical operators (Operator), (4) logical optimizer, (5) converting the OperatorTree into MapReduce Jobs, (6) physical optimizer and how MapJoin works; 3. Hive execution plans
  38. 38. QueryBlock
      • QueryBlock: the basic building block of a SQL statement. It has three parts: input source, computation, and output.
      • Generating QueryBlocks from the AST Tree means finding all basic blocks in the abstract syntax tree and the relationships between them. One QB object is created per basic block, and the different operations of each block become different properties of its QB object.
  39. 39. QueryBlock
  40. 40. QueryBlock
  41. 41. QueryBlock
      Annotation: mapping between table names and aliases.
  42. 42. QueryBlock
      Annotation: subqueries.
  43. 43. QueryBlock
      Annotation: QBExpr is meant to express the relationships between QBs, but currently only Union is implemented.
  44. 44. QueryBlock
      Annotation: Join ASTTree.
  45. 45. QueryBlock
      Annotation: key = 'insclause-i', value = ASTTree.
  46. 46. QueryBlock
      Annotation: records the metadata of the tables.
  47. 47. AST Tree => QB
      Pre-order traversal of the AST Tree: SemanticAnalyzer#doPhase1
  48. 48. AST Tree => QB (figure, labels: 1)
  49. 49. AST Tree => QB (figure, labels: 1, 2)
  50. 50. AST Tree => QB
      1. TOK_QUERY => create a QB object and recurse into the child nodes
  51. 51. AST Tree => QB
      2. TOK_FROM => QB#aliasToTabs.put(alias, tabname); QB#aliases.put(alias, tabname); QBParseInfo#aliasToSrc.put(alias.toLowerCase(), ast);
  52. 52. AST Tree => QB
      3. TOK_INSERT => recurse into the child nodes
  53. 53. AST Tree => QB
      4. TOK_DESTINATION => QBParseInfo#nameToDest.put("insclause-i", astnode)
  54. 54. AST Tree => QB
      5. TOK_SELECT => QBParseInfo#destToSelExpr.put("insclause-i", astnode); destToAggregationExprs.put("insclause-i", astnode); destToDistinctFuncExprs.put("insclause-i", astnode);
  55. 55. AST Tree => QB
      6. TOK_WHERE => QBParseInfo#destToWhereExpr.put("insclause-i", ast);
  56. 56. AST Tree => QB
      Pre-order traversal of the AST Tree: SemanticAnalyzer#doPhase1
      1. TOK_QUERY => create a QB object and recurse into the child nodes
      2. TOK_FROM => QB#aliasToTabs.put(alias, tabname); QB#aliases.put(alias, tabname); QBParseInfo#aliasToSrc.put(alias.toLowerCase(), ast);
      3. TOK_INSERT => recurse into the child nodes
      4. TOK_DESTINATION => QBParseInfo#nameToDest.put("insclause-i", astnode)
      5. TOK_SELECT => QBParseInfo#destToSelExpr.put("insclause-i", astnode); destToAggregationExprs.put("insclause-i", astnode); destToDistinctFuncExprs.put("insclause-i", astnode);
      6. TOK_WHERE => QBParseInfo#destToWhereExpr.put("insclause-i", ast);
      Result: QB1 and QB2.
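The token dispatch described on these slides can be sketched as a recursive pre-order walk. Everything here is a simplified stand-in: the AST is a hand-built dictionary for a hypothetical query (`select name from user u where uid > 1`), the QB is a plain dict, and only a single insert clause named `insclause-0` is handled.

```python
def node(token, text=None, children=()):
    """Build a minimal AST node (hypothetical layout, not Hive's ASTNode)."""
    return {"token": token, "text": text, "children": list(children)}

ast = node("TOK_QUERY", children=[
    node("TOK_FROM", children=[node("TOK_TABREF", text="user u")]),
    node("TOK_INSERT", children=[
        node("TOK_DESTINATION", text="TOK_TMP_FILE"),
        node("TOK_SELECT", text="name"),
        node("TOK_WHERE", text="uid > 1"),
    ]),
])

def do_phase1(ast_node, qb=None, dest="insclause-0"):
    """Pre-order walk: handle the node itself first, then recurse into children."""
    token = ast_node["token"]
    if token == "TOK_QUERY":
        qb = {"aliasToTabs": {}, "nameToDest": {},
              "destToSelExpr": {}, "destToWhereExpr": {}}
    elif token == "TOK_FROM":
        tabname, alias = ast_node["children"][0]["text"].split()
        qb["aliasToTabs"][alias.lower()] = tabname
        return qb                      # table reference consumed, stop here
    elif token == "TOK_DESTINATION":
        qb["nameToDest"][dest] = ast_node
    elif token == "TOK_SELECT":
        qb["destToSelExpr"][dest] = ast_node
    elif token == "TOK_WHERE":
        qb["destToWhereExpr"][dest] = ast_node
    # TOK_QUERY and TOK_INSERT only recurse; leaf tokens have no children.
    for child in ast_node["children"]:
        do_phase1(child, qb, dest)
    return qb

qb = do_phase1(ast)
print(qb["aliasToTabs"])
```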
  57. 57. Contents: 1. How MapReduce implements Join, Group By, and Distinct; 2. How SQL is translated into MapReduce: (1) Antlr and the AST, (2) QueryBlock, the basic building block of SQL, (3) logical operators (Operator), (4) logical optimizer, (5) converting the OperatorTree into MapReduce Jobs, (6) physical optimizer and how MapJoin works; 3. Hive execution plans
  58. 58. Operator
      • A logical operator performs one specific function in the Map phase or the Reduce phase.
      • Common Operators: TableScanOperator, SelectOperator, FilterOperator, JoinOperator, GroupByOperator, ReduceSinkOperator.
      • The computation is streamed: once an Operator has processed a row, it passes the row to its childOperator.
      • The Map phase and the Reduce phase each consist of an OperatorTree.
      • Some Operators are terminal (TerminalOperator) and mark the end of a Map or Reduce phase. For example, FileSinkOperator writes data to a file and marks the end of the current phase.
      • ReduceSinkOperator can only appear in the Map phase. It serializes the Map-side fields into the Reduce key/value and the partition key.
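The row-at-a-time hand-off between Operators can be sketched with a few small classes. The class names mirror the Hive operator names for readability, but these are toy Python stand-ins written for this example, and the `then` helper is an invention to keep the wiring short.

```python
class Operator:
    """Base operator: processes one row and forwards it to its children."""
    def __init__(self):
        self.children = []

    def then(self, child):
        """Attach a child operator and return it so calls can be chained."""
        self.children.append(child)
        return child

    def forward(self, row):
        for child in self.children:
            child.process(row)

    def process(self, row):
        self.forward(row)

class FilterOperator(Operator):
    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def process(self, row):
        if self.predicate(row):      # drop rows that fail the predicate
            self.forward(row)

class SelectOperator(Operator):
    def __init__(self, columns):
        super().__init__()
        self.columns = columns

    def process(self, row):
        self.forward({c: row[c] for c in self.columns})

class FileSinkOperator(Operator):
    """Terminal operator: collects rows instead of forwarding them."""
    def __init__(self):
        super().__init__()
        self.rows = []

    def process(self, row):
        self.rows.append(row)

scan = Operator()                    # stands in for a TableScanOperator
sink = FileSinkOperator()
scan.then(FilterOperator(lambda r: r["uid"] > 1)).then(SelectOperator(["name"])).then(sink)
for row in [{"uid": 1, "name": "apple"}, {"uid": 2, "name": "orange"}]:
    scan.process(row)
print(sink.rows)
```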
  59. 59. Operator
      • All parameters an Operator needs at runtime are stored in its OperatorDesc. The OperatorDesc is serialized to HDFS before the job is submitted, then read from HDFS and deserialized before the MR Task runs.
      • The Map-phase OperatorTree is stored on HDFS at Job.getConf("hive.exec.plan") + "/map.xml".
      • After a row has been processed by an Operator, its fields are renumbered. The LogicalOptimizer uses colExprMap to trace field names back.
      • RowSchema describes the output fields of an Operator.
      • InputObjInspector and outputObjInspector parse the input and output fields.
  60. 60. QB => Operator Tree
      In-order traversal of the QB: SemanticAnalyzer#genPlan(QB qb)
      SemanticAnalyzer#genPlan:
      1. QB#aliasToSubq => recursive call to genPlan()
      2. QB#aliasToTabs => TableScanOperator
      3. QBParseInfo#joinExpr => QBJoinTree => ReduceSinkOperator + JoinOperator
      SemanticAnalyzer#genBodyPlan:
      4. QBParseInfo#destToWhereExpr => FilterOperator
      5. QBParseInfo#destToGroupby => ReduceSinkOperator + GroupByOperator
      6. QBParseInfo#destToOrderby => ReduceSinkOperator + ExtractOperator
      7. ...
  61. 61. QB2 : aliasToTabs => TableScanOperator
      QB#aliasToTabs {du=dim.user, c=detail.usersequence_client, p=fact.orderpayment}
      TableScanOperator("dim.user") => TS[0]
      TableScanOperator("detail.usersequence_client") => TS[1]
      TableScanOperator("fact.orderpayment") => TS[2]
  62. 62. QBJoinTree
  63. 63. QB2 : QBParseInfo#joinExpr => QBJoinTree
      A pre-order traversal of joinExpr generates the QBJoinTree.
  64. 64. QB2 : QBParseInfo#joinExpr => QBJoinTree
      Step 1 (figure): join tree node over c and p, in QB2.
  65. 65. QB2 : QBParseInfo#joinExpr => QBJoinTree
      Step 2 (figure): join tree with nodes base, p, du, c, and p; labels QB1 and QB2.
  66. 66. QB2 : QBJoinTree => RS + JOIN
      Pre-order traversal of the QBJoinTree.
      TS = TableScanOperator, RS = ReduceSinkOperator, JOIN = JoinOperator
  67. 67. QB2 : QBJoinTree => RS + JOIN
      Operators so far: TS[c], TS[p]
  68. 68. QB2 : QBJoinTree => RS + JOIN
      TS[c] -> RS[3]; TS[p] -> RS[4]
  69. 69. QB2 : QBJoinTree => RS + JOIN
      TS[c] -> RS[3]; TS[p] -> RS[4]; RS[3], RS[4] -> JOIN[5]
  70. 70. QB2 : QBJoinTree => RS + JOIN
      Pre-order traversal of the QBJoinTree.
      TS = TableScanOperator, RS = ReduceSinkOperator, JOIN = JoinOperator
  71. 71. QB2 : QBJoinTree => RS + JOIN
      TS[c] -> RS[3]; TS[p] -> RS[4]; RS[3], RS[4] -> JOIN[5]; TS[du]
  72. 72. QB2 : QBJoinTree => RS + JOIN
      TS[c] -> RS[3]; TS[p] -> RS[4]; RS[3], RS[4] -> JOIN[5] -> RS[6]; TS[du] -> RS[7]
  73. 73. QB2 : QBJoinTree => RS + JOIN
      TS[c] -> RS[3]; TS[p] -> RS[4]; RS[3], RS[4] -> JOIN[5] -> RS[6]; TS[du] -> RS[7]; RS[6], RS[7] -> JOIN[8]
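The operator numbering on these slides can be reproduced with a short recursive sketch. The join tree is modelled as nested pairs, which is an assumption made for illustration and not the layout of Hive's QBJoinTree; ids start at 3 because TS[0] to TS[2] were already created on slide 61.

```python
from itertools import count

def gen_join_plan(tree, next_id, edges):
    """Return the operator that produces `tree`'s rows, recording edges as it goes.

    A leaf is a table alias (str); an inner node is a (left, right) pair.
    """
    if isinstance(tree, str):
        return f"TS[{tree}]"
    left, right = tree
    left_op = gen_join_plan(left, next_id, edges)
    right_op = gen_join_plan(right, next_id, edges)
    # Each join input gets its own ReduceSink, then both feed one JoinOperator.
    left_rs = f"RS[{next(next_id)}]"
    right_rs = f"RS[{next(next_id)}]"
    join = f"JOIN[{next(next_id)}]"
    edges.extend([(left_op, left_rs), (right_op, right_rs),
                  (left_rs, join), (right_rs, join)])
    return join

edges = []
root = gen_join_plan((("c", "p"), "du"), count(3), edges)
for parent, child in edges:
    print(parent, "->", child)
```

The inner join of c and p is planned first, so its output JOIN[5] becomes one of the two inputs of the outer join with du.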
  74. 74. QB2 : genBodyPlan
      QBParseInfo#destToWhereExpr => FilterOperator
      FIL = FilterOperator, SEL = SelectOperator
  75. 75. QB2 : genBodyPlan
      Starting point: TS[c] -> RS[3]; TS[p] -> RS[4]; RS[3], RS[4] -> JOIN[5] -> RS[6]; TS[du] -> RS[7]; RS[6], RS[7] -> JOIN[8]
  76. 76. QB2 : genBodyPlan
      JOIN[8] -> FIL[9]
  77. 77. QB2 : genBodyPlan
      JOIN[8] -> FIL[9] -> SEL[10]
  78. 78. QB1 : genBodyPlan
      QBParseInfo#destToGroupby => ReduceSinkOperator + GroupByOperator
      GBY = GroupByOperator
  79. 79. QB1 : genBodyPlan
      Starting point: the QB2 tree ending in JOIN[8] -> FIL[9] -> SEL[10]
  80. 80. QB1 : genBodyPlan
      SEL[10] -> SEL[11] -> GBY[12], where GBY[12] aggregates in hash mode (HashMode AGGR)
  81. 81. QB1 : genBodyPlan
      SEL[10] -> SEL[11] -> GBY[12] -> RS[13]
  82. 82. QB1 : genBodyPlan
      SEL[10] -> SEL[11] -> GBY[12] -> RS[13] -> GBY[14]
  83. 83. QB1 : genPostGroupByBodyPlan
      FS = FileSinkOperator
      QB2: TS[c] -> RS[3]; TS[p] -> RS[4]; RS[3], RS[4] -> JOIN[5] -> RS[6]; TS[du] -> RS[7]; RS[6], RS[7] -> JOIN[8] -> FIL[9] -> SEL[10]
      QB1: SEL[11] -> GBY[12] -> RS[13] -> GBY[14] -> SEL[15] -> SEL[16] -> FS[17]
  84. 84. Contents: 1. How MapReduce implements Join, Group By, and Distinct; 2. How SQL is translated into MapReduce: (1) Antlr and the AST, (2) QueryBlock, the basic building block of SQL, (3) logical operators (Operator), (4) logical optimizer, (5) converting the OperatorTree into MapReduce Jobs, (6) physical optimizer and how MapJoin works; 3. Hive execution plans
  85. 85. Logical Optimizer: transforms the OperatorTree
      Name: purpose
      2) PredicatePushDown: predicate pushdown
      ColumnPruner: column pruning
      2) GroupByOptimizer: Map-side aggregation
      1) ReduceSinkDeDuplication: merges reduces with the same partition/sort key in a linear OperatorTree
      1) CorrelationOptimizer: uses correlations within the query to merge correlated Jobs, HIVE-2206
      2) SimpleFetchOptimizer: optimizes aggregate queries without a GroupBy expression
      2) MapJoinProcessor: MapJoin, driven by a hint
      2) BucketMapJoinOptimizer: BucketMapJoin
  86. 86. Logical Optimizer: transforms the OperatorTree
      Same table as on slide 85.
      1) Have one Job do as much as possible / merge Jobs.
  87. 87. Logical Optimizer: transforms the OperatorTree
      Same table as on slide 85.
      1) Have one Job do as much as possible / merge Jobs.
      2) Reduce the amount of shuffled data, or even skip the Reduce phase.
  88. 88. PredicatePushDown: evaluate predicates earlier
      Before (QB2): TS[c] -> RS[3]; TS[p] -> RS[4]; RS[3], RS[4] -> JOIN[5] -> RS[6]; TS[du] -> RS[7]; RS[6], RS[7] -> JOIN[8] -> FIL[9] -> SEL[10]
      After (QB2): TS[c] -> RS[3]; TS[p] -> FIL[18] -> RS[4]; RS[3], RS[4] -> JOIN[5] -> RS[6]; TS[du] -> RS[7]; RS[6], RS[7] -> JOIN[8] -> SEL[10]
  89. 89. NonBlockingOpDeDupProc: merges SEL-SEL or FIL-FIL into one Operator
      Before (QB1): SEL[11] -> GBY[12] -> RS[13] -> GBY[14] -> SEL[15] -> SEL[16] -> FS[17]
      After (QB1): GBY[12] -> RS[13] -> GBY[14] -> SEL[15] -> FS[17]
  90. 90. ReduceSinkDeDuplication: merges two linearly connected RS
      from (select key, value from src group by key, value) s select s.key group by s.key;
  91. 91. ReduceSinkDeDuplication
      Stage-1: TS -> SEL -> GBY -> RS -> GBY -> SEL -> GBY -> FS
      Stage-2: TS -> RS -> GBY -> SEL -> FS
  92. 92. ReduceSinkDeDuplication
      RS | key | partition key
      pRS | key, value | key, value
      cRS | key | key
  93. 93. ReduceSinkDeDuplication
      Conditions: the pRS key fully contains the cRS key with the same sort order, and the pRS partition key fully contains the cRS partition key.
  94. 94. ReduceSinkDeDuplication
      After merging: TS -> SEL -> GBY -> RS -> GBY -> SEL -> GBY -> FS
  95. 95. ReduceSinkDeDuplication
      Merged RS: key = key, value; partition key = key
  96. 96. ReduceSinkDeDuplication
      Also to check: whether the two Jobs have the same number of reducers (numReduce).
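The two merge conditions can be written as a small check. This is a sketch under stated assumptions: "contains with the same sort order" is modelled as the child keys being a prefix of the parent keys, each RS is a plain dict, and `can_merge` and `merge` are hypothetical helpers rather than Hive code.

```python
def can_merge(parent_rs, child_rs):
    """Check the two conditions listed on the slide for merging cRS into pRS."""
    p_keys, c_keys = parent_rs["keys"], child_rs["keys"]
    # Same sort order is modelled as: the child keys are a prefix of the parent keys.
    keys_ok = p_keys[:len(c_keys)] == c_keys
    partition_ok = set(child_rs["partition_keys"]) <= set(parent_rs["partition_keys"])
    return keys_ok and partition_ok

def merge(parent_rs, child_rs):
    """Keep the parent's sort keys, partition on the child's narrower keys."""
    return {"keys": parent_rs["keys"], "partition_keys": child_rs["partition_keys"]}

p_rs = {"keys": ["key", "value"], "partition_keys": ["key", "value"]}
c_rs = {"keys": ["key"], "partition_keys": ["key"]}

merged = merge(p_rs, c_rs) if can_merge(p_rs, c_rs) else None
print(merged)
```

Partitioning on `key` alone while still sorting on `key, value` sends all rows of one `key` to the same reducer, which is what the outer `group by s.key` needs.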
  97. 97. Contents: 1. How MapReduce implements Join, Group By, and Distinct; 2. How SQL is translated into MapReduce: (1) Antlr and the AST, (2) QueryBlock, the basic building block of SQL, (3) logical operators (Operator), (4) logical optimizer, (5) converting the OperatorTree into MapReduce Jobs, (6) physical optimizer and how MapJoin works; 3. Hive execution plans
  98. 98. MapReduceCompiler
      • Generate a MoveTask for the output table.
      • Starting from one of the root nodes of the OperatorTree, traverse downward depth-first.
      • A ReduceSinkOperator marks the boundary between Map and Reduce, and between multiple Jobs.
      • Traverse the other root nodes; when a JoinOperator is reached, merge the MapReduceTasks.
      • Generate a StatTask to update the metadata.
      • Cut the Operator links between Map and Reduce.
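The role of ReduceSinkOperator as a phase boundary can be illustrated on a linear chain. This sketch only handles a single chain of operator labels and uses a made-up helper name; the real compiler walks a tree with several roots.

```python
def split_stages(chain):
    """Cut a linear operator chain after every RS; each cut ends a map or reduce phase."""
    phases, current = [], []
    for op in chain:
        current.append(op)
        if op.startswith("RS["):     # ReduceSink closes the current phase
            phases.append(current)
            current = []
    if current:
        phases.append(current)
    return phases

phases = split_stages(["GBY[12]", "RS[13]", "GBY[14]", "SEL[15]", "FS[17]"])
print(phases)
```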
  99. 99. R0 gen MoveTask & Fetch Task GBY[12] | RS[13] | GBY[14] | SEL[15] | FS[17] MapredLockWork[Stage-0] Stage-0 Move Operator QB1 Parser Semantic Analyzer Logical Plan Gen. 46 Monday, 30 December, Logical Optimizer Physical Plan Gen. Physical Optimizer
  100. 100. Begin Walk TS[p] | TS[c] FIL[18] | | RS[3] RS[4] / JOIN[5] TS[du] | | RS[6] RS[7] / JOIN[8] | SEL[10] toWalk[] {TS[c], TS[du], TS[p]} QB2 Parser Semantic Analyzer Logical Plan Gen. 47 Monday, 30 December, Logical Optimizer Physical Plan Gen. Physical Optimizer
  101. 101. Begin Walk TS[p] | TS[c] FIL[18] | | RS[3] RS[4] / JOIN[5] TS[du] | | RS[6] RS[7] / JOIN[8] | SEL[10] toWalk[] {TS[c], TS[du], TS[p]} opStack {} QB2 Parser Semantic Analyzer Logical Plan Gen. 48 Monday, 30 December, Logical Optimizer Physical Plan Gen. Physical Optimizer
  102. 102. Begin Walk TS[p] | TS[c] FIL[18] | | RS[3] RS[4] / JOIN[5] TS[du] | | RS[6] RS[7] / JOIN[8] | SEL[10] toWalk[] {TS[c], TS[du]} opStack {TS[p]} QB2 Parser Semantic Analyzer Logical Plan Gen. 49 Monday, 30 December, Logical Optimizer Physical Plan Gen. Physical Optimizer
  103. 103. R1 GenMRTableScan1 toWalk[] {TS[du], TS[c]} opStack {TS[p]} Parser Semantic Analyzer Logical Plan Gen. 50 Monday, 30 December, Logical Optimizer Physical Plan Gen. Physical Optimizer
  104. 104. R1 GenMRTableScan1 toWalk[] {TS[du], TS[c]} opStack {TS[p]} "".join([t + "%" for t in opStack]) == “ TS%” Parser Semantic Analyzer Logical Plan Gen. 50 Monday, 30 December, Logical Optimizer Physical Plan Gen. Physical Optimizer
  105. 105. R1 GenMRTableScan1 toWalk[] {TS[du], TS[c]} opStack {TS[p]} "".join([t + "%" for t in opStack]) == “ TS%” TS[p] | TS[c] FIL[18] | | RS[3] RS[4] / JOIN[5] TS[du] | | RS[6] RS[7] / JOIN[8] | SEL[10] QB2 Parser Semantic Analyzer Logical Plan Gen. 50 Monday, 30 December, Logical Optimizer Physical Plan Gen. Physical Optimizer
  106. 106. R1 GenMRTableScan1 toWalk[] {TS[du], TS[c]} opStack {TS[p]} "".join([t + "%" for t in opStack]) == “ TS%” TS[p] Stage-1 MapRedTask | TS[c] FIL[18] | | RS[3] RS[4] / JOIN[5] TS[du] | | RS[6] RS[7] / JOIN[8] | SEL[10] TS[p] | TS[c] FIL[18] | | RS[3] RS[4] / JOIN[5] TS[du] | | RS[6] RS[7] / JOIN[8] | SEL[10] QB2 Parser Semantic Analyzer Logical Plan Gen. 50 Monday, 30 December, Logical Optimizer Physical Plan Gen. Physical Optimizer
  110. 110. R2 GenMRRedSink1. toWalk[] {TS[du], TS[c]}, opStack {TS[p], FIL[18], RS[4]}. Rule pattern: "".join([t + "%" for t in opStack]) == "TS%.*RS%". Result: TS[p] -> FIL[18] -> RS[4] is the Stage-1 MapTask; the operators below RS[4], starting at JOIN[5], form the Stage-1 ReduceTask
  116. 116. R3 GenMRRedSink2. toWalk[] {TS[du], TS[c]}, opStack {TS[p], FIL[18], RS[4], JOIN[5], RS[6]}. Rule pattern: "".join([t + "%" for t in opStack]) == "RS%.*RS%". A second ReduceSink on the same path needs its own shuffle, so splitPlan cuts the plan at RS[6] and opens Stage-2. MR[Stage-1]: TS[p] -> FIL[18] -> RS[4] -> JOIN[5] -> FS[19]. MR[Stage-2]: TS[20] -> RS[6] -> JOIN[8] -> SEL[10]. The intermediate data is materialized in a temporary file on HDFS, written by FS[19] and read back by TS[20]
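The effect of splitPlan on a single path can be sketched as a list cut. This is an illustrative helper, not Hive's code: `split_plan` and its `next_id` parameter are hypothetical, and the operator ids are simply chosen so that the result matches the FS[19] / TS[20] pair on the slide.

```python
def split_plan(chain, rs, next_id):
    """Cut a linear operator chain just before the ReduceSink `rs`.

    The upstream part ends in a new FileSink that writes a temporary file;
    the downstream part starts with a new TableScan that reads it back.
    """
    cut = chain.index(rs)
    file_sink = f"FS[{next_id}]"
    table_scan = f"TS[{next_id + 1}]"
    return chain[:cut] + [file_sink], [table_scan] + chain[cut:]


stage1, stage2 = split_plan(
    ["TS[p]", "FIL[18]", "RS[4]", "JOIN[5]", "RS[6]", "JOIN[8]", "SEL[10]"],
    rs="RS[6]",
    next_id=19,
)
```

The temporary file is the price of chaining two MapReduce jobs: everything JOIN[5] produces is written out once and scanned again by the next stage.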
  121. 121. R3 GenMRRedSink2. toWalk[] {TS[du], TS[c]}, opStack {TS[p], FIL[18], RS[4], JOIN[5], RS[6], JOIN[8], SEL[10], GBY[12], RS[13]}. Rule pattern: "".join([t + "%" for t in opStack]) == "RS%.*RS%". splitPlan cuts Stage-2 at RS[13] and opens Stage-3. Before: TS[20] -> RS[6] -> JOIN[8] -> SEL[10] -> GBY[12] -> RS[13] -> GBY[14] -> SEL[15] -> FS[17]. After, MR[Stage-2]: TS[20] -> RS[6] -> JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]. MR[Stage-3]: TS[22] -> RS[13] -> GBY[14] -> SEL[15] -> FS[17]
  124. 124. R4 GenMRFileSink1. toWalk[] {TS[du], TS[c]}, opStack {TS[p], FIL[18], RS[4], JOIN[5], RS[6], JOIN[8], SEL[10], GBY[12], RS[13], GBY[14], SEL[15], FS[17]}. Rule pattern: "".join([t + "%" for t in opStack]) == "FS%". Resulting task chain: MR[Stage-1] -> MR[Stage-2] -> MR[Stage-3] -> MoveWork[Stage-0] -> StatsWork[Stage-4]
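The `==` in the slide expression is shorthand for a regular-expression match against the joined operator names. A minimal sketch of how one of the four rules could be selected for a given opStack is shown here. The rule names and patterns come from the slides; the selection policy (try suffixes of the stack from the top down, take the rule with the shortest matching suffix) is an assumption, and `cost` and `pick_rule` are hypothetical helper names.

```python
import re

# Rule names and patterns as written on the slides.
RULES = [
    ("R1 GenMRTableScan1", r"TS%"),
    ("R2 GenMRRedSink1", r"TS%.*RS%"),
    ("R3 GenMRRedSink2", r"RS%.*RS%"),
    ("R4 GenMRFileSink1", r"FS%"),
]


def cost(pattern, op_stack):
    """Length of the shortest stack suffix that matches, or -1 if none does."""
    name = ""
    for op in reversed(op_stack):
        # "RS[4]" contributes "RS%": keep the operator type, drop the id.
        name = op.split("[")[0] + "%" + name
        if re.fullmatch(pattern, name):
            return len(name)
    return -1


def pick_rule(op_stack):
    """Assumed policy: the rule with the smallest non-negative cost wins."""
    best, best_cost = None, None
    for rule, pattern in RULES:
        c = cost(pattern, op_stack)
        if c >= 0 and (best_cost is None or c < best_cost):
            best, best_cost = rule, c
    return best
```

With this policy the stack {TS[p], FIL[18], RS[4], JOIN[5], RS[6]} selects R3 rather than R2: both patterns match, but "RS%JOIN%RS%" is a shorter suffix than the whole stack name.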
  126. 126. Begin Walk (next root). opStack.clear(). toWalk[] {TS[c], TS[du]}, opStack {}. Path to walk: TS[du] -> RS[7] -> JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]
  129. 129. R1 GenMRTableScan1. toWalk[] {TS[c]}, opStack {TS[du]}. Rule pattern: "".join([t + "%" for t in opStack]) == "TS%". Result: TS[du] is assigned to a new Stage-5 MapTask. Path: TS[du] -> RS[7] -> JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]
  134. 134. R2 GenMRRedSink1. toWalk[] {TS[c]}, opStack {TS[du], RS[7]}. Rule pattern: "".join([t + "%" for t in opStack]) == "TS%.*RS%". Stage-5 MapTask: TS[du] -> RS[7]; the operators below RS[7] would form the Stage-5 ReduceTask, but JOIN[8] already belongs to MR[Stage-2], so the Stage-5 map work is merged into MR[Stage-2] (merge map work). Resulting MR[Stage-2]: TS[20] -> RS[6]; TS[du] -> RS[7]; RS[6] + RS[7] -> JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]
  136. 136. Begin Walk (last root). opStack.clear(). toWalk[] {TS[c]}, opStack {}. Path to walk: TS[c] -> RS[3] -> JOIN[5] -> FS[19]
  137. 137. R1 GenMRTableScan1. toWalk[] {}, opStack {TS[c]}. Rule pattern: "".join([t + "%" for t in opStack]) == "TS%". Result: TS[c] is assigned to a new Stage-6 MapRedTask. Path: TS[c] -> RS[3] -> JOIN[5] -> FS[19]
  138. 138. R2 GenMRRedSink1. toWalk[] {}, opStack {TS[c], RS[3]}. Rule pattern: "".join([t + "%" for t in opStack]) == "TS%.*RS%". The Stage-6 map work (TS[c] -> RS[3]) is merged into MR[Stage-1] (merge map work). Resulting MR[Stage-1]: TS[p] -> FIL[18] -> RS[4]; TS[c] -> RS[3]; RS[3] + RS[4] -> JOIN[5] -> FS[19]
  141. 141. breakTaskTree. Each stage is cut at its ReduceSink operators into a map part and a reduce part. MR[Stage-1] map: TS[p] -> FIL[18] -> RS[4]; TS[c] -> RS[3]. MR[Stage-1] reduce: JOIN[5] -> FS[19]. MR[Stage-2] map: TS[20] -> RS[6]; TS[du] -> RS[7]. MR[Stage-2] reduce: JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]. MR[Stage-3] map: TS[22] -> RS[13]. MR[Stage-3] reduce: GBY[14] -> SEL[15] -> FS[17]
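The cut that breakTaskTree makes can be sketched by splitting every root-to-leaf path of a stage right after its ReduceSink. `break_at_reduce_sink` is a hypothetical helper written for illustration, and representing a stage as a list of paths is an assumption of this sketch.

```python
def break_at_reduce_sink(paths):
    """Split one stage into map-side branches and the shared reduce side.

    `paths` holds one root-to-leaf operator path per TableScan of the stage.
    """
    map_side, reduce_side = [], None
    for path in paths:
        # Index just past the first ReduceSink on this path.
        cut = next(i for i, op in enumerate(path) if op.startswith("RS")) + 1
        map_side.append(path[:cut])
        reduce_side = path[cut:]  # all paths of a stage share one reduce side
    return map_side, reduce_side


stage2_paths = [
    ["TS[20]", "RS[6]", "JOIN[8]", "SEL[10]", "GBY[12]", "FS[21]"],
    ["TS[du]", "RS[7]", "JOIN[8]", "SEL[10]", "GBY[12]", "FS[21]"],
]
stage2_map, stage2_reduce = break_at_reduce_sink(stage2_paths)
```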
  146. 146. Logical Plan => Physical Plan. Logical plan: TS[p] -> FIL[18] -> RS[4]; TS[c] -> RS[3]; RS[3] + RS[4] -> JOIN[5] -> RS[6]; TS[du] -> RS[7]; RS[6] + RS[7] -> JOIN[8] -> SEL[10] -> GBY[12] -> RS[13] -> GBY[14] -> SEL[15] -> FS[17]. Physical plan, MR[Stage-1]: map TS[p] -> FIL[18] -> RS[4]; TS[c] -> RS[3]; reduce JOIN[5] -> FS[19]. MR[Stage-2]: map TS[20] -> RS[6]; TS[du] -> RS[7]; reduce JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]. MR[Stage-3]: map TS[22] -> RS[13]; reduce GBY[14] -> SEL[15] -> FS[17]. Task chain: MR[Stage-1] (JOIN[5]) -> MR[Stage-2] (JOIN[8], GBY[12]) -> MR[Stage-3] (GBY[14]) -> MoveWork[Stage-0] -> StatsWork[Stage-4]
  147. 147. Contents 1. How MapReduce implements Join, Group By, and Distinct 2. How SQL is translated into MapReduce (1) Antlr && ASTTree (2) QueryBlock, the basic building block of SQL (3) Logical Operators (4) Logical optimizers (5) Translating the OperatorTree into MapReduce jobs (6) Physical optimizers 3. The Hive execution plan
  148. 148. Physical Optimizer. Name: purpose. CommonJoinResolver + MapJoinResolver: MapJoin. SortMergeJoinResolver: works together with bucketing, similar to merge sort. SamplingOptimizer: parallel order by. Vectorizer: HIVE-4160
  154. 154. MapJoin. (1) A MapReduce Local Task reads the Small Table Data and writes HashTable Files. (2) The files are uploaded to the Distributed Cache (DC). (3) In the MapJoin Task, every Mapper loads the hash tables from the Distributed Cache and joins them against the Records it reads from the Big Table Data
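The idea behind a map-side join can be shown in a few lines: build an in-memory hash table from the small table once, then probe it for every record of the big table, with no shuffle and no reduce phase. This is a plain-Python sketch using the user / order example from the Join slides; `build_hash_table` and `map_join` are hypothetical names, and the distributed cache is left out entirely.

```python
from collections import defaultdict


def build_hash_table(small_rows, key):
    """What the local task prepares: join key -> rows of the small table."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def map_join(big_rows, hash_table, key):
    """Map-only join: probe the hash table for each big-table record."""
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**match, **row}


users = [{"uid": 1, "name": "apple"}, {"uid": 2, "name": "orange"}]
orders = [
    {"uid": 1, "orderid": 1001},
    {"uid": 1, "orderid": 1002},
    {"uid": 2, "orderid": 1003},
]
joined = list(map_join(orders, build_hash_table(users, "uid"), "uid"))
```

The approach only works while the hash table fits in each mapper's memory, which is why the next slides add a fallback to the common join.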
  160. 160. CommonJoinResolver. A Conditional Task is inserted between Task A and Task C. It holds a MapJoin LocalTask followed by a MapJoinTask, plus a CommonJoinTask. If the MapJoin LocalTask hits its memory bound, the CommonJoinTask runs as a backup task
  161. 161. CommonJoinResolver • Traverse the Task Tree depth-first • Find each JoinOperator and compare the data sizes of its left and right tables • Small table + big table => MapJoinTask • Small/big table + intermediate table => ConditionalTask. Task chain: MR[Stage-1] (JOIN[5]) -> MR[Stage-2] (JOIN[8], GBY[12]) -> MR[Stage-3] (GBY[14]) -> MoveWork[Stage-0] -> StatsWork[Stage-4]
  165. 165. CommonJoinResolver, case 1. MR[Stage-2]: TS[20] -> RS[6]; TS[du] -> RS[7]; RS[6] + RS[7] -> JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]. One of the two inputs is marked as the big table. deepCopy produces MR[Stage-7]: TS[23] -> RS[24]; TS[25] -> RS[26]; RS[24] + RS[26] -> JOIN[34] -> SEL[35] -> GBY[36] -> FS[37]. The copy is rewritten into a Map Only MR, MRTask[Stage-7]: TS[23] + TS[25] -> MAPJOIN[44] -> SEL[35] -> GBY[36] -> FS[37], with LocalWork FetchWork[$INTNAME]
  169. 169. CommonJoinResolver, case 2. MR[Stage-2]: TS[20] -> RS[6]; TS[du] -> RS[7]; RS[6] + RS[7] -> JOIN[8] -> SEL[10] -> GBY[12] -> FS[21]. The other input is marked as the big table. deepCopy ... The copy is rewritten into a Map Only MR, MRTask[Stage-8]: TS[45] + TS[47] -> MAPJOIN[66] -> SEL[57] -> GBY[36] -> FS[37], with LocalWork FetchWork[du]
  172. 172. CommonJoinResolver. Before: MR[Stage-1] (JOIN[5]) -> MR[Stage-2] (JOIN[8], GBY[12]) -> MR[Stage-3] (GBY[14]) -> MoveWork[Stage-0] -> StatsWork[Stage-4]. After: MR[Stage-10] (MAPJOIN) -> ConditionalTask[Stage-9] -> one of {MR[Stage-7] (MAPJOIN), MR[Stage-8] (MAPJOIN), MR[Stage-2] (JOIN)} -> MR[Stage-3] -> MoveWork[Stage-0] -> StatsWork[Stage-4]. Which of the three branches runs is decided at run time
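The run-time decision of the ConditionalTask can be sketched as a size check. Everything here is an assumption for illustration: `choose_join_stage` and `small_table_limit` are hypothetical names, the mapping from map-join stage to big-table alias is read off the FetchWork labels on the slides, and the real resolver may apply different criteria.

```python
def choose_join_stage(input_sizes, mapjoin_branches, small_table_limit,
                      fallback="MR[Stage-2]"):
    """Pick the branch the conditional task should run.

    input_sizes: alias -> size in bytes, known only at run time.
    mapjoin_branches: map-join stage -> alias streamed as the big table.
    A map-join branch qualifies when all other inputs fit under the limit;
    otherwise the common (reduce-side) join is used as the fallback.
    """
    best_stage, best_small_total = fallback, None
    for stage, big_alias in mapjoin_branches.items():
        small_total = sum(size for alias, size in input_sizes.items()
                          if alias != big_alias)
        fits = small_total <= small_table_limit
        if fits and (best_small_total is None or small_total < best_small_total):
            best_stage, best_small_total = stage, small_total
    return best_stage


# Assumed mapping: Stage-7 hashes $INTNAME, Stage-8 hashes du.
BRANCHES = {"MR[Stage-7]": "du", "MR[Stage-8]": "$INTNAME"}
```

Deferring the choice to run time is what makes the intermediate-table case workable: the size of `$INTNAME` is unknown at compile time because the stage that produces it has not run yet.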
  173. 173. MapJoinResolver • Traverse the Task Tree and split every MapReduceTask that has local work into two Tasks. Before: MRTask[Stage-10] with FetchWork[c] and MRWork. After: MRTask[Stage-13] (FetchWork[c], HashTableSinkOperator) -> MRTask[Stage-10] (MRWork)
  174. 174. MapJoinResolver. Before: MR[Stage-10] (MAPJOIN) -> ConditionalTask[Stage-9] -> one of {MR[Stage-7] (MAPJOIN), MR[Stage-8] (MAPJOIN), MR[Stage-2] (JOIN)} -> MR[Stage-3] -> MoveWork[Stage-0] -> StatsWork[Stage-4]. After: Lock[Stage-13] -> MR[Stage-10] (MAPJOIN) -> ConditionalTask[Stage-9] -> one of {Lock[Stage-11] -> MR[Stage-7] (MAPJOIN), Lock[Stage-12] -> MR[Stage-8] (MAPJOIN), MR[Stage-2] (JOIN)} -> MR[Stage-3] -> MoveWork[Stage-0] -> StatsWork[Stage-4]
  181. 181. Recap: how SQL is translated 1. Antlr defines the SQL grammar and performs lexical and syntactic analysis, turning the SQL into an abstract syntax tree (AST Tree) 2. Traverse the AST Tree and extract QueryBlocks, the basic building blocks of a query 3. Traverse the QueryBlocks and translate them into the execution logic, an OperatorTree 4. The logical optimizers transform the OperatorTree, merging ReduceSinks to reduce the amount of shuffled data 5. Traverse the OperatorTree and translate it into MapReduce tasks 6. The physical optimizers transform the MapReduce tasks, generating Conditional Tasks that check at run time whether a join can be converted to a MapJoin
  182. 182. Contents 1. How MapReduce implements Join, Group By, and Distinct 2. How SQL is translated into MapReduce (1) Antlr && ASTTree (2) QueryBlock, the basic building block of SQL (3) Logical Operators (4) Logical optimizers (5) Translating the OperatorTree into MapReduce jobs (6) Physical optimizers; how MapJoin works 3. The Hive execution plan
  183. 183. Execution plan • AST (abstract syntax tree) • Stage Dependency • MapReduce Plan
  185. 185. Stage Dependency. Stage-11 depends on stages: Stage-14 , consists of Stage-15, Stage-16, Stage-4. Stage-11 is a ConditionalTask that may run one of Stage-15, Stage-16, or Stage-4. At present a ConditionalTask appears only when Hive decides at run time whether a join can be converted to a MapJoin. Stage-4 is the common join; Stage-15 and Stage-16 are the two possible MapJoin variants
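A stage-dependency line like the one quoted on this slide can be taken apart with a few string operations. This is a minimal sketch: `parse_stage_line` is a hypothetical helper, and it assumes the line keeps the "depends on stages:" and "consists of" wording shown here.

```python
import re


def parse_stage_line(line):
    """Split one stage-dependency line into its three parts."""
    head, _, consists = line.partition("consists of")
    _, _, depends = head.partition("depends on stages:")

    def stages(text):
        return re.findall(r"Stage-\d+", text)

    return {
        "stage": stages(line)[0],
        "depends_on": stages(depends),
        "consists_of": stages(consists),
    }


info = parse_stage_line(
    "Stage-11 depends on stages: Stage-14 , consists of Stage-15, Stage-16, Stage-4"
)
```

A non-empty `consists_of` list is the sign of a ConditionalTask: exactly one of the listed stages will run.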
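  186. 186. MapReduce Plan • A ReduceSinkOperator can only appear in the Map phase and marks the end of the Map phase • Its fields are combined into the reduce key and value • sort order: ascending by id, then ascending by name • partition key: rows are assigned to reducers by the hash of the partition key • tag: identifies the table; in a Join it tells which source table a row came from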
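The role of the tag can be illustrated with a reduce-side join in plain Python. This is a sketch, not Hive's implementation: the map phase emits (reduce key, (tag, value)) pairs the way a ReduceSink is described above, sorting stands in for shuffle and sort, and the reducer uses the tag to separate the two source tables. The tag values 1 and 2 and the sample rows follow the Join example at the start of the deck; `map_phase` and `reduce_join` are hypothetical names.

```python
from itertools import groupby


def map_phase(rows, key, tag, value):
    """ReduceSink-style output: (reduce key, (tag, value))."""
    return [(row[key], (tag, row[value])) for row in rows]


def reduce_join(pairs):
    """Group by key, then pair up values that carry different tags."""
    out = []
    for _, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        by_tag = {1: [], 2: []}
        for _, (tag, value) in group:
            by_tag[tag].append(value)
        out += [(left, right) for left in by_tag[1] for right in by_tag[2]]
    return out


users = [{"uid": 1, "name": "apple"}, {"uid": 2, "name": "orange"}]
orders = [
    {"uid": 1, "orderid": 1001},
    {"uid": 1, "orderid": 1002},
    {"uid": 2, "orderid": 1003},
]
pairs = map_phase(users, "uid", 1, "name") + map_phase(orders, "uid", 2, "orderid")
result = reduce_join(pairs)
```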
  187. 187. MapReduce Plan • After each Operator finishes its computation, it renames its output fields as _col + i; Map output fields are shown as KEY._col + i or VALUE._col + i • KEY._col1:0._col0: the "0." is the tag attached to a distinct field • mode: how the aggregation is computed, one of COMPLETE, PARTIAL1, PARTIAL2, PARTIALS, FINAL, HASH, MERGEPARTIAL
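Two of the listed modes can be illustrated with the `count(*) ... group by rank, isonline` example from the Group By slides. The sketch assumes the common two-step layout in which the map-side GroupBy runs in HASH mode (partial counts per mapper) and the reduce-side GroupBy runs in MERGEPARTIAL mode (partials added up); the function names are hypothetical and the other modes are not covered.

```python
from collections import Counter


def map_side_hash(rows):
    """HASH mode: partial counts kept in an in-memory hash table per mapper."""
    return Counter(rows)


def reduce_side_mergepartial(partials):
    """MERGEPARTIAL mode: add up the partial counts for each group key."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# (rank, isonline) rows as split across two mappers on the Group By slides.
mapper1 = map_side_hash([("A", 1), ("A", 1)])
mapper2 = map_side_hash([("A", 1), ("B", 0)])
result = reduce_side_mergepartial([mapper1, mapper2])
```

Aggregating on the map side first means only one pair per group and mapper crosses the shuffle, instead of one pair per input row.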
  188. 188. MapReduce Plan • condition expressions: the fields that each of the two joined tables contributes • Position of Big Table: indicates that the table with tag=1 is the one with the large data volume
  190. 190. Thanks && QA
