3. Hive Architecture / Exec Flow
6/13/16 HIVE - A warehouse solution over Map Reduce Framework 3
[Architecture diagram: Client → Driver → Compiler → Metastore → Hadoop]
- This is the overview.
- Clients are the user interfaces: the CLI, the WebUI, and APIs like JDBC and ODBC.
- The Metastore is the system catalog that holds the schema information for Hive tables.
- The Driver manages the lifecycle of a HiveQL statement through compilation, optimization, and execution.
- The Compiler transforms HiveQL into operators, applying a set of optimizers.
Hive Workflow:
- Operators are Hive's minimum processing units.
- Each operator's work is carried out as HDFS operations or M/R jobs.
- The compiler converts HiveQL into sets of operators.
- The point is: Hive converts our request (HiveQL) into operators that are realized as M/R jobs.
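To make this concrete, here is the running example used throughout the later slides (access_log_hbase and product_hbase); the operator names in the comment match the plan those slides derive:

```sql
-- Hive compiles this join into a tree of operators
-- (TableScan → ReduceSink → Join → Select → FileSink),
-- which is then executed as one or more M/R jobs.
INSERT OVERWRITE TABLE access_log_temp2
SELECT a.user, a.prono, p.maker, p.price
FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
```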
5. • For M/R processing, Hive uses ExecMapper and ExecReducer
• Hive's M/R jobs are carried out by ExecMapper and ExecReducer
• They read plans and process them dynamically
• Processing runs in one of 2 modes
• Local processing mode
• Distributed processing mode
Hive Workflow
6. Hive Workflow – 2 modes
• Local Mode
• Hive forks the process with the hadoop command
• The plan.xml is generated on just one node, and that single node processes it
• Distributed mode
• Hive sends the job to the existing JobTracker
• The plan information is distributed via the DistributedCache and
• processed on multiple nodes
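As a sketch, on Hive releases of this vintage the choice of mode can be influenced with settings like these (property names and thresholds may differ by release, so verify against your version):

```sql
-- Let Hive choose local mode automatically for small inputs
SET hive.exec.mode.local.auto=true;
-- Inputs below these thresholds run as a forked local process
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
SET hive.exec.mode.local.auto.tasks.max=4;
```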
7. Hive Workflow - Compiler
• Compiler: How to process HiveQL
8. "Plumbing" of the Hive compiler
Parser
• Converts the query into a parse tree representation
Semantic Analyzer
• Converts the parse tree into a block-based internal query representation
• Retrieves schema information for the input tables from the Metastore and verifies column names, and so on
Logical Plan Generator
• Converts the internal query representation into a logical plan consisting of a tree of logical operators
9. "Plumbing" of the Hive compiler – continued
Logical Optimizer
• Rewrites plans into more optimized plans
• The logical optimizer performs multiple passes over the logical plan and rewrites it in several ways. For example, it combines multiple joins that share a join key into a single multi-way JOIN executed by a single M/R job.
Physical Plan Generator
• Converts logical plans into physical plans (M/R jobs)
Physical Optimizer
• Chooses the join strategy
16. Logical Plan Generator (1/4)
QB MetaData (Alias To Table Info):
  "a" = Table Info("access_log_hbase")
  "p" = Table Info("product_hbase")
OP Tree:
  TableScanOperator("access_log_hbase")
  TableScanOperator("product_hbase")
17. Logical Plan Generator (2/4)
QB ParseInfo:
  + TOK_JOIN
    + TOK_TABREF
      + TOK_TABNAME
        + "access_log_hbase"
      + "a"
    + TOK_TABREF
      + TOK_TABNAME
        + "product_hbase"
      + "p"
    + "="
      + "."
        + TOK_TABLE_OR_COL
          + "a"
        + "prono"
      + "."
        + TOK_TABLE_OR_COL
          + "p"
        + "prono"
OP Tree:
  ReduceSinkOperator("access_log_hbase")
  ReduceSinkOperator("product_hbase")
  JoinOperator
18. Logical Plan Generator (3/4)
OP Tree:
  SelectOperator
QB ParseInfo (Name To Select Node):
  + TOK_SELECT
    + TOK_SELEXPR
      + "."
        + TOK_TABLE_OR_COL
          + "a"
        + "user"
    + TOK_SELEXPR
      + "."
        + TOK_TABLE_OR_COL
          + "a"
        + "prono"
    + TOK_SELEXPR
      + "."
        + TOK_TABLE_OR_COL
          + "p"
        + "maker"
    + TOK_SELEXPR
      + "."
        + TOK_TABLE_OR_COL
          + "p"
        + "price"
19. Logical Plan Generator (4/4)
OP Tree:
  FileSinkOperator
QB MetaData (Name To Destination Table Info):
  "insclause-0" = Table Info("access_log_temp2")
20. Logical Plan Generator (result)
TableScanOperator (TS_0) → ReduceSinkOperator (RS_2)
TableScanOperator (TS_1) → ReduceSinkOperator (RS_3)
RS_2 + RS_3 → JoinOperator (JOIN_4) → SelectOperator (SEL_5) → FileSinkOperator (FS_6)
21. Logical Optimizer
Optimizer: Description
LineageGenerator: generates table-to-table lineage relationships
ColumnPruner: column pruning
PredicatePushDown: predicate pushdown; pushes filters that involve only one table down to just after that table's TableScanOperator
PartitionPruner: partition pruning
PartitionConditionRemover: removes irrelevant condition predicates before partition pruning
SimpleFetchOptimizer: optimizes aggregate queries that have no GROUP BY expression
GroupByOptimizer: map-side aggregation
CorrelationOptimizer: exploits correlations in the query to merge correlated jobs
GroupByOptimizer: GROUP BY optimization
SamplePruner: sample pruning
MapJoinProcessor: if the user specifies a map join, converts the ReduceSinkOperator into a MapSinkOperator
BucketMapJoinOptimizer: bucketed map join, widening the applicability of map join
SortedMergeBucketMapJoinOptimizer: sort-merge join
UnionProcessor: currently only marks the case where both subqueries are map-only tasks
JoinReorder: /*+ STREAMTABLE(A) */
ReduceSinkDeDuplication: if two ReduceSinkOperators share the same partition/sort columns, they are merged
22. Logical Optimizer (Predicate Push Down)
INSERT OVERWRITE TABLE access_log_temp2
SELECT a.user, a.prono, p.maker, p.price
FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
WHERE p.maker = 'honda';
INSERT OVERWRITE TABLE access_log_temp2
SELECT a.user, a.prono, p.maker, p.price
FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
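Predicate pushdown is governed by a configuration flag (on by default in most releases); a minimal sketch for observing it on the deck's example:

```sql
SET hive.optimize.ppd=true;
-- With PPD enabled, EXPLAIN shows the maker = 'honda' filter
-- evaluated right after the product_hbase scan instead of after the join.
EXPLAIN
SELECT a.user, a.prono, p.maker, p.price
FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
WHERE p.maker = 'honda';
```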
23. Logical Optimizer (Predicate Push Down)
INSERT OVERWRITE TABLE access_log_temp2
SELECT a.user, a.prono, p.maker, p.price
FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
WHERE p.maker = 'honda';

INSERT OVERWRITE TABLE access_log_temp2
SELECT a.user, a.prono, p.maker, p.price
FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
TableScanOperator (TS_0) → ReduceSinkOperator (RS_2)
TableScanOperator (TS_1) → ReduceSinkOperator (RS_3)
RS_2 + RS_3 → JoinOperator (JOIN_4) → SelectOperator (SEL_6) → FileSinkOperator (FS_7)
24. Logical Optimizer (Predicate Push Down)
TableScanOperator (TS_0) → ReduceSinkOperator (RS_2)
TableScanOperator (TS_1) → ReduceSinkOperator (RS_3)
RS_2 + RS_3 → JoinOperator (JOIN_4) → FilterOperator (FIL_5: _col8 = 'honda') → SelectOperator (SEL_6) → FileSinkOperator (FS_7)
25. Logical Optimizer (Predicate Push Down)
TableScanOperator (TS_0) → ReduceSinkOperator (RS_2)
TableScanOperator (TS_1) → FilterOperator (FIL_8: maker = 'honda') → ReduceSinkOperator (RS_3)
RS_2 + RS_3 → JoinOperator (JOIN_4) → FilterOperator (FIL_5: _col8 = 'honda') → SelectOperator (SEL_6) → FileSinkOperator (FS_7)
(FIL_8 is the pushed-down copy of the maker filter, placed on the branch scanning product_hbase, whose column the predicate references.)
26. Physical Plan Generator
Task Tree:
  MapRedTask (Stage-1/root)
    TableScanOperator (TS_0) → ReduceSinkOperator (RS_2)
    TableScanOperator (TS_1) → ReduceSinkOperator (RS_3)
    JoinOperator (JOIN_4) → SelectOperator (SEL_5) → FileSinkOperator (FS_6)
  MoveTask (Stage-0)
    LoadTableDesc
  StatsTask (Stage-2)
27. Physical Plan Generator (result)
MapRedTask (Stage-1/root)
  Mapper:
    TableScanOperator (TS_0) → ReduceSinkOperator (RS_2)
    TableScanOperator (TS_1) → ReduceSinkOperator (RS_3)
  Reducer:
    JoinOperator (JOIN_4) → SelectOperator (SEL_5) → FileSinkOperator (FS_6)
28. Physical Optimizer
Located under java/org/apache/hadoop/hive/ql/optimizer/physical/
Overview:
MapJoinResolver: handles map joins
SkewJoinResolver: handles skewed joins
CommonJoinResolver: handles common joins
Vectorizer: switches Hive from row-at-a-time processing to batch processing, greatly improving instruction-pipeline and cache utilization
SortMergeJoinResolver: works together with buckets, similar to a merge sort
SamplingOptimizer: parallel ORDER BY optimizer
29. Physical Optimizer (MapJoinResolver)
MapRedTask (Stage-1)
  Mapper:
    TableScanOperator (TS_0)
    TableScanOperator (TS_1)
    TS_0 + TS_1 → MapJoinOperator (MAPJOIN_7) → SelectOperator (SEL_5) → FileSinkOperator (FS_6)
    SelectOperator (SEL_8)
30. Physical Optimizer (MapJoinResolver)
After conversion, the small-table side is split out into a local task that builds the hash table:
MapredLocalTask (Stage-7)
  TableScanOperator (TS_0) → HashTableSinkOperator (HASHTABLESINK_11)
MapRedTask (Stage-1)
  Mapper:
    TableScanOperator (TS_1) → MapJoinOperator (MAPJOIN_7) → SelectOperator (SEL_5) → FileSinkOperator (FS_6)
    SelectOperator (SEL_8)
32. Common Join - Shuffle Join
• Default choice
• Always works, but it is the worst-case scenario
• Each process
• Reads part of one of the tables
• Buckets and sorts on the join key
• Sends one bucket to each reducer
33. Map Join
• One table is small (e.g. a dimension table)
• Fits in memory
• Each process
• Reads the small table into an in-memory hash table
• Streams through part of the big file
• Joins each record against the hash table
• Very fast, but limited
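A map join can be requested with a hint or converted automatically; a sketch using the deck's tables (the size threshold shown is the usual default, but verify against your release):

```sql
-- Explicit: build the in-memory hash table from the small table p
SELECT /*+ MAPJOIN(p) */ a.user, a.prono, p.maker, p.price
FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);

-- Automatic conversion when the small table is under the size threshold
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;  -- bytes, ~25 MB
```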
35. Converting Common Join into Map Join
Previous execution flow:
  Task A → CommonJoinTask → Task C
Optimized execution flow:
  Task A → Conditional Task → Task C
The Conditional Task holds one (MapJoinLocalTask → MapJoinTask) pair per candidate small table (a, b, c, ...) plus the original CommonJoinTask as a fallback.
36. Execution Time
SELECT * FROM SRC1 x JOIN SRC2 y ON x.key = y.key;
At execution time the Conditional Task picks a branch:
• If table X is the big table: run that branch's MapJoinLocalTask followed by its MapJoinTask.
• If both tables are too big for a map join: run the CommonJoinTask.
37. Backup Task
Task A → Conditional Task → Task C
If the MapJoinLocalTask fails because it is memory bound (the small table does not fit), the CommonJoinTask runs as a backup task.
38. Performance Bottleneck
• The Distributed Cache is the potential performance bottleneck
• A large hashtable file slows down propagation through the Distributed Cache
• Mappers wait for the hashtable files from the Distributed Cache
• Fix: compress and archive all the hashtable files into a tar file
39. Bucket Map Join
• Why:
• Total table/partition size is big, not good for map join
• How:
• set hive.optimize.bucketmapjoin = true;
• 1. Works together with map join
  2. All join tables are bucketized, and the big table's bucket count is a multiple of each small table's bucket count.
  3. Bucket columns == join columns
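The three conditions above can be satisfied with declarations like these (table and column names are illustrative, not from the deck):

```sql
-- Big table: 4 buckets on the join column
CREATE TABLE big_t (key INT, val STRING)
CLUSTERED BY (key) INTO 4 BUCKETS;

-- Small table: its 2 buckets divide the big table's 4
CREATE TABLE small_t (key INT, name STRING)
CLUSTERED BY (key) INTO 2 BUCKETS;

SET hive.optimize.bucketmapjoin=true;
SELECT /*+ MAPJOIN(s) */ b.key, b.val, s.name
FROM big_t b JOIN small_t s ON (b.key = s.key);
```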
40. Bucket Map Join
SELECT /*+ MAPJOIN(a,c) */ a.*, b.*, c.*
FROM a JOIN b ON a.key = b.key
JOIN c ON a.key = c.key;

Tables a, b, and c are all bucketized by 'key'; a has 2 buckets, b has 2, and c has 1.
  Mapper 1: bucket b1, with buckets a1 and c1 replicated to it
  Mapper 2: bucket b1, with buckets a1 and c1
  Mapper 3: bucket b2, with buckets a2 and c1
1. Mappers are spawned based on the big table.
2. Only the matching buckets of all small tables are replicated onto each mapper.
Normally in production there will be thousands of buckets!
41. Sort Merge Bucket (SMB) Join
• If both tables are:
• Sorted the same way
• Bucketed the same way
• And joined on the sort/bucket column
• Each process:
• Reads a bucket from each table
• Processes the row with the lowest value
• Very efficient if applicable
42. Sort Merge Bucket (SMB) Join
• Why:
• No limit on file/partition/table size
• How:
• set hive.optimize.bucketmapjoin = true;
  set hive.optimize.bucketmapjoin.sortedmerge = true;
  set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
• 1. Works together with bucket map join
  2. Bucket columns == join columns == sort columns
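Tables eligible for an SMB join are both bucketed and sorted on the join column; a sketch (table names illustrative; the enforce settings apply to the pre-2.0 releases this deck describes):

```sql
CREATE TABLE t1 (key INT, val STRING)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS;

CREATE TABLE t2 (key INT, val STRING)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS;

-- Populating bucketed/sorted tables correctly requires enforcement
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;
```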
43. Sort Merge Bucket Map Join
Table A: (1, val_1), (3, val_3), (5, val_5)
Table B: (4, val_4), (4, val_4), (20, val_20)
Table C: (23, val_23), (20, val_20), (25, val_25)
• Small tables are read on demand
• Entire small tables are NOT held in memory
• Can perform outer joins
44. Skew
• Skew is typical in real datasets
• A user complained that his job was slow
• He had 100 reducers
• 98 of them finished fast
• 2 ran really slow
• The key was a boolean...
45. Skew Join
• Join bottlenecked on the reducer who gets the skewed key
• set hive.optimize.skewjoin = true;
set hive.skewjoin.key = skew_key_threshold
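Spelled out as session settings (100000 rows is the usual default for the threshold; treat the value as illustrative):

```sql
SET hive.optimize.skewjoin=true;
-- Join keys with more rows than this are treated as skewed and
-- deferred to a follow-up map-join job
SET hive.skewjoin.key=100000;
```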
46. Skew Join
Job 1 (A join B): reducer 1 and reducer 2 join the non-skewed keys (a-K2 with b-K2, a-K3 with b-K3) and write final results. Rows with the skewed key K1 (a-K1, b-K1) are not joined in the reducers; they are written to HDFS files instead.
Job 2: a map join of the a-K1 HDFS file with the b-K1 HDFS file.
Final results: the union of the outputs of Job 1 and Job 2.
47. Skew in GROUP BY
• Two parameters address skew caused by GROUP BY.
• The first is hive.map.aggr, which already defaults to true; it enables a map-side combiner. So if your GROUP BY query only does count(*), you won't really see a skew effect, but with count(distinct) you may still see some.
• The other parameter is hive.groupby.skewindata. With it, during the reduce phase rows with the same key are not all sent to one reducer but are distributed randomly; the reducers aggregate, and then a second MR round takes the pre-aggregated data and computes the final result. So this parameter does something similar to hive.map.aggr, only on the reduce side, and it launches an extra job, so it is not really recommended; the benefit is not obvious.
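The two parameters described above, as they would be set in a session:

```sql
SET hive.map.aggr=true;            -- map-side partial aggregation (default: true)
SET hive.groupby.skewindata=true;  -- randomize key distribution, then aggregate in a second MR round
```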
48. Case study
• Which of the following is faster?
• SELECT count(DISTINCT col) FROM tbl
• SELECT count(*) FROM (SELECT DISTINCT col FROM tbl) t
- The first case:
- Maps send each value to the reducer
- A single reducer counts them all
- The second case:
- Maps split the values across many reducers
- Each reducer generates its distinct list
- A final job counts the sizes of the lists
- Singleton reducers are almost always BAD
49. Appendix: What does Explain show?
50. Appendix: What does Explain show?
hive> explain INSERT OVERWRITE TABLE access_log_temp2
> SELECT a.user, a.prono, p.maker, p.price
> FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM(TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a)
(TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (.
(TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME
access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user))
(TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p)
maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Stage-2 depends on stages: Stage-0
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
a
TableScan
alias: a
Reduce Output Operator
key expressions:
expr: prono
type: int
sort order: +
Map-reduce partition columns:
expr: prono
type: int
tag: 0
value expressions:
expr: user
type: string
expr: prono
type: int
p
TableScan
alias: p
Reduce Output Operator
key expressions:
expr: prono
type: int
sort order: +
Map-reduce partition columns:
expr: prono
type: int
tag: 1
value expressions:
expr: maker
type: string
expr: price
type: int
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {VALUE._col0} {VALUE._col2}
1 {VALUE._col1} {VALUE._col2}
handleSkewJoin: false
outputColumnNames: _col0, _col2, _col6, _col7
Select Operator
expressions:
expr: _col0
type: string
expr: _col2
type: int
expr: _col6
type: string
expr: _col7
type: int
outputColumnNames: _col0, _col1, _col2, _col3
File Output Operator
compressed: false
GlobalTableId: 1
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.access_log_temp2
Stage: Stage-0
Move Operator
tables:
replace: true
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.access_log_temp2
Stage: Stage-2
Stats-Aggr Operator
Time taken: 0.1 seconds
hive>
51. Appendix: What does Explain show?
ABSTRACT SYNTAX TREE:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Stage-2 depends on stages: Stage-0
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
TableScan
Reduce Output Operator
Reduce Operator Tree:
Join Operator
Select Operator
File Output Operator
Stage: Stage-0
Move Operator
Stage: Stage-2
Stats-Aggr Operator
53. Explain
• Hive doesn't tell you what is wrong
• It expects you to know.
• The Explain tool provides the query plan
• Filters on input
• Number of jobs
• Number of maps and reduces
• What the jobs are sorting by
• Which directories they are reading or writing
56. Data Layout – HDFS Characteristics
• Provides a distributed file system
• Very high aggregate bandwidth
• Extreme scalability (up to 100 PB)
• Self-healing storage
• Relatively simple to administer
• Limitations
• Can't modify existing files
• Single writer for each file
• Heavy bias toward large files (> 100 MB)
57. Choices for Layout
• Partitions
• Top-level mechanism for pruning
• Primary unit for updating tables (& schema)
• Directory per value of the specified column
• Bucketing
• Hashed into a file; good for sampling
• Controls write parallelism
• Sort order
• The order the data is written within each file
58. Example Hive Layout
• Directory
• warehouse/$database/$table
• Partitioning
• /part1=$partValue/part2=$partValue
• Bucketing
• /$bucket_$attempt (e.g. 000000_0)
• Sort
• Data is sorted within each file
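The directory layout above corresponds to a table declared roughly like this (all names and the bucket count are illustrative):

```sql
CREATE TABLE events (user STRING, amount INT)
PARTITIONED BY (dt STRING, region STRING)
CLUSTERED BY (user) SORTED BY (user ASC) INTO 32 BUCKETS;
-- Files land under:
--   warehouse/<database>/events/dt=<value>/region=<value>/000000_0, 000001_0, ...
```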
59. Layout Guidelines
• Limit the number of partitions
• 1,000 partitions is much faster than 10,000
• Nested partitions are almost always wrong
• Gauge the number of buckets
• Calculate file sizes and keep files big (200-500 MB)
• Don't forget the number of files (buckets * partitions)
• Lay out related tables the same way
• Partitioning
• Bucket and sort order
60. Data Format
• SerDe
• Input/Output (aka File) Format
• Primary choices
• Text
• SequenceFile
• RCFile
• ORC
61. Text Format
• Critical to pick a SerDe
• Default: \001 (Ctrl-A) between fields
• JSON: top-level JSON record
• CSV
• Slow to read and write
• Can't split compressed files
• Leads to huge maps
• Need to read/decompress all fields
62. Sequence File
• Traditional MapReduce binary file format
• Stores keys and values as classes
• Not a good fit for Hive, which has SQL types
• Hive always stores entire row as value
• Splittable but only by searching file
• Default block size is 1 MB
• Need to read and decompress all fields
63. RCFile
• Columns stored separately
• Read and decompress only needed ones
• Better Compression
• Columns stored as binary blobs
• Depends on metastore to supply types
• Larger blocks
• 4MB by default
• Still search file for split boundary
64. ORC (Optimized Row Columnar)
• Columns stored separately
• Knows types
• Uses type-specific encoders
• Stores statistics (min, max, sum, count)
• Has a lightweight index
• Skips over blocks of rows that don't matter
• Larger blocks
• 256 MB by default
• Has an index for block boundaries
67. Using ORC
• CREATE TABLE ... STORED AS ORC
• ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT ORC
• SET hive.default.fileformat=Orc
• All ORCFile parameters appear in the TBLPROPERTIES clause of the HiveQL statement:
68. Using ORC – example
create table Addresses (
name string,
street string,
city string,
state string,
zip int
) stored as orc tblproperties ("orc.compress"="NONE");
69. Vectorized Query Execution
• The Hive query execution engine currently processes one row at a time.
A single row of data goes through all the operators before the next
row can be processed. This mode of processing is very inefficient in
terms of CPU usage.
• This involves long code paths and significant metadata interpretation in
the inner loop of execution. Vectorized query execution streamlines
operations by processing a block of 1024 rows at a time. Within the
block, each column is stored as a vector (an array of a primitive data
type). Simple operations like arithmetic and comparisons are done by
quickly iterating through the vectors in a tight loop, with no or very few
function calls or conditional branches inside the loop. These loops
compile in a streamlined way that uses relatively few instructions and
finishes each instruction in fewer clock cycles, on average, by effectively
using the processor pipeline and cache memory.
70. Vectorized Query Execution - USAGE
• ORC format
• set hive.vectorized.execution.enabled = true;
• Vectorized execution is off by default, so your queries only utilize
it if this variable is turned on. To disable vectorized execution and
go back to standard execution, do the following:
• set hive.vectorized.execution.enabled = false;
71. Vectorized Query Execution - USAGE
• The following expressions can be vectorized when used on supported types:
• arithmetic: +, -, *, /, %
• AND, OR, NOT
• comparisons <, >, <=, >=, =, !=, BETWEEN, IN ( list-of-constants ) as filters
• Boolean-valued expressions (non-filters) using AND, OR, NOT, <, >, <=, >=, =, !=
• IS [NOT] NULL
• all math functions (SIN, LOG, etc.)
• string functions SUBSTR, CONCAT, TRIM, LTRIM, RTRIM, LOWER, UPPER, LENGTH
• type casts
• Hive user-defined functions, including standard and generic UDFs
• date functions (YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, UNIX_TIMESTAMP)
• the IF conditional expression
72. Vectorized Query Execution – UDF support
• User-defined functions are supported using a backward
compatibility bridge, so although they do run vectorized, they
don't run as fast as optimized vector implementations of built-in
operators and functions. Vectorized filter operations are evaluated
left-to-right, so for best performance, put UDFs on the right in an
ANDed list of expressions in the WHERE clause. E.g., use
• column1 = 10 and myUDF(column2) = "x"
73. Compression
• Need to pick a level of compression
• None
• LZO or Snappy: fast, but a lower compression ratio
• Best for temporary tables
• ZLIB: slow, but compresses thoroughly
• Best for long-term storage
81. Handling compressed output files
• When query results are stored as compressed files, the small-file problem still has to be solved. If files are merged before map input, there is no restriction on the output storage format; but if output merging is used, it must be combined with SequenceFile storage, otherwise the files cannot be merged. Example:
• set mapred.output.compression.type=BLOCK;
• set hive.exec.compress.output=true;
• set mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;
• set hive.merge.smallfiles.avgsize=100000000;
• drop table if exists dw_stage.zj_small;
• create table dw_stage.zj_small
  STORED AS SEQUENCEFILE
  as select *
  from dw_db.dw_soj_imp_dtl
  where log_dt = '2014-04-14'
  and paid like '%baidu%';
82. Using HAR archive files
• Hadoop's archive file format is another way to deal with the small-file problem, and Hive has native support for it:
• set hive.archive.enabled=true;
• set hive.archive.har.parentdir.settable=true;
• set har.partfile.size=1099511627776;
• ALTER TABLE srcpart ARCHIVE PARTITION(ds='2008-04-08', hr='12');
• ALTER TABLE srcpart UNARCHIVE PARTITION(ds='2008-04-08', hr='12');
• If the table is not partitioned, it can instead be created as an external table with a har:// URL as its path.
88. Whole-query SQL optimization
• 1. Inter-job parallelism
• The parameter for running jobs in parallel is hive.exec.parallel; set it to true. The default degree of parallelism is 8, i.e. at most 8 jobs of one SQL statement run in parallel. For a higher degree, set hive.exec.parallel.thread.number, but avoid values so large that they hog resources.
• 2. Reducing the number of jobs
• Example: count the users of a site who visited both page a and page b
select count(*)
from
  (select distinct user_id
   from logs where page_name = 'a') a
join
  (select distinct user_id
   from logs where page_name = 'b') b
on a.user_id = b.user_id;
89. Whole-query SQL optimization
select count(*)
from (
  select user_id
  from logs
  group by user_id
  having (count(case when page_name = 'a' then 1 end) > 0
      and count(case when page_name = 'b' then 1 end) > 0)
) t;
90. Indexed Hive
• Hive Indexing
• Provides a key-based view of the data
• Key data is duplicated
• Storage layout favors search & lookup performance
• Provides better data access for certain operations
• A cheaper alternative to full data scans!
91. What does the index look like?
• An index is a table with 3 columns
• Data in the index looks like: [figure: sample index rows]
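In Hive releases of this era (built-in indexing was removed in Hive 3.0), such an index table could be created with the compact index handler; a sketch using the deck's product table (the index name is illustrative):

```sql
CREATE INDEX idx_maker ON TABLE product_hbase (maker)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

-- Populate the index table
ALTER INDEX idx_maker ON product_hbase REBUILD;
```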
92. Hive index in HQL
• SELECT (mapping, projection, association, given key, fetch value)
• WHERE (filters on keys)
• GROUP BY (grouping on keys)
• JOIN (join key as index key)
• Indexes have high potential for accelerating wide range of queries
93. Hive Index
• Index as Reference
• Index as Data
• Here "index as data" is used for the demonstration
• Uses a query-rewrite technique to transform queries on the base table into queries on the index table
• Limited applicability currently, but the technique itself has wide potential
• Also a very quick way to demonstrate the importance of indexes for performance
94. Indexes and Query Rewrites
• GROUP BY, aggregation
• Index as Data
• Group By Key = Index Key
• Query rewritten to use indexes, but still a valid query (nothing special in it!)
101. Why does the index perform better?
• Reducing data increases I/O efficiency
• Exploiting storage-layout optimization
• e.g. GROUP BY:
• Sort + aggregate, or hash & aggregate
• The sort step is already done in the index
• Parallelization
• Process the index data in the same manner as the base table, distributing the processing across nodes
• Scalable