Advanced Hive
Transcript

  • 1. Advanced Hive
  • 2. Contents: HQL functions; HQL statements; HQL optimization; Hive parameter tuning
  • 3. HQL functions. Type conversion: cast('1' as bigint). Conditional: case when a.uid is null then 0 else a.uid end = b.uid. Text: regexp_extract, regexp_replace. Complex data types: A[n] (array element), M[key] (map value), S.x (struct field).
  • 4. HQL statements. Sort By: Hive uses the columns in SORT BY to sort the rows before feeding the rows to a reducer. Order By: sorts the whole result set into one total order. Distribute By: Hive uses the columns in Distribute By to distribute the rows among reducers. Cluster By: a short-cut for both Distribute By and Sort By.
  • 5. HQL statements. Multi-table/file insert: FROM * followed by insert overwrite table and insert overwrite directory clauses. Adding external resources: ADD { FILE[S] | JAR[S] | ARCHIVE[S] }. Map Join: SELECT /*+ MAPJOIN(b) */ a.key, a.Value FROM a join b on a.key = b.key
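  • 6. HQL optimization. Goals: read and write in memory as much as possible; spread the map output more evenly across the reducers. Tool: explain. Group by: hive.groupby.skewindata = true. Count Distinct: handle null values separately.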
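  • 7. HQL optimization: join. Use the table whose join key is most evenly distributed as the driving table. Small table join large table: use map join so the small dimension table is loaded into memory first and the join completes on the map side, without a reduce phase. Large table join large table: turn each null key into a string plus a random number, so the skewed rows are spread across different reducers.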
  • 8. Hive parameters. Read only the necessary columns: hive.optimize.cp = true. Read only the necessary partitions: hive.optimize.pruner = true. Map Join: hive.optimize.bucketmapjoin = true; hive.optimize.bucketmapjoin.sortedmerge = true;
  • 9. Hive parameters. Number of reducers: set mapred.reduce.tasks; hive.exec.reducers.bytes.per.reducer (default 1000^3); hive.exec.reducers.max (default 999). Merging small files: hive.merge.mapfiles = true (whether to merge map output files); hive.merge.mapredfiles = false (whether to merge reduce output files); hive.merge.size.per.task = fileSize (size of the merged files).
  • 10. References: http://wiki.apache.org/hadoop/Hive; http://www.tbdata.org/archives/622; http://www.tbdata.org/archives/595; http://www.tbdata.org/archives/2109
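
The large-table join advice on slide 7 (replace null keys with a string plus a random number) can be sketched in HiveQL as follows. This is a minimal illustration, not a query from the slides: the tables log and users, their columns, and the 'null_' prefix are hypothetical, and it assumes uid is a STRING and that no real uid starts with 'null_'.

```sql
-- Hypothetical tables: log(uid STRING, ...) is large and contains many rows
-- with a NULL uid; users(uid STRING, name STRING) is the other large table.
SELECT a.uid, b.name
FROM log a
LEFT OUTER JOIN users b
  ON (CASE WHEN a.uid IS NULL
           -- Salted key: a different random string per row, so the NULL rows
           -- are spread over many reducers instead of landing on one.
           THEN concat('null_', cast(rand() AS string))
           ELSE a.uid
      END) = b.uid;
```

Because the salted keys match nothing in users, the null-uid rows still come out of the left outer join with a NULL name, which is the same result an unsalted join would give; only their distribution across reducers changes.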