Programming Hive Reading #3
 

    Presentation Transcript

    • Programming Hive Reading #3 @just_do_neet
    • Chapter 10. Tuning
      • Using EXPLAIN / EXPLAIN EXTENDED
      • Optimized Join
      • Local Mode
      • Parallel Execution
      • Strict Mode
      • Tuning the Number of Map/Reduce
      • JVM, etc.
    • Using EXPLAIN
      • Getting by without EXPLAIN is only OK until grade school (lol)
      • Output contents
        • Abstract Syntax Tree (AST)
        • Dependencies
        • Stage Plans
    • Using EXPLAIN
      • org.apache.hadoop.hive.ql.exec.ExplainTask
        http://grepcode.com/file/repository.cloudera.com$content$repositories$releases@org.apache.hadoop.hive$hive-exec@0.7.1-cdh3u1@org$apache$hadoop$hive$ql$exec$ExplainTask.java
    • Using EXPLAIN
      • AST (abstract syntax tree)
        • TOK_FROM: the input source (TOK_TABREF = table)
        • TOK_INSERT: the output destination
        • TOK_SELECT: what the SELECT clause selects
    • Using EXPLAIN
      • Dependencies
        • MapReduce Job / Sampling Stage / Merge Stage / Limit Stage / etc.
    • Using EXPLAIN
      • Stage Plans
    • Using EXPLAIN
      • Stage Plans: Operators
        http://hive.apache.org/docs/r0.7.1/api/org/apache/hadoop/hive/ql/exec/Operator.html
      • "EXPLAIN EXTENDED" prints more detailed information (for example, where temporary files are written).
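      As a minimal sketch of the two forms, assuming a hypothetical table `hoge` with an `id` column (neither is defined in the slides), the plan for a simple aggregation could be inspected like this:

      ```sql
      -- Hypothetical table and column, for illustration only.
      -- Prints the query plan sections described in the slides above.
      EXPLAIN
      SELECT id, count(*) FROM hoge GROUP BY id;

      -- Same plan with extra detail, such as temporary file locations.
      EXPLAIN EXTENDED
      SELECT id, count(*) FROM hoge GROUP BY id;
      ```

      Which sections appear, and how verbose they are, depends on the Hive version, so the output should be read against the release actually in use.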
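    • Optimized Join
      • Order the tables in the join expression according to their row counts, e.g. for the case stocks > dividends.
      • The table that appears rightmost is streamed (at reduce); all the others are buffered, so the largest table should come last.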
    • Optimized Join
      • The stream table can be specified explicitly with the hint "STREAMTABLE(tbl_name)".
    • Optimized Join
      • Test
        a : 1,000,000,000 records
        b : 100,000,000 records
        $ SELECT a.hoge, b.fuga FROM a JOIN b on (a.id = b.id)
          121.384 s
        $ SELECT a.hoge, b.fuga FROM b JOIN a on (b.id = a.id)
          122.339 s
        $ SELECT /*+ streamtable(a) */ a.hoge, b.fuga FROM b JOIN a on (b.id = a.id)
          120.298 s
    • Map Side Join
      • Recap (shown again)
    • Map Side Join
      • Recap (shown again)
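      Since the recap slides carry no text of their own, here is a rough sketch of the two usual ways to get a map-side join. The `stocks` and `dividends` tables are the ones named earlier in the deck, but the column names below are assumptions made for illustration:

      ```sql
      -- Column names are assumptions, not taken from the slides.
      -- Option 1: request a map-side join explicitly; d is the small table.
      SELECT /*+ MAPJOIN(d) */ s.ymd, s.symbol, s.price_close, d.dividend
      FROM stocks s JOIN dividends d
        ON (s.ymd = d.ymd AND s.symbol = d.symbol);

      -- Option 2: let Hive convert the join when one side is small enough.
      set hive.auto.convert.join=true;
      SELECT s.ymd, s.symbol, s.price_close, d.dividend
      FROM stocks s JOIN dividends d
        ON (s.ymd = d.ymd AND s.symbol = d.symbol);
      ```

      In both cases the small table has to fit in memory on each mapper, so this only helps when one side of the join is genuinely small.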
    • Local Mode
      • When the data is small, local mode can be faster because it reduces overhead.
        $ set mapred.job.tracker=local;
        $ set mapred.tmp.dir=/tmp/masashi/sada;
        $ SELECT * FROM hoge WHERE id = 'fuga';
        ..........
        Job running in-process (local Hadoop)
        ..........
    • Local Mode
      • When the data is small, local mode can be faster because it reduces overhead.
      • e.g. a table of about 30,000 records
        normal mode : 27 s
        local mode : 10 s
      • e.g. a table of about 100,000,000 records
        normal mode : 40 s
        local mode : 532 s
    • Local Mode
      • To have Hive switch to local mode automatically, set "hive.exec.mode.local.auto=true".
      • Local mode is used when all of the following hold:
        • The total input size of the job is lower than "hive.exec.mode.local.auto.inputbytes.max" (128 MB by default)
        • The total number of map tasks is less than "hive.exec.mode.local.auto.tasks.max" (4 by default)
        • The total number of reduce tasks required is 1 or 0.
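      The settings above might be combined in a session as follows. This is only a sketch: the property names and defaults are the ones quoted on the slide (other Hive releases may name them differently), and the query reuses the placeholder table `hoge`:

      ```sql
      -- Let Hive decide per query whether to run in local mode.
      set hive.exec.mode.local.auto=true;
      -- Input size threshold in bytes; 134217728 is the 128 MB default from the slide.
      set hive.exec.mode.local.auto.inputbytes.max=134217728;
      -- Map task threshold; 4 is the default from the slide.
      set hive.exec.mode.local.auto.tasks.max=4;

      -- Placeholder query; it runs locally only if it meets all the conditions.
      SELECT * FROM hoge WHERE id = 'fuga';
      ```

      Comments are kept on their own lines because a trailing comment after a `set` statement may not be parsed as expected by older Hive command-line clients.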
    • Strict Mode
      • Is this tuning?
      • When enabled, queries are checked more strictly: "hive.mapred.mode=strict"
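      To make the stricter checking concrete: strict mode is known to reject, among other things, queries on a partitioned table with no partition filter and ORDER BY without LIMIT. A small sketch, assuming a hypothetical table `logs` partitioned by `dt` and the placeholder table `hoge`:

      ```sql
      set hive.mapred.mode=strict;

      -- Hypothetical table: logs is assumed to be partitioned by dt.
      -- Expected to be rejected: no partition filter on a partitioned table.
      -- SELECT id FROM logs;
      -- Accepted: the partition column is restricted.
      SELECT id FROM logs WHERE dt = '2012-01-01';

      -- Expected to be rejected: ORDER BY without LIMIT.
      -- SELECT id FROM hoge ORDER BY id;
      -- Accepted: LIMIT caps the output of the single-reducer sort.
      SELECT id FROM hoge ORDER BY id LIMIT 100;
      ```

      The rejected statements are left commented out so the block can be pasted into a session without aborting.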
    • Tuning M/R Number
      • hive.exec.reducers.bytes.per.reducer = <number>
      • hive.exec.reducers.max = <number>
      • mapred.reduce.tasks = <number>
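      Roughly speaking, Hive estimates the reducer count as the total input size divided by the bytes-per-reducer target, capped by the maximum, unless a fixed count is given. A sketch with illustrative values (not recommendations, and not from the slides):

      ```sql
      -- Illustrative values only.
      -- Target amount of input per reducer, in bytes.
      set hive.exec.reducers.bytes.per.reducer=256000000;
      -- Upper bound on the number of reducers Hive will pick.
      set hive.exec.reducers.max=64;
      -- Alternatively, bypass the estimate and fix the reducer count.
      -- set mapred.reduce.tasks=16;
      ```

      With these values, a job reading about 10 GB would be estimated at around 40 reducers, which is under the cap of 64.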
    • JVM Reuse
      • The number of map/reduce tasks that run in a single JVM can be configured (in "mapred-site.xml").
      • -1 means no limit.
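      The slide does not name the property. On Hadoop 1.x it is `mapred.job.reuse.jvm.num.tasks`, and as a per-job setting it can also be tried from a Hive session rather than in the site file. A sketch under that assumption:

      ```sql
      -- Assumed Hadoop 1.x property name; not spelled out in the slides.
      -- Allow each JVM to run up to 10 tasks of the same job.
      set mapred.job.reuse.jvm.num.tasks=10;
      -- -1 removes the limit.
      -- set mapred.job.reuse.jvm.num.tasks=-1;
      ```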
    • Dynamic Partition Tuning
      • Limits on the use of dynamic partitions can be configured.
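      The slide does not list the settings involved. The following sketch shows the Hive properties usually used for this; the values are illustrative assumptions, not figures from the talk:

      ```sql
      -- strict requires at least one static partition column per insert.
      set hive.exec.dynamic.partition.mode=strict;
      -- Cap on dynamic partitions created by one statement overall.
      set hive.exec.max.dynamic.partitions=1000;
      -- Cap on dynamic partitions created by each mapper or reducer.
      set hive.exec.max.dynamic.partitions.pernode=100;
      ```

      These caps mainly guard against a mistyped partition column creating a very large number of partitions by accident.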
    • Single MR Multi Group By
      • Reference: https://issues.apache.org/jira/browse/HIVE-2056
        FROM T
        INSERT OVERWRITE TABLE test1
          SELECT col1, count(DISTINCT colx) GROUP BY col1
        INSERT OVERWRITE TABLE test2
          SELECT col1, col2, count(DISTINCT colx) GROUP BY col1, col2;
      • For a query like the one above, "hive.multigroupby.singlemr=true" is reportedly faster.
    • Virtual Columns
      • Is this tuning?
      • The following information can be retrieved with HiveQL and can also be used in conditions:
        • INPUT__FILE__NAME
        • BLOCK__OFFSET__INSIDE__FILE
        • ROW__OFFSET__INSIDE__BLOCK (requires "hive.exec.rowoffset=true")
    • Virtual Columns
      • Example
        https://cwiki.apache.org/Hive/languagemanual-virtualcolumns.html
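      A short sketch of how the virtual columns can be selected and filtered on, again using the placeholder table `hoge` with an assumed `id` column and an arbitrary offset threshold:

      ```sql
      -- Hypothetical table hoge with an id column.
      -- Show which file and byte offset each row came from.
      SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, id
      FROM hoge;

      -- Virtual columns can also appear in conditions.
      SELECT id
      FROM hoge
      WHERE BLOCK__OFFSET__INSIDE__FILE > 12000;

      -- ROW__OFFSET__INSIDE__BLOCK needs this setting first.
      set hive.exec.rowoffset=true;
      SELECT ROW__OFFSET__INSIDE__BLOCK, id FROM hoge;
      ```

      This is mostly useful for debugging, for example to trace a malformed row back to the file it was read from.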
    • Conclusion
      • For real-world performance tuning, I think improving the data structure has a bigger effect than anything covered above.
      • I have very high hopes for the people presenting Chapter 11 and Chapter 15!!!
    • Thank you for your attention.