Internal Hive

Explain the structure of Apache Hive.

Published in: Technology

  1. 1. Inside Hive (for beginners)
     Takeshi NAKANO / Recruit Co. Ltd.
  2. 2. Why?
     Hive is a good tool for non-specialists!
     The number of M/R jobs largely determines Hive's processing time.
     ↓
     How can we reduce that number?
     What can we do about it when writing HiveQL?
     ↓
     How does Hive convert HiveQL to M/R jobs?
     Which optimizations are applied along the way?
     (Slide footer: 7/6/2011, HIVE - A warehouse solution over Map Reduce Framework)
  3. 3. Don’t you have..
     This paper from Facebook has a lot of information!
     But it is a little old..
  4. 4. Component Level Analysis
  5. 5. Hive Architecture / Exec Flow
     [Diagram: Client, Driver, Compiler, Metastore, Hadoop]
  6. 6. Hive Workflow
     Hive has operators, which are its minimum processing units.
     Each operator does its work through HDFS operations or M/R jobs.
     The compiler converts HiveQL into a set of operators.
     [Diagram: Client, Driver, Compiler, Metastore, Hadoop]
  7. 7. Hive Workflow
     Operators
  8. 8. Hive Workflow
     For M/R processing, Hive uses ExecMapper and ExecReducer.
     There are 2 processing modes:
       1. Local processing mode
       2. Distributed processing mode
     [Diagram: Client, Driver, Compiler, Metastore, Hadoop]
  9. 9. Hive Workflow
     In mode 1 (local mode), Hive forks a process with the hadoop command. The plan.xml is created on that one node, and the single node processes it.
     In mode 2 (distributed mode), Hive sends the job to an existing JobTracker. The information is stored in the DistributedCache and processed on multiple nodes.
     [Diagram: Client, Driver, Compiler, Metastore, Hadoop]
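The operator idea from the Hive Workflow slides can be illustrated with a toy pipeline. This is only a sketch in Python, not Hive's Java implementation: the class names `Operator`, `Filter`, `Select` and `Collect`, the `then` helper, and the sample rows are all assumptions made for illustration. Each operator handles one row and forwards it to its children, which is the sense in which an operator is a minimum processing unit.

```python
class Operator:
    """Toy processing unit: handles one row, then forwards it to its children."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def then(self, child):
        # Attach a child operator and return it so calls can be chained.
        self.children.append(child)
        return child

    def process(self, row):
        # The base operator passes every row through unchanged.
        self.forward(row)

    def forward(self, row):
        for child in self.children:
            child.process(row)


class Filter(Operator):
    """Forwards only the rows for which the predicate is true."""

    def __init__(self, name, predicate):
        super().__init__(name)
        self.predicate = predicate

    def process(self, row):
        if self.predicate(row):
            self.forward(row)


class Select(Operator):
    """Keeps only the listed columns of each row."""

    def __init__(self, name, columns):
        super().__init__(name)
        self.columns = columns

    def process(self, row):
        self.forward({c: row[c] for c in self.columns})


class Collect(Operator):
    """Terminal operator: stores rows instead of writing a file."""

    def __init__(self, name):
        super().__init__(name)
        self.rows = []

    def process(self, row):
        self.rows.append(row)


def run_pipeline(rows):
    scan = Operator("scan")
    sink = Collect("sink")
    (scan.then(Filter("filter", lambda r: r["maker"] == "honda"))
         .then(Select("select", ["prono", "price"]))
         .then(sink))
    for row in rows:
        scan.process(row)
    return sink.rows


# Hypothetical sample rows, not data from the slides.
products = [
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 2, "maker": "toyota", "price": 200},
]
result = run_pipeline(products)
```

In this sketch the whole chain runs in one process; in Hive the same kind of chain is what ExecMapper and ExecReducer drive inside an M/R job.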
  10. 10. Compiler : How to Process HiveQL
      [Diagram: Client, Driver, Compiler, Metastore, Hadoop]
  11. 11. “Plumbing” of HIVE compiler
  12. 12. “Plumbing” of HIVE compiler
  13. 13. Compiler Overview
      Parser → Semantic Analyzer → Logical Plan Gen. → Logical Optimizer → Physical Plan Gen. → Physical Optimizer
  14. 14. Compiler Overview
      HiveQL → [Parser] → AST → [Semantic Analyzer] → QB → [Logical Plan Gen.] → Operator Tree → [Logical Optimizer] → Operator Tree → [Physical Plan Gen.] → Task Tree → [Physical Optimizer] → Task Tree
  15. 15. Parser
      HiveQL → AST
      HiveQL:
        INSERT OVERWRITE TABLE access_log_temp2
          SELECT a.user, a.prono, p.maker, p.price
          FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      AST:
        TOK_QUERY
          + TOK_FROM
            + TOK_JOIN
              + TOK_TABREF
                + TOK_TABNAME
                  + "access_log_hbase"
                + "a"
              + TOK_TABREF
                + TOK_TABNAME
                  + "product_hbase"
                + "p"
              + "="
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "prono"
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "prono"
          + TOK_INSERT
            + TOK_DESTINATION
              + TOK_TAB
                + TOK_TABNAME
                  + "access_log_temp2"
            + TOK_SELECT
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "user"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "prono"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "maker"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "price"
  16. 16. Parser
      SQL → AST
      The same query and AST as on slide 15, with three subtrees of the AST numbered for the slides that follow:
        1: TOK_FROM
        2: TOK_DESTINATION
        3: TOK_SELECT
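To make the AST notation on the Parser slides concrete, the FROM subtree can be built by hand and printed in the parenthesized form that EXPLAIN uses in the appendix. This is a sketch in Python, not Hive's parser: the helpers `node`, `col`, `tabref` and `render` are hypothetical names, and the tree is assembled manually rather than parsed from HiveQL.

```python
def node(label, *children):
    """An AST node is a (label, children) pair; leaves have no children."""
    return (label, list(children))


def col(alias, name):
    # a.prono  ->  (. (TOK_TABLE_OR_COL a) prono)
    return node(".", node("TOK_TABLE_OR_COL", node(alias)), node(name))


def tabref(table, alias):
    # access_log_hbase a  ->  (TOK_TABREF (TOK_TABNAME access_log_hbase) a)
    return node("TOK_TABREF", node("TOK_TABNAME", node(table)), node(alias))


def render(n):
    """Print a node in the parenthesized style of EXPLAIN's AST section."""
    label, children = n
    if not children:
        return label
    return "(" + " ".join([label] + [render(c) for c in children]) + ")"


# FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
from_clause = node(
    "TOK_FROM",
    node(
        "TOK_JOIN",
        tabref("access_log_hbase", "a"),
        tabref("product_hbase", "p"),
        node("=", col("a", "prono"), col("p", "prono")),
    ),
)

print(render(from_clause))
```

The printed string matches the TOK_FROM portion of the ABSTRACT SYNTAX TREE line in the EXPLAIN output on slide 39.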
  17. 17. Semantic Analyzer (1/2)
      AST → QB
      AST (part 1):
        + TOK_FROM
          + TOK_JOIN
            + TOK_TABREF
              + TOK_TABNAME
                + "access_log_hbase"
              + "a"
            + TOK_TABREF
              + TOK_TABNAME
                + "product_hbase"
              + "p"
            + "="
              + "."
                + TOK_TABLE_OR_COL
                  + "a"
                + "prono"
              + "."
                + TOK_TABLE_OR_COL
                  + "p"
                + "prono"
      QB:
        MetaData
          Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
        ParseInfo
          Join Node
            + TOK_JOIN
              + TOK_TABREF
                …
              + TOK_TABREF
                …
              + "="
                …
  18. 18. Semantic Analyzer (2/2)
      AST → QB
      AST (part 2):
        + TOK_DESTINATION
          + TOK_TAB
            + TOK_TABNAME
              + "access_log_temp2"
      QB:
        ParseInfo
          Name To Destination Node
            + TOK_TAB
              + TOK_TABNAME
                + "access_log_temp2"
  19. 19. Semantic Analyzer (2/2)
      AST → QB
      AST (part 3):
        + TOK_SELECT
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "a"
              + "user"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "a"
              + "prono"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "p"
              + "maker"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "p"
              + "price"
      QB:
        ParseInfo
          Name To Select Node
            + TOK_SELECT
              + TOK_SELEXPR
                …
              + TOK_SELEXPR
                …
              + TOK_SELEXPR
                …
              + TOK_SELEXPR
                …
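The three Semantic Analyzer slides amount to walking the AST and filing its subtrees into a query block (QB). A minimal sketch of that walk in Python follows, under loud assumptions: the QB here is a plain dictionary whose keys (`meta_data`, `alias_to_table`, `parse_info`) only mimic the labels on the slides, the functions `parse_sexpr`, `find_all`, `build_qb` and `select_columns` are hypothetical, and the input is the parenthesized AST string that EXPLAIN prints on slide 39 rather than Hive's internal tree.

```python
EXPLAIN_AST = (
    "(TOK_QUERY (TOK_FROM (TOK_JOIN "
    "(TOK_TABREF (TOK_TABNAME access_log_hbase) a) "
    "(TOK_TABREF (TOK_TABNAME product_hbase) p) "
    "(= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) "
    "(TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) "
    "(TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) "
    "(TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) "
    "(TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) "
    "(TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))"
)


def parse_sexpr(text):
    """Read a parenthesized string into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a leaf such as a table name or alias
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # skip the closing parenthesis
        return items

    return read()


def find_all(tree, label):
    """Collect every subtree whose first element is `label`, in document order."""
    found = []
    if isinstance(tree, list):
        if tree and tree[0] == label:
            found.append(tree)
        for child in tree[1:]:
            found.extend(find_all(child, label))
    return found


def build_qb(ast):
    alias_to_table = {}
    for ref in find_all(ast, "TOK_TABREF"):
        table = ref[1][1]  # ref[1] is [TOK_TABNAME, <table>]
        alias = ref[2]
        alias_to_table[alias] = table
    destination = find_all(ast, "TOK_DESTINATION")[0]
    return {
        "meta_data": {"alias_to_table": alias_to_table},
        "parse_info": {
            "join": find_all(ast, "TOK_JOIN")[0],
            "destination": destination[1],  # the TOK_TAB subtree
            "select": find_all(ast, "TOK_SELECT")[0],
        },
    }


def select_columns(qb):
    """Render each TOK_SELEXPR of the select node as alias.column."""
    cols = []
    for selexpr in qb["parse_info"]["select"][1:]:
        expr = selexpr[1]  # [".", [TOK_TABLE_OR_COL, alias], column]
        cols.append(f"{expr[1][1]}.{expr[2]}")
    return cols


qb = build_qb(parse_sexpr(EXPLAIN_AST))
```

The alias map corresponds to slide 17, the destination node to slide 18, and the select node to slide 19.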
  20. 20. Logical Plan Generator (1/4)
      QB → OP Tree
      QB:
        MetaData
          Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
      OP Tree:
        TableScanOperator("access_log_hbase")
        TableScanOperator("product_hbase")
  21. 21. Logical Plan Generator (2/4)
      QB → OP Tree
      QB:
        ParseInfo
          + TOK_JOIN
            + TOK_TABREF
              + TOK_TABNAME
                + "access_log_hbase"
              + "a"
            + TOK_TABREF
              + TOK_TABNAME
                + "product_hbase"
              + "p"
            + "="
              + "."
                + TOK_TABLE_OR_COL
                  + "a"
                + "prono"
              + "."
                + TOK_TABLE_OR_COL
                  + "p"
                + "prono"
      OP Tree:
        ReduceSinkOperator("access_log_hbase")
        ReduceSinkOperator("product_hbase")
        JoinOperator
  22. 22. Logical Plan Generator (3/4)
      QB → OP Tree
      QB:
        ParseInfo
          Name To Select Node
            + TOK_SELECT
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "user"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "prono"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "maker"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "price"
      OP Tree:
        SelectOperator
  23. 23. Logical Plan Generator (4/4)
      QB → OP Tree
      QB:
        MetaData
          Name To Destination Table Info
            "insclause-0" = Table Info("access_log_temp2")
      OP Tree:
        FileSinkOperator
  24. 24. Logical Plan Generator (result)
      LCF
      OP Tree:
        TableScanOperator (TS_1), TableScanOperator (TS_0)
        ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
        JoinOperator (JOIN_4)
        SelectOperator (SEL_5)
        FileSinkOperator (FS_6)
  25. 25. Logical Optimizer
  26. 26. Logical Optimizer (Predicate Push Down)
      Query without a WHERE clause:
        INSERT OVERWRITE TABLE access_log_temp2
          SELECT a.user, a.prono, p.maker, p.price
          FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      Query with a WHERE clause:
        INSERT OVERWRITE TABLE access_log_temp2
          SELECT a.user, a.prono, p.maker, p.price
          FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
          WHERE p.maker = 'honda';
  27. 27. Logical Optimizer (Predicate Push Down)
      (The two queries from slide 26, without and with WHERE p.maker = 'honda')
      OP Tree:
        TableScanOperator (TS_1), TableScanOperator (TS_0)
        ReduceSinkOperator (RS_3), ReduceSinkOperator (RS_2)
        JoinOperator (JOIN_4)
        SelectOperator (SEL_6)
        FileSinkOperator (FS_7)
  28. 28. Logical Optimizer (Predicate Push Down)
      (The two queries from slide 26, without and with WHERE p.maker = 'honda')
      OP Tree:
        TableScanOperator (TS_1), TableScanOperator (TS_0)
        ReduceSinkOperator (RS_3), ReduceSinkOperator (RS_2)
        JoinOperator (JOIN_4)
        FilterOperator (FIL_5): (_col8 = 'honda')
        SelectOperator (SEL_6)
        FileSinkOperator (FS_7)
  29. 29. Logical Optimizer (Predicate Push Down)
      (The two queries from slide 26, without and with WHERE p.maker = 'honda')
      OP Tree:
        TableScanOperator (TS_1), TableScanOperator (TS_0)
        FilterOperator (FIL_8): (maker = 'honda')
        ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
        JoinOperator (JOIN_4)
        FilterOperator (FIL_5): (_col8 = 'honda')
        SelectOperator (SEL_6)
        FileSinkOperator (FS_7)
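The effect of predicate push down on slides 26 to 29 can be sketched with a toy plan. This is an illustration in Python under stated assumptions, not Hive's optimizer: the plan is a dictionary of operator descriptions, `push_down_filters` and `rows_sent_to_reducers` are hypothetical helpers, and the table rows are invented. As on slide 29, the sketch adds a copy of the filter right after the table scan and leaves the original filter after the join in place.

```python
from copy import deepcopy


def push_down_filters(plan):
    """Copy each single-table filter found after the join to just after that table's scan."""
    new_plan = deepcopy(plan)
    for op in plan["after_join"]:
        if op["op"] != "Filter":
            continue
        branch = new_plan["before_join"][op["alias"]]
        branch.insert(1, dict(op))  # index 0 is the TableScan
    return new_plan


def rows_sent_to_reducers(plan, tables):
    """Count rows that survive the map-side operators and reach the ReduceSinks."""
    total = 0
    for alias, branch in plan["before_join"].items():
        rows = tables[alias]
        for op in branch:
            if op["op"] == "Filter":
                rows = [r for r in rows if r[op["column"]] == op["value"]]
        total += len(rows)
    return total


# Toy plan for the query with WHERE p.maker = 'honda'.
plan = {
    "before_join": {
        "a": [{"op": "TableScan"}, {"op": "ReduceSink"}],
        "p": [{"op": "TableScan"}, {"op": "ReduceSink"}],
    },
    "after_join": [
        {"op": "Filter", "alias": "p", "column": "maker", "value": "honda"},
        {"op": "Select"},
        {"op": "FileSink"},
    ],
}

# Hypothetical table contents, not data from the slides.
tables = {
    "a": [{"user": "u1", "prono": 1}, {"user": "u2", "prono": 2}, {"user": "u3", "prono": 3}],
    "p": [
        {"prono": 1, "maker": "honda", "price": 100},
        {"prono": 2, "maker": "toyota", "price": 200},
        {"prono": 3, "maker": "suzuki", "price": 300},
    ],
}

optimized = push_down_filters(plan)
before = rows_sent_to_reducers(plan, tables)
after = rows_sent_to_reducers(optimized, tables)
```

With these sample rows, fewer rows cross the map/reduce boundary after the push down, which is the point of moving the filter ahead of the ReduceSinkOperator.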
  30. 30. Physical Plan Generator
      OP Tree → Task Tree
      Task Tree:
        MapRedTask (Stage-1/root)
          Ope Tree:
            TableScanOperator (TS_0)
            TableScanOperator (TS_1)
            ReduceSinkOperator (RS_2)
            ReduceSinkOperator (RS_3)
            JoinOperator (JOIN_4)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
        MoveTask (Stage-0)
          LoadTableDesc
        StatsTask (Stage-2)
  31. 31. Physical Plan Generator (result)
      OP Tree → Task Tree
      LCF
      MapRedTask (Stage-1/root)
        Mapper:
          TableScanOperator (TS_1), TableScanOperator (TS_0)
          ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
        Reducer:
          JoinOperator (JOIN_4)
          SelectOperator (SEL_5)
          FileSinkOperator (FS_6)
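The result on slide 31 suggests a simple rule for this single-job case: everything up to and including a ReduceSinkOperator runs in the mapper, and everything after it runs in the reducer. A sketch of that split in Python follows. The function `split_at_reduce_sink` is hypothetical, and the pairing of each table scan with a particular reduce sink (TS_0 with RS_2, TS_1 with RS_3) is an assumption, since the slides do not make the edges explicit.

```python
# Child edges of the operator tree. The TS -> RS pairing is assumed.
EDGES = {
    "TS_0": ["RS_2"],
    "TS_1": ["RS_3"],
    "RS_2": ["JOIN_4"],
    "RS_3": ["JOIN_4"],
    "JOIN_4": ["SEL_5"],
    "SEL_5": ["FS_6"],
    "FS_6": [],
}


def split_at_reduce_sink(edges, roots):
    """Assign each operator to the map side or the reduce side of one M/R job."""
    mapper, reducer = [], []

    def visit(op, side):
        if op in mapper or op in reducer:
            return  # already placed (JOIN_4 is reachable from both branches)
        (mapper if side == "map" else reducer).append(op)
        # Operators below a ReduceSink run after the shuffle, in the reducer.
        next_side = "reduce" if op.startswith("RS_") else side
        for child in edges[op]:
            visit(child, next_side)

    for root in roots:
        visit(root, "map")
    return mapper, reducer


mapper_ops, reducer_ops = split_at_reduce_sink(EDGES, ["TS_0", "TS_1"])
print("Mapper: ", sorted(mapper_ops))
print("Reducer:", reducer_ops)
```

A query whose operator tree contains a second layer of reduce sinks would need more than one M/R job, which this sketch does not model.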
  32. 32. Physical Optimizer
      Task Tree → Task Tree
      Source: under java/org/apache/hadoop/hive/ql/optimizer/physical/
  33. 33. Physical Optimizer (MapJoinResolver)
      Task Tree → Task Tree
      MapRedTask (Stage-1)
        Mapper:
          TableScanOperator (TS_1), TableScanOperator (TS_0)
          MapJoinOperator (MAPJOIN_7)
          SelectOperator (SEL_8)
          SelectOperator (SEL_5)
          FileSinkOperator (FS_6)
  34. 34. Physical Optimizer (MapJoinResolver)
      Task Tree → Task Tree
      Before:
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator (TS_1), TableScanOperator (TS_0)
            MapJoinOperator (MAPJOIN_7)
            SelectOperator (SEL_8)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
      After:
        MapredLocalTask (Stage-7)
          TableScanOperator (TS_0)
          HashTableSinkOperator (HASHTABLESINK_11)
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator (TS_1)
            MapJoinOperator (MAPJOIN_7)
            SelectOperator (SEL_8)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
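The task tree after MapJoinResolver reflects the general map-side hash join technique: one input is loaded into an in-memory hash table ahead of time, and the other input is streamed through the mapper and probed against it, so no reducer is needed. The following Python sketch shows that technique only. The function names `build_hash_table` and `map_join`, the sample rows, and the choice of the product table as the small side are assumptions for illustration.

```python
def build_hash_table(small_rows, key):
    """Local step: load the small table into a hash table keyed on the join key."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    return table


def map_join(big_rows, hash_table, key):
    """Map step: stream the big table and probe the hash table for each row."""
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**row, **match}


# Hypothetical rows shaped like the tables in the example query.
access_log = [
    {"user": "u1", "prono": 1},
    {"user": "u2", "prono": 2},
    {"user": "u3", "prono": 9},  # no matching product, dropped by the inner join
]
product = [
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 2, "maker": "toyota", "price": 200},
]

hash_table = build_hash_table(product, "prono")
joined = list(map_join(access_log, hash_table, "prono"))
```

In the slide's terms, `build_hash_table` plays the role of the MapredLocalTask ending in HashTableSinkOperator, and `map_join` plays the role of MapJoinOperator inside the mapper.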
  35. 35. In the end
      [Diagram: Client, Driver, Compiler, Metastore, Hadoop]
  36. 36. In the end
      HiveQL → [Parser] → AST → [Semantic Analyzer] → QB → [Logical Plan Gen.] → Operator Tree → [Logical Optimizer] → Operator Tree → [Physical Plan Gen.] → Task Tree → [Physical Optimizer] → Task Tree
  37. 37. End
  38. 38. Appendix: What does Explain show?
  39. 39. Appendix: What does Explain show?
      hive> explain INSERT OVERWRITE TABLE access_log_temp2
          > SELECT a.user, a.prono, p.maker, p.price
          > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      OK
      ABSTRACT SYNTAX TREE:
        (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
        Stage-2 depends on stages: Stage-0
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              a
                TableScan
                  alias: a
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 0
                    value expressions:
                      expr: user
                      type: string
                      expr: prono
                      type: int
              p
                TableScan
                  alias: p
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 1
                    value expressions:
                      expr: maker
                      type: string
                      expr: price
                      type: int
            Reduce Operator Tree:
              Join Operator
                condition map:
                  Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col0} {VALUE._col2}
                  1 {VALUE._col1} {VALUE._col2}
                handleSkewJoin: false
                outputColumnNames: _col0, _col2, _col6, _col7
                Select Operator
                  expressions:
                    expr: _col0
                    type: string
                    expr: _col2
                    type: int
                    expr: _col6
                    type: string
                    expr: _col7
                    type: int
                  outputColumnNames: _col0, _col1, _col2, _col3
                  File Output Operator
                    compressed: false
                    GlobalTableId: 1
                    table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.access_log_temp2
        Stage: Stage-0
          Move Operator
            tables:
              replace: true
              table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                name: default.access_log_temp2
        Stage: Stage-2
          Stats-Aggr Operator
      Time taken: 0.1 seconds
      hive>
  40. 40. Appendix: What does Explain show?
      The full EXPLAIN output from slide 39, overlaid with its skeleton:
        ABSTRACT SYNTAX TREE:
        STAGE DEPENDENCIES:
          Stage-1 is a root stage
          Stage-0 depends on stages: Stage-1
          Stage-2 depends on stages: Stage-0
        STAGE PLANS:
          Stage: Stage-1
            Map Reduce
              Map Operator Tree:
                TableScan
                  Reduce Output Operator
                TableScan
                  Reduce Output Operator
              Reduce Operator Tree:
                Join Operator
                  Select Operator
                    File Output Operator
          Stage: Stage-0
            Move Operator
          Stage: Stage-2
            Stats-Aggr Operator
  41. 41. Appendix: What does Explain show?
      EXPLAIN skeleton:
        ABSTRACT SYNTAX TREE:
        STAGE DEPENDENCIES:
          Stage-1 is a root stage
          Stage-0 depends on stages: Stage-1
          Stage-2 depends on stages: Stage-0
        STAGE PLANS:
          Stage: Stage-1
            Map Reduce
              Map Operator Tree:
                TableScan
                  Reduce Output Operator
                TableScan
                  Reduce Output Operator
              Reduce Operator Tree:
                Join Operator
                  Select Operator
                    File Output Operator
          Stage: Stage-0
            Move Operator
          Stage: Stage-2
            Stats-Aggr Operator
      ≒
      Task Tree:
        MapRedTask (Stage-1/root)
          Mapper:
            TableScanOperator (TS_1), TableScanOperator (TS_0)
            ReduceSinkOperator (RS_2), ReduceSinkOperator (RS_3)
          Reducer:
            JoinOperator (JOIN_4)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
        MoveTask (Stage-0)
        Stats Task (Stage-2)
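Since the number of stages is what the deck set out to reduce, it can help to pull the STAGE DEPENDENCIES section out of EXPLAIN output and list the stages in run order. The Python sketch below does that for the three lines shown in the appendix. The helpers `parse_dependencies` and `execution_order` are hypothetical, the assumption that several parent stages are separated by commas is not shown in this output, and the stage-to-task mapping is copied from slide 41.

```python
import re

EXPLAIN_DEPENDENCIES = """\
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0
"""

# Task type per stage, as shown on slide 41.
STAGE_TASKS = {"Stage-1": "MapRedTask", "Stage-0": "MoveTask", "Stage-2": "StatsTask"}


def parse_dependencies(text):
    """Return {stage: [parent stages]} from a STAGE DEPENDENCIES section."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        root = re.match(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = []
            continue
        child = re.match(r"(Stage-\d+) depends on stages: (.+)", line)
        if child:
            # Assumption: multiple parents are comma separated.
            deps[child.group(1)] = [s.strip() for s in child.group(2).split(",")]
    return deps


def execution_order(deps):
    """Order stages so that every stage comes after the stages it depends on."""
    order, done = [], set()

    def visit(stage):
        if stage in done:
            return
        for parent in deps.get(stage, []):
            visit(parent)
        done.add(stage)
        order.append(stage)

    for stage in deps:
        visit(stage)
    return order


deps = parse_dependencies(EXPLAIN_DEPENDENCIES)
order = execution_order(deps)
map_reduce_stages = [s for s in order if STAGE_TASKS.get(s) == "MapRedTask"]
for stage in order:
    print(stage, STAGE_TASKS.get(stage, "unknown"))
```

For the example query only one of the three stages is an M/R job; the other two move the result into the target table and aggregate statistics.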
