
Internal Hive

16,519 views

Explain the structure of Apache Hive.

Published in: Technology
  • http://dbmanagement.info/Tutorials/Apache_Hive.htm
  • Excellent!
    I really want to know how to trace HQL and the MapReduce job it creates.
  • Excellent slides for hive beginners!

Internal Hive

  1. Inside Hive (for beginners)
      Takeshi NAKANO / Recruit Co. Ltd.
  2. Why?
      Hive is a good tool for non-specialists!
      The number of M/R jobs determines the Hive processing time.
      ↓
      How can we reduce the number?
      What can we do about this when writing HiveQL?
      ↓
      How does Hive convert HiveQL to M/R jobs?
      Which optimizations are applied along the way?
      7/6/2011, HIVE - A warehouse solution over Map Reduce Framework
  3. Don't you have..
      This Facebook paper has a lot of information!
      But it is a little old..
  4. Component Level Analysis
  5. Hive Architecture / Exec Flow
      (Diagram: Client, Hadoop, Metastore, Driver, Compiler)
  6. Hive Workflow
      Hive has operators, which are its minimum processing units.
      Each operator's work is done with an HDFS operation or with M/R jobs.
      The compiler converts HiveQL into sets of operators.
  7. Hive Workflow
      Operators
  8. Hive Workflow
      For M/R processing, Hive uses ExecMapper and ExecReducer.
      There are 2 processing modes:
      1. Local processing mode
      2. Distributed processing mode
  9. Hive Workflow
      In mode 1 (local mode), Hive forks the process with the hadoop command. The plan.xml is created on that one node, and a single node processes it.
      In mode 2 (distributed mode), Hive sends the process to the existing JobTracker. The information is stored in the DistributedCache and processed on multiple nodes.
  10. Compiler: How to Process HiveQL
  11. "Plumbing" of HIVE compiler
  12. "Plumbing" of HIVE compiler
  13. Compiler Overview
      Parser → Semantic Analyzer → Logical Plan Gen. → Logical Optimizer → Physical Plan Gen. → Physical Optimizer
  14. Compiler Overview
      HiveQL → Parser → AST → Semantic Analyzer → QB → Logical Plan Gen. → Operator Tree → Logical Optimizer → Operator Tree → Physical Plan Gen. → Task Tree → Physical Optimizer → Task Tree
  15. Parser
      HiveQL → AST
      HiveQL:
        INSERT OVERWRITE TABLE access_log_temp2
         SELECT a.user, a.prono, p.maker, p.price
         FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      AST:
        TOK_QUERY
          + TOK_FROM
            + TOK_JOIN
              + TOK_TABREF
                + TOK_TABNAME
                  + "access_log_hbase"
                + "a"
              + TOK_TABREF
                + TOK_TABNAME
                  + "product_hbase"
                + "p"
              + "="
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "prono"
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "prono"
          + TOK_INSERT
            + TOK_DESTINATION
              + TOK_TAB
                + TOK_TABNAME
                  + "access_log_temp2"
            + TOK_SELECT
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "user"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "prono"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "maker"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "price"
  16. Parser
      SQL → AST
      Same query and AST as on slide 15, with three subtrees marked for the following slides:
        1: TOK_FROM
        2: TOK_DESTINATION
        3: TOK_SELECT
  17. Semantic Analyzer (1/2)
      AST → QB
      AST (subtree 1):
        + TOK_FROM
          + TOK_JOIN
            + TOK_TABREF
              + TOK_TABNAME
                + "access_log_hbase"
              + "a"
            + TOK_TABREF
              + TOK_TABNAME
                + "product_hbase"
              + "p"
            + "="
              + "."
                + TOK_TABLE_OR_COL
                  + "a"
                + "prono"
              + "."
                + TOK_TABLE_OR_COL
                  + "p"
                + "prono"
      QB:
        MetaData
          Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
        ParseInfo
          Join Node
            + TOK_JOIN
              + TOK_TABREF
                …
              + TOK_TABREF
                …
              + "="
                …
  18. Semantic Analyzer (2/2)
      AST → QB
      AST (subtree 2):
        + TOK_DESTINATION
          + TOK_TAB
            + TOK_TABNAME
              + "access_log_temp2"
      QB:
        ParseInfo
          Name To Destination Node
            + TOK_TAB
              + TOK_TABNAME
                + "access_log_temp2"
  19. Semantic Analyzer (2/2)
      AST → QB
      AST (subtree 3):
        + TOK_SELECT
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "a"
              + "user"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "a"
              + "prono"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "p"
              + "maker"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "p"
              + "price"
      QB:
        ParseInfo
          Name To Select Node
            + TOK_SELECT
              + TOK_SELEXPR
                …
              + TOK_SELEXPR
                …
              + TOK_SELEXPR
                …
              + TOK_SELEXPR
                …
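
The parser and semantic analyzer steps on slides 15 to 19 can be imitated in a few lines of Python. This is only a rough sketch, not Hive's implementation: it reads the parenthesized AST text that EXPLAIN prints for the example query and collects the three pieces of information the slides place in the QB (alias to table, destination table, select expressions). The names `parse_sexpr`, `find_all`, and `build_qb`, and the dictionary layout of the result, are hypothetical and chosen for illustration.

```python
import re

# AST text as printed by EXPLAIN for the example query in these slides.
AST_TEXT = (
    "(TOK_QUERY (TOK_FROM (TOK_JOIN "
    "(TOK_TABREF (TOK_TABNAME access_log_hbase) a) "
    "(TOK_TABREF (TOK_TABNAME product_hbase) p) "
    "(= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) "
    "(TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) "
    "(TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) "
    "(TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) "
    "(TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) "
    "(TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))"
)


def parse_sexpr(text):
    """Parse '(HEAD child ...)' text into nested lists [HEAD, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # leaf token such as a table name or alias
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing ")"
        return node

    return read()


def find_all(node, head):
    """Yield every subtree whose first element equals head (pre-order)."""
    if isinstance(node, list):
        if node and node[0] == head:
            yield node
        for child in node[1:]:
            yield from find_all(child, head)


def build_qb(ast):
    """Collect a QB-like summary (hypothetical layout) from the AST."""
    alias_to_table = {}
    for tabref in find_all(ast, "TOK_TABREF"):
        table = tabref[1][1]                      # (TOK_TABNAME <table>)
        alias = tabref[2] if len(tabref) > 2 else table
        alias_to_table[alias] = table
    destination = next(find_all(ast, "TOK_TAB"))[1][1]
    select = [
        f"{expr[1][1][1]}.{expr[1][2]}"           # (. (TOK_TABLE_OR_COL a) user)
        for expr in find_all(ast, "TOK_SELEXPR")
    ]
    return {
        "alias_to_table": alias_to_table,
        "destination": destination,
        "select": select,
    }


qb = build_qb(parse_sexpr(AST_TEXT))
```

Here `qb["alias_to_table"]` plays the role of "Alias To Table Info" on slide 17, and the other two entries correspond to the destination and select nodes on slides 18 and 19.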
  20. Logical Plan Generator (1/4)
      QB → OP Tree
      QB:
        MetaData
          Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
      OP Tree:
        TableScanOperator("access_log_hbase")
        TableScanOperator("product_hbase")
  21. Logical Plan Generator (2/4)
      QB → OP Tree
      QB:
        ParseInfo
          + TOK_JOIN (the join subtree shown on slide 17)
      OP Tree:
        ReduceSinkOperator("access_log_hbase")
        ReduceSinkOperator("product_hbase")
        JoinOperator
  22. Logical Plan Generator (3/4)
      QB → OP Tree
      QB:
        ParseInfo
          Name To Select Node
            + TOK_SELECT (the select subtree shown on slide 19)
      OP Tree:
        SelectOperator
  23. Logical Plan Generator (4/4)
      QB → OP Tree
      QB:
        MetaData
          Name To Destination Table Info
            "insclause-0" = Table Info("access_log_temp2")
      OP Tree:
        FileSinkOperator
  24. Logical Plan Generator (result)
      LCF
      OP Tree:
        TableScanOperator TS_1
        TableScanOperator TS_0
        ReduceSinkOperator RS_2
        ReduceSinkOperator RS_3
        JoinOperator JOIN_4
        SelectOperator SEL_5
        FileSinkOperator FS_6
  25. Logical Optimizer
  26. Logical Optimizer (Predicate Push Down)
      Original query:
        INSERT OVERWRITE TABLE access_log_temp2
         SELECT a.user, a.prono, p.maker, p.price
         FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      Query with a WHERE clause:
        INSERT OVERWRITE TABLE access_log_temp2
         SELECT a.user, a.prono, p.maker, p.price
         FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
         WHERE p.maker = 'honda';
  27. Logical Optimizer (Predicate Push Down)
      (Queries as on slide 26.)
      OP Tree:
        TableScanOperator TS_1
        TableScanOperator TS_0
        ReduceSinkOperator RS_3
        ReduceSinkOperator RS_2
        JoinOperator JOIN_4
        SelectOperator SEL_6
        FileSinkOperator FS_7
  28. Logical Optimizer (Predicate Push Down)
      (Queries as on slide 26.)
      OP Tree:
        TableScanOperator TS_1
        TableScanOperator TS_0
        ReduceSinkOperator RS_3
        ReduceSinkOperator RS_2
        JoinOperator JOIN_4
        FilterOperator FIL_5 (_col8 = 'honda')
        SelectOperator SEL_6
        FileSinkOperator FS_7
  29. Logical Optimizer (Predicate Push Down)
      (Queries as on slide 26.)
      OP Tree:
        TableScanOperator TS_1
        TableScanOperator TS_0
        FilterOperator FIL_8 (maker = 'honda')
        ReduceSinkOperator RS_2
        ReduceSinkOperator RS_3
        JoinOperator JOIN_4
        FilterOperator FIL_5 (_col8 = 'honda')
        SelectOperator SEL_6
        FileSinkOperator FS_7
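
The effect of predicate push down on slides 27 to 29 can be sketched with a toy operator tree. This is a minimal illustration under stated assumptions, not Hive's optimizer: the `Op` class and the helpers `chain`, `push_down`, and `path` are hypothetical, operators are identified by table alias instead of Hive's operator ids, and the sketch assumes the predicate refers to a single alias. The original filter after the join is kept, because slide 29 shows both FIL_8 (pushed down) and FIL_5 (after the join).

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """A toy operator node: a name, a free-text detail, and child operators."""
    name: str
    detail: str = ""
    children: list = field(default_factory=list)


def chain(*ops):
    """Link ops parent -> child in the given order and return the first op."""
    for parent, child in zip(ops, ops[1:]):
        parent.children.append(child)
    return ops[0]


def push_down(scans, alias, predicate):
    """Insert a Filter directly below the TableScan of the given alias."""
    scan = scans[alias]
    pushed = Op("Filter", predicate, children=scan.children)
    scan.children = [pushed]
    return pushed


def path(op):
    """Operator names from op down to the leaf, following first children."""
    names = []
    while op is not None:
        names.append(op.name)
        op = op.children[0] if op.children else None
    return names


# Plan for the query with WHERE p.maker = 'honda', before the optimization.
join = Op("Join")
scans = {
    "a": chain(Op("TableScan", "a"), Op("ReduceSink", "a"), join),
    "p": chain(Op("TableScan", "p"), Op("ReduceSink", "p"), join),
}
chain(join, Op("Filter", "_col8 = 'honda'"), Op("Select"), Op("FileSink"))

before = path(scans["p"])

# The predicate only touches alias p, so a copy can run before the shuffle.
push_down(scans, "p", "maker = 'honda'")

after = path(scans["p"])
```

After the rewrite, rows of `p` that do not match are dropped on the map side, so fewer rows reach the ReduceSink and the join.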
  30. Physical Plan Generator
      OP Tree → Task Tree
      Task Tree:
        MoveTask (Stage-0)
          LoadTableDesc
        MapRedTask (Stage-1/root)
          OP Tree:
            TableScanOperator (TS_0)
            TableScanOperator (TS_1)
            ReduceSinkOperator (RS_2)
            ReduceSinkOperator (RS_3)
            JoinOperator (JOIN_4)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
        StatsTask (Stage-2)
  31. Physical Plan Generator (result)
      OP Tree → Task Tree
      LCF
      MapRedTask (Stage-1/root)
        Mapper:
          TableScanOperator TS_1
          TableScanOperator TS_0
          ReduceSinkOperator RS_2
          ReduceSinkOperator RS_3
        Reducer:
          JoinOperator JOIN_4
          SelectOperator SEL_5
          FileSinkOperator FS_6
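
Slide 31 shows the operator tree divided into a Mapper part and a Reducer part, with the ReduceSinkOperators as the last operators on the map side. The sketch below reproduces that division for the example plan. It is an illustration only: the function name `split_at_reduce_sink` is hypothetical, and the edges TS_0 → RS_2 and TS_1 → RS_3 are an assumption, since the transcript does not preserve which scan feeds which ReduceSink.

```python
from collections import deque

# Operator tree from slide 24 as parent -> children edges.
# Assumption: TS_0 feeds RS_2 and TS_1 feeds RS_3.
EDGES = {
    "TS_0": ["RS_2"],
    "TS_1": ["RS_3"],
    "RS_2": ["JOIN_4"],
    "RS_3": ["JOIN_4"],
    "JOIN_4": ["SEL_5"],
    "SEL_5": ["FS_6"],
    "FS_6": [],
}
ROOTS = ["TS_0", "TS_1"]


def split_at_reduce_sink(roots, edges):
    """Walk the tree breadth-first; operators up to and including a
    ReduceSink (RS_*) go to the mapper, everything below it to the reducer."""
    mapper, reducer = [], []
    seen = set()
    queue = deque((root, False) for root in roots)
    while queue:
        op, after_shuffle = queue.popleft()
        if op in seen:
            continue  # JOIN_4 is reachable from both ReduceSinks
        seen.add(op)
        (reducer if after_shuffle else mapper).append(op)
        crossed = after_shuffle or op.startswith("RS_")
        for child in edges[op]:
            queue.append((child, crossed))
    return mapper, reducer


mapper_ops, reducer_ops = split_at_reduce_sink(ROOTS, EDGES)
```

With these inputs, `mapper_ops` holds the two scans and the two ReduceSinks, and `reducer_ops` holds the join, select, and file sink, which matches the Mapper/Reducer grouping on slide 31.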
  32. Physical Optimizer
      Task Tree → Task Tree
      See the classes under java/org/apache/hadoop/hive/ql/optimizer/physical/
  33. Physical Optimizer (MapJoinResolver)
      Task Tree → Task Tree
      MapRedTask (Stage-1)
        Mapper:
          TableScanOperator TS_1
          TableScanOperator TS_0
          MapJoinOperator MAPJOIN_7
          SelectOperator SEL_8
          SelectOperator SEL_5
          FileSinkOperator FS_6
  34. Physical Optimizer (MapJoinResolver)
      Task Tree → Task Tree
      Before:
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator TS_1
            TableScanOperator TS_0
            MapJoinOperator MAPJOIN_7
            SelectOperator SEL_8
            SelectOperator SEL_5
            FileSinkOperator FS_6
      After:
        MapredLocalTask (Stage-7)
          TableScanOperator TS_0
          HashTableSinkOperator HASHTABLESINK_11
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator TS_1
            MapJoinOperator MAPJOIN_7
            SelectOperator SEL_8
            SelectOperator SEL_5
            FileSinkOperator FS_6
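
The two tasks on slide 34 correspond to the two phases of a map-side hash join: a local task scans one table into a hash table, and the mapper then streams the other table and probes that hash table, so no reducer is needed for the join. The Python sketch below shows the idea only. The function names and the sample rows are made up for illustration, and which of the two tables ends up on the hash-table side is an assumption.

```python
from collections import defaultdict


def build_hash_table(small_rows, key):
    """Local-task phase: index the small table by the join key."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def map_join(big_rows, hash_table, key):
    """Mapper phase: stream the big table and probe the hash table."""
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**match, **row}


# Made-up sample rows; column names follow the example query.
products = [
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 2, "maker": "toyota", "price": 200},
]
access_log = [
    {"user": "u1", "prono": 1},
    {"user": "u2", "prono": 3},
    {"user": "u3", "prono": 1},
]

hash_table = build_hash_table(products, "prono")
joined = list(map_join(access_log, hash_table, "prono"))
```

Rows of the streamed table with no match in the hash table (here the row with `prono` 3) produce no output, as expected for an inner join.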
  35. In the end
  36. In the end
      HiveQL → Parser → AST → Semantic Analyzer → QB → Logical Plan Gen. → Operator Tree → Logical Optimizer → Operator Tree → Physical Plan Gen. → Task Tree → Physical Optimizer → Task Tree
  37. End
  38. Appendix: What does Explain show?
  39. Appendix: What does Explain show?
      hive> explain INSERT OVERWRITE TABLE access_log_temp2
          >  SELECT a.user, a.prono, p.maker, p.price
          >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      OK
      ABSTRACT SYNTAX TREE:
        (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
        Stage-2 depends on stages: Stage-0
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              a
                TableScan
                  alias: a
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 0
                    value expressions:
                      expr: user
                      type: string
                      expr: prono
                      type: int
              p
                TableScan
                  alias: p
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 1
                    value expressions:
                      expr: maker
                      type: string
                      expr: price
                      type: int
            Reduce Operator Tree:
              Join Operator
                condition map:
                  Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col0} {VALUE._col2}
                  1 {VALUE._col1} {VALUE._col2}
                handleSkewJoin: false
                outputColumnNames: _col0, _col2, _col6, _col7
                Select Operator
                  expressions:
                    expr: _col0
                    type: string
                    expr: _col2
                    type: int
                    expr: _col6
                    type: string
                    expr: _col7
                    type: int
                  outputColumnNames: _col0, _col1, _col2, _col3
                  File Output Operator
                    compressed: false
                    GlobalTableId: 1
                    table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.access_log_temp2
        Stage: Stage-0
          Move Operator
            tables:
              replace: true
              table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                name: default.access_log_temp2
        Stage: Stage-2
          Stats-Aggr Operator
      Time taken: 0.1 seconds
      hive>
  40. Appendix: What does Explain show?
      (Same EXPLAIN output as on slide 39, overlaid with its condensed outline.)
      ABSTRACT SYNTAX TREE:
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
        Stage-2 depends on stages: Stage-0
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Map Operator Tree:
              TableScan
                Reduce Output Operator
              TableScan
                Reduce Output Operator
            Reduce Operator Tree:
              Join Operator
                Select Operator
                  File Output Operator
        Stage: Stage-0
          Move Operator
        Stage: Stage-2
          Stats-Aggr Operator
  41. Appendix: What does Explain show?
      EXPLAIN outline:
        ABSTRACT SYNTAX TREE:
        STAGE DEPENDENCIES:
          Stage-1 is a root stage
          Stage-0 depends on stages: Stage-1
          Stage-2 depends on stages: Stage-0
        STAGE PLANS:
          Stage: Stage-1
            Map Reduce
              Map Operator Tree:
                TableScan
                  Reduce Output Operator
                TableScan
                  Reduce Output Operator
              Reduce Operator Tree:
                Join Operator
                  Select Operator
                    File Output Operator
          Stage: Stage-0
            Move Operator
          Stage: Stage-2
            Stats-Aggr Operator
      ≒
      Task Tree:
        MapRedTask (Stage-1/root)
          Mapper:
            TableScanOperator TS_1
            TableScanOperator TS_0
            ReduceSinkOperator RS_2
            ReduceSinkOperator RS_3
          Reducer:
            JoinOperator JOIN_4
            SelectOperator SEL_5
            FileSinkOperator FS_6
        MoveTask (Stage-0)
        StatsTask (Stage-2)
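
The STAGE DEPENDENCIES section of the EXPLAIN output is enough to recover the order in which the tasks run. The sketch below parses the three lines shown in the appendix and sorts the stages topologically. It is a small illustration, not part of Hive: the function names are hypothetical, the parser only handles the two line shapes that appear in these slides, and `graphlib` requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Text copied from the EXPLAIN output in the appendix slides.
STAGE_DEPENDENCIES = """\
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Stage-2 depends on stages: Stage-0
"""


def parse_stage_dependencies(text):
    """Return {stage: set of stages it depends on}."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = set()
            continue
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if dep:
            deps[dep.group(1)] = {s.strip() for s in dep.group(2).split(",")}
    return deps


def execution_order(deps):
    """Order stages so that every stage comes after the stages it needs."""
    return list(TopologicalSorter(deps).static_order())


deps = parse_stage_dependencies(STAGE_DEPENDENCIES)
order = execution_order(deps)
```

For this query the order is Stage-1, Stage-0, Stage-2, which on slide 41 corresponds to MapRedTask, then MoveTask, then StatsTask.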
