Internal Hive

Explain the structure of Apache Hive.

Published in: Technology
  1. Inside Hive (for beginners)
     Takeshi NAKANO / Recruit Co. Ltd.
  2. Why?
     Hive is a good tool for non-specialists!
     The number of M/R jobs determines Hive's processing time.
     ↓
     How can we reduce that number?
     What can we do about it when writing HiveQL?
     ↓
     How does Hive convert HiveQL into M/R jobs?
     Which optimizations are applied along the way?
     7/6/2011
     HIVE - A warehouse solution over Map Reduce Framework
  3. Don't you have..
     This Facebook paper has a lot of information!
     But it is a little old..
  4. Component Level Analysis
  5. Hive Architecture / Exec Flow
     Components: Client, Hadoop, Metastore, Driver, Compiler
  6. Hive Workflow
     Hive has operators, which are its minimum processing units.
     Each operator's work is carried out as an HDFS operation or as M/R jobs.
     The compiler converts HiveQL into sets of operators.
  7. Hive Workflow
     Operators
  8. Hive Workflow
     For M/R processing, Hive uses ExecMapper and ExecReducer.
     There are 2 processing modes:
       1. Local processing mode
       2. Distributed processing mode
  9. Hive Workflow
     In mode 1 (local mode), Hive forks a process with the hadoop command. The plan.xml is created only locally, and a single node processes it.
     In mode 2 (distributed mode), Hive sends the job to an existing JobTracker. The information is stored in the DistributedCache and processed on multiple nodes.
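The idea of operators as minimum processing units, chained so that each one hands its rows to the next, can be sketched roughly as follows. This is a toy Python model, not Hive's Java implementation; the class names `Operator`, `Filter`, `Select`, and `Collect`, and the helper `run_pipeline`, are hypothetical names chosen for illustration.

```python
class Operator:
    """Toy processing unit: handles one row, then forwards it to its children."""

    def __init__(self):
        self.children = []

    def then(self, child):
        # Attach a downstream operator and return it so calls can be chained.
        self.children.append(child)
        return child

    def forward(self, row):
        for child in self.children:
            child.process(row)

    def process(self, row):
        # The base operator just passes rows through (stands in for a table scan).
        self.forward(row)


class Filter(Operator):
    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def process(self, row):
        if self.predicate(row):
            self.forward(row)


class Select(Operator):
    def __init__(self, columns):
        super().__init__()
        self.columns = columns

    def process(self, row):
        self.forward({c: row[c] for c in self.columns})


class Collect(Operator):
    """Terminal operator that stores rows (stands in for a file sink)."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def process(self, row):
        self.rows.append(row)


def run_pipeline(rows):
    scan = Operator()
    sink = (scan.then(Filter(lambda r: r["maker"] == "honda"))
                .then(Select(["prono", "price"]))
                .then(Collect()))
    for row in rows:
        scan.process(row)
    return sink.rows
```

Each operator only knows its children, so a query plan is simply the shape of the chain.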
  10. Compiler : How to Process HiveQL
  11. "Plumbing" of HIVE compiler
  12. "Plumbing" of HIVE compiler
  13. Compiler Overview
      Parser → Semantic Analyzer → Logical Plan Gen. → Logical Optimizer → Physical Plan Gen. → Physical Optimizer
  14. Compiler Overview
      HiveQL → Parser → AST → Semantic Analyzer → QB → Logical Plan Gen. → Operator Tree → Logical Optimizer → Operator Tree → Physical Plan Gen. → Task Tree → Physical Optimizer → Task Tree
  15. Parser
      HiveQL → AST
      HiveQL:
        INSERT OVERWRITE TABLE access_log_temp2
         SELECT a.user, a.prono, p.maker, p.price
         FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      AST:
        TOK_QUERY
         + TOK_FROM
            + TOK_JOIN
               + TOK_TABREF
                  + TOK_TABNAME
                     + "access_log_hbase"
                  + "a"
               + TOK_TABREF
                  + TOK_TABNAME
                     + "product_hbase"
                  + "p"
               + "="
                  + "."
                     + TOK_TABLE_OR_COL
                        + "a"
                     + "prono"
                  + "."
                     + TOK_TABLE_OR_COL
                        + "p"
                     + "prono"
         + TOK_INSERT
            + TOK_DESTINATION
               + TOK_TAB
                  + TOK_TABNAME
                     + "access_log_temp2"
            + TOK_SELECT
               + TOK_SELEXPR
                  + "."
                     + TOK_TABLE_OR_COL
                        + "a"
                     + "user"
               + TOK_SELEXPR
                  + "."
                     + TOK_TABLE_OR_COL
                        + "a"
                     + "prono"
               + TOK_SELEXPR
                  + "."
                     + TOK_TABLE_OR_COL
                        + "p"
                     + "maker"
               + TOK_SELEXPR
                  + "."
                     + TOK_TABLE_OR_COL
                        + "p"
                     + "price"
  16. Parser
      SQL → AST
      The same query and AST as on the previous slide, with the AST marked in three parts:
        1: the TOK_FROM subtree
        2: the TOK_DESTINATION subtree
        3: the TOK_SELECT subtree
  17. Semantic Analyzer (1/2)
      AST → QB
      AST part 1: the TOK_FROM subtree (TOK_JOIN with its two TOK_TABREF children and the "=" join condition)
      QB:
        MetaData
          AliasToTableInfo
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
        ParseInfo
          Join Node
            + TOK_JOIN
               + TOK_TABREF
                  …
               + TOK_TABREF
                  …
               + "="
                  …
  18. Semantic Analyzer (2/2)
      AST → QB
      AST part 2:
        + TOK_DESTINATION
           + TOK_TAB
              + TOK_TABNAME
                 + "access_log_temp2"
      QB:
        ParseInfo
          NameToDestinationNode
            + TOK_TAB
               + TOK_TABNAME
                  + "access_log_temp2"
  19. Semantic Analyzer (2/2)
      AST → QB
      AST part 3: the TOK_SELECT subtree (four TOK_SELEXPR children: a.user, a.prono, p.maker, p.price)
      QB:
        ParseInfo
          NameToSelectNode
            + TOK_SELECT
               + TOK_SELEXPR
                  …
               + TOK_SELEXPR
                  …
               + TOK_SELEXPR
                  …
               + TOK_SELEXPR
                  …
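The AST-to-QB step described on the Semantic Analyzer slides can be sketched as a small tree walk. This is a minimal Python illustration, not Hive's actual `SemanticAnalyzer`; the tuple encoding of the AST, the helper `find`, the function `build_qb`, and the dictionary keys are all assumptions made for the example.

```python
# AST nodes are (label, child, child, ...) tuples; leaves are plain strings.
AST = ("TOK_QUERY",
       ("TOK_FROM",
        ("TOK_JOIN",
         ("TOK_TABREF", ("TOK_TABNAME", "access_log_hbase"), "a"),
         ("TOK_TABREF", ("TOK_TABNAME", "product_hbase"), "p"),
         ("=",
          (".", ("TOK_TABLE_OR_COL", "a"), "prono"),
          (".", ("TOK_TABLE_OR_COL", "p"), "prono")))),
       ("TOK_INSERT",
        ("TOK_DESTINATION", ("TOK_TAB", ("TOK_TABNAME", "access_log_temp2"))),
        ("TOK_SELECT",
         ("TOK_SELEXPR", (".", ("TOK_TABLE_OR_COL", "a"), "user")),
         ("TOK_SELEXPR", (".", ("TOK_TABLE_OR_COL", "a"), "prono")),
         ("TOK_SELEXPR", (".", ("TOK_TABLE_OR_COL", "p"), "maker")),
         ("TOK_SELEXPR", (".", ("TOK_TABLE_OR_COL", "p"), "price")))))


def find(node, label):
    """Yield every subtree whose label matches, in depth-first order."""
    if isinstance(node, tuple):
        if node[0] == label:
            yield node
        for child in node[1:]:
            yield from find(child, label)


def build_qb(ast):
    """Collect table aliases (MetaData) and key subtrees (ParseInfo) from the AST."""
    alias_to_table = {}
    for tabref in find(ast, "TOK_TABREF"):
        tabname, alias = tabref[1], tabref[2]
        alias_to_table[alias] = tabname[1]
    destination = next(find(ast, "TOK_TAB"))
    return {
        "meta_data": {"alias_to_table": alias_to_table},
        "parse_info": {
            "join": next(find(ast, "TOK_JOIN")),
            "destination": destination[1][1],
            # Each TOK_SELEXPR wraps (".", (TOK_TABLE_OR_COL, alias), column).
            "select": [f"{e[1][1][1]}.{e[1][2]}" for e in find(ast, "TOK_SELEXPR")],
        },
    }
```

Calling `build_qb(AST)` gathers the same three pieces the slides label 1, 2, and 3: the join node, the destination table, and the select expressions.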
  20. Logical Plan Generator (1/4)
      QB → OP Tree
      QB:
        MetaData
          AliasToTableInfo
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
      OP Tree:
        TableScanOperator("access_log_hbase")
        TableScanOperator("product_hbase")
  21. Logical Plan Generator (2/4)
      QB → OP Tree
      QB:
        ParseInfo
          the TOK_JOIN subtree (two TOK_TABREF children and the "=" condition on a.prono and p.prono)
      OP Tree:
        ReduceSinkOperator("access_log_hbase")
        ReduceSinkOperator("product_hbase")
        JoinOperator
  22. Logical Plan Generator (3/4)
      QB → OP Tree
      QB:
        ParseInfo
          NameToSelectNode
            the TOK_SELECT subtree (a.user, a.prono, p.maker, p.price)
      OP Tree:
        SelectOperator
  23. Logical Plan Generator (4/4)
      QB → OP Tree
      QB:
        MetaData
          NameToDestinationTableInfo
            "insclause-0" = Table Info("access_log_temp2")
      OP Tree:
        FileSinkOperator
  24. Logical Plan Generator (result)
      LCF
      OP Tree:
        TableScanOperator (TS_0)     TableScanOperator (TS_1)
        ReduceSinkOperator (RS_2)    ReduceSinkOperator (RS_3)
        JoinOperator (JOIN_4)
        SelectOperator (SEL_5)
        FileSinkOperator (FS_6)
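The four generation steps and the resulting operator numbering can be mimicked with a short sketch. This is a simplified Python model under stated assumptions: the function `generate_logical_plan` and the `alias_to_table` key are hypothetical, and the pairing of TS_0 with RS_2 and TS_1 with RS_3 is an assumption, since the flattened slide does not show the edges.

```python
from itertools import count


def generate_logical_plan(qb):
    """Return the operator tree as a list of (parent, child) edges."""
    ids = count()

    def new_op(prefix):
        # Operators are numbered in creation order, e.g. TS_0, TS_1, RS_2, ...
        return f"{prefix}_{next(ids)}"

    # (1/4) one table scan per aliased table
    scans = [new_op("TS") for _ in qb["alias_to_table"]]
    # (2/4) one reduce sink per join input, then the join itself
    sinks = [new_op("RS") for _ in scans]
    join = new_op("JOIN")
    edges = []
    for scan, sink in zip(scans, sinks):
        edges.append((scan, sink))
        edges.append((sink, join))
    # (3/4) the select list, (4/4) the destination table
    select = new_op("SEL")
    file_sink = new_op("FS")
    edges.append((join, select))
    edges.append((select, file_sink))
    return edges
```

For the two-table query in the slides this yields the same operator names as the result slide: TS_0, TS_1, RS_2, RS_3, JOIN_4, SEL_5, FS_6.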
  25. Logical Optimizer
  26. Logical Optimizer (Predicate Push Down)
      Original query:
        INSERT OVERWRITE TABLE access_log_temp2
         SELECT a.user, a.prono, p.maker, p.price
         FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      Query with a WHERE clause:
        INSERT OVERWRITE TABLE access_log_temp2
         SELECT a.user, a.prono, p.maker, p.price
         FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
         WHERE p.maker = 'honda';
  27. Logical Optimizer (Predicate Push Down)
      Operator tree for the query with the WHERE clause:
        TableScanOperator (TS_0)     TableScanOperator (TS_1)
        ReduceSinkOperator (RS_2)    ReduceSinkOperator (RS_3)
        JoinOperator (JOIN_4)
        SelectOperator (SEL_6)
        FileSinkOperator (FS_7)
  28. Logical Optimizer (Predicate Push Down)
      The WHERE clause appears as a filter after the join:
        TableScanOperator (TS_0)     TableScanOperator (TS_1)
        ReduceSinkOperator (RS_2)    ReduceSinkOperator (RS_3)
        JoinOperator (JOIN_4)
        FilterOperator (FIL_5): (_col8 = 'honda')
        SelectOperator (SEL_6)
        FileSinkOperator (FS_7)
  29. Logical Optimizer (Predicate Push Down)
      After predicate push down, a second filter sits between the table scan and the reduce sink:
        TableScanOperator (TS_0)     TableScanOperator (TS_1)
        FilterOperator (FIL_8): (maker = 'honda'), applied to the scan of p before its ReduceSinkOperator
        ReduceSinkOperator (RS_2)    ReduceSinkOperator (RS_3)
        JoinOperator (JOIN_4)
        FilterOperator (FIL_5): (_col8 = 'honda')
        SelectOperator (SEL_6)
        FileSinkOperator (FS_7)
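The effect of predicate push down on the operator tree can be sketched in a few lines. This is a toy Python model, not Hive's optimizer: the function `push_down_predicates` and the tuple encoding of operators are hypothetical, and the sketch assumes an inner join and filters that reference a single table alias. As on the slide, the original post-join filter is kept and a copy is added next to the table scan.

```python
def push_down_predicates(scan_branches, post_join_ops):
    """Copy single-alias filters from after the join to just after the matching scan.

    scan_branches: alias -> list of operators that run before the join.
    post_join_ops: operators after the join; filters look like
                   ("FIL", alias, column, value).
    Returns new structures; the inputs are left untouched.
    """
    pushed = {alias: list(ops) for alias, ops in scan_branches.items()}
    for op in post_join_ops:
        if op[0] == "FIL":
            _, alias, column, value = op
            # Index 0 is the table scan, so the filter goes right behind it,
            # which means fewer rows reach the reduce sink and the shuffle.
            pushed[alias].insert(1, ("FIL", alias, column, value))
    return pushed, list(post_join_ops)
```

Filtering before the reduce sink is what makes this worthwhile: rows that would be discarded after the join never have to be shuffled to the reducers.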
  30. Physical Plan Generator
      OP Tree → Task Tree
      Task Tree:
        MapRedTask (Stage-1/root)
          OP Tree:
            TableScanOperator (TS_0)
            TableScanOperator (TS_1)
            ReduceSinkOperator (RS_2)
            ReduceSinkOperator (RS_3)
            JoinOperator (JOIN_4)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
        MoveTask (Stage-0)
          LoadTableDesc
        StatsTask (Stage-2)
  31. Physical Plan Generator (result)
      LCF
      OP Tree → Task Tree
      MapRedTask (Stage-1/root)
        Mapper:
          TableScanOperator (TS_0)     TableScanOperator (TS_1)
          ReduceSinkOperator (RS_2)    ReduceSinkOperator (RS_3)
        Reducer:
          JoinOperator (JOIN_4)
          SelectOperator (SEL_5)
          FileSinkOperator (FS_6)
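The mapper/reducer split shown in the result slide follows the ReduceSink boundary, which can be sketched as a tree walk. This is an illustrative Python model, not Hive's task compiler; the function `split_at_reduce_sink` and the edge-list input are assumptions, and the rule used here (everything up to and including an `RS_*` operator runs in the mapper) is a simplification for a single M/R job.

```python
def split_at_reduce_sink(edges):
    """Partition operators of one M/R job into mapper-side and reducer-side lists."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    all_children = {child for _, child in edges}
    # Roots are operators that nobody feeds into, i.e. the table scans.
    roots = [op for op in children if op not in all_children]

    mapper, reducer = [], []

    def walk(op, in_reducer):
        target = reducer if in_reducer else mapper
        if op not in target:
            target.append(op)
        for child in children.get(op, []):
            # Anything downstream of a reduce sink runs on the reducer side.
            walk(child, in_reducer or op.startswith("RS_"))

    for root in roots:
        walk(root, False)
    return mapper, reducer
```

Applied to the operator tree of the example query, the table scans and reduce sinks land in the mapper list and the join, select, and file sink in the reducer list.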
  32. Physical Optimizer
      Task Tree → Task Tree
      Implemented under java/org/apache/hadoop/hive/ql/optimizer/physical/
  33. Physical Optimizer (MapJoinResolver)
      Task Tree → Task Tree
      MapRedTask (Stage-1)
        Mapper:
          TableScanOperator (TS_0)     TableScanOperator (TS_1)
          MapJoinOperator (MAPJOIN_7)
          SelectOperator (SEL_8)
          SelectOperator (SEL_5)
          FileSinkOperator (FS_6)
  34. Physical Optimizer (MapJoinResolver)
      Task Tree → Task Tree
      Before:
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator (TS_0)     TableScanOperator (TS_1)
            MapJoinOperator (MAPJOIN_7)
            SelectOperator (SEL_8)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
      After:
        MapredLocalTask (Stage-7)
          TableScanOperator (TS_0)
          HashTableSinkOperator (HASHTABLESINK_11)
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator (TS_1)
            MapJoinOperator (MAPJOIN_7)
            SelectOperator (SEL_8)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
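The two-task shape produced by the MapJoinResolver corresponds to a hash join done entirely on the map side: a local task builds a hash table from one input, and the mapper streams the other input against it. The following is a minimal Python sketch of that idea, not Hive's `HashTableSinkOperator` or `MapJoinOperator`; the function names and the sample column names are assumptions for illustration.

```python
from collections import defaultdict


def build_hash_table(small_rows, key):
    """Local-task step: index the smaller input by its join key."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def map_join(big_rows, hash_table, key):
    """Mapper step: stream the larger input and probe the hash table.

    No shuffle or reducer is needed because every mapper holds the full table.
    """
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**match, **row}
```

A usage example would be `list(map_join(logs, build_hash_table(products, "prono"), "prono"))`, where `logs` and `products` are lists of row dictionaries sharing a `prono` column. The approach only works when the hashed input fits in memory.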
  35. In the end
      Components: Client, Hadoop, Metastore, Driver, Compiler
  36. In the end
      HiveQL → Parser → AST → Semantic Analyzer → QB → Logical Plan Gen. → Operator Tree → Logical Optimizer → Operator Tree → Physical Plan Gen. → Task Tree → Physical Optimizer → Task Tree
  37. End
  38. Appendix: What does Explain show?
  39. Appendix: What does Explain show?
      hive> explain INSERT OVERWRITE TABLE access_log_temp2
          >  SELECT a.user, a.prono, p.maker, p.price
          >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      OK
      ABSTRACT SYNTAX TREE:
        (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
        Stage-2 depends on stages: Stage-0
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              a
                TableScan
                  alias: a
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 0
                    value expressions:
                      expr: user
                      type: string
                      expr: prono
                      type: int
              p
                TableScan
                  alias: p
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 1
                    value expressions:
                      expr: maker
                      type: string
                      expr: price
                      type: int
            Reduce Operator Tree:
              Join Operator
                condition map:
                  Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col0} {VALUE._col2}
                  1 {VALUE._col1} {VALUE._col2}
                handleSkewJoin: false
                outputColumnNames: _col0, _col2, _col6, _col7
                Select Operator
                  expressions:
                    expr: _col0
                    type: string
                    expr: _col2
                    type: int
                    expr: _col6
                    type: string
                    expr: _col7
                    type: int
                  outputColumnNames: _col0, _col1, _col2, _col3
                  File Output Operator
                    compressed: false
                    GlobalTableId: 1
                    table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.access_log_temp2
        Stage: Stage-0
          Move Operator
            tables:
              replace: true
              table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                name: default.access_log_temp2
        Stage: Stage-2
          Stats-Aggr Operator
      Time taken: 0.1 seconds
      hive>
  40. Appendix: What does Explain show?
      The EXPLAIN output for the same INSERT OVERWRITE ... JOIN query, shown again with its condensed outline overlaid:
      ABSTRACT SYNTAX TREE:
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
        Stage-2 depends on stages: Stage-0
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Map Operator Tree:
              TableScan
                Reduce Output Operator
              TableScan
                Reduce Output Operator
            Reduce Operator Tree:
              Join Operator
                Select Operator
                  File Output Operator
        Stage: Stage-0
          Move Operator
        Stage: Stage-2
          Stats-Aggr Operator
  41. Appendix: What does Explain show?
      Condensed EXPLAIN output:
        ABSTRACT SYNTAX TREE:
        STAGE DEPENDENCIES:
          Stage-1 is a root stage
          Stage-0 depends on stages: Stage-1
          Stage-2 depends on stages: Stage-0
        STAGE PLANS:
          Stage: Stage-1
            Map Reduce
              Map Operator Tree:
                TableScan
                  Reduce Output Operator
                TableScan
                  Reduce Output Operator
              Reduce Operator Tree:
                Join Operator
                  Select Operator
                    File Output Operator
          Stage: Stage-0
            Move Operator
          Stage: Stage-2
            Stats-Aggr Operator
      ≒
      Task tree:
        MapRedTask (Stage-1/root)
          Mapper:
            TableScanOperator (TS_0)     TableScanOperator (TS_1)
            ReduceSinkOperator (RS_2)    ReduceSinkOperator (RS_3)
          Reducer:
            JoinOperator (JOIN_4)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
        MoveTask (Stage-0)
        Stats Task (Stage-2)
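Since the STAGE DEPENDENCIES section mirrors the task tree, it can be read mechanically to recover the order in which stages run. The sketch below is a small Python helper written for this note, not part of Hive; the function names are hypothetical, and it assumes only the two line shapes that appear in the output above, with multiple parent stages separated by commas.

```python
import re


def parse_stage_dependencies(lines):
    """Map each stage name to the list of stages it depends on."""
    deps = {}
    for line in lines:
        line = line.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if root:
            deps[root.group(1)] = []
        elif dep:
            deps[dep.group(1)] = [s.strip() for s in dep.group(2).split(",")]
    return deps


def execution_order(deps):
    """Return the stages in an order where every stage follows its parents."""
    order, done = [], set()

    def visit(stage):
        if stage in done:
            return
        for parent in deps.get(stage, []):
            visit(parent)
        done.add(stage)
        order.append(stage)

    for stage in deps:
        visit(stage)
    return order
```

For the example query, this gives the MapRedTask stage first, then the MoveTask stage, then the stats stage.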
