SlideShare a Scribd company logo
1 of 41
Inside Hive (for beginners) 1 Takeshi NAKANO / Recruit Co. Ltd.
Why? Hive is good tool for non-specialist! The number of M/R controls the Hive processing time. ↓ How can we reduce the number? What can we do for this on writing HiveQL? ↓ How does Hive convert HiveQLto M/R jobs? On this, what optimizing processes are adopted? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 2
Don’t you have.. This fb’s paper has a lot of information! But this is a little old.. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 3
Component Level Analysis 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 4
Hive Architecture / Exec Flow 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 5 Client Hadoop Metastore Driver Compiler
Client Hadoop Driver Compiler Hive Workflow Hive has the operators which are minimum processing units. The process of each operator is done with HDFS operation or M/R jobs. The compiler converts HiveQL to the sets of operators. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 6 Metastore
Hive Workflow Operators 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 7
Client Hadoop Metastore Driver Compiler Hive Workflow For M/R processing, Hiveuses ExecMaper and ExecReducer. On processing, we have 2 modes. Local processing mode Distributed processing mode 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 8
Client Hadoop Metastore Driver Compiler Hive Workflow On 1(Local mode)Hive fork the process with hadoop command.The plan.xml is made just on 1 and the single node processes this. On 2(Distributed mode).Hive send the process to exsistingJobTracker.The information is housed on DistributedCacheand processed on multi nodes. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 9
Compiler : How to Process HiveQL 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 10 Client Hadoop Metastore Driver Compiler
“Plumbing” of HIVE compiler 7/6/2011 11 HIVE - A warehouse solution over Map Reduce Framework
“Plumbing” of HIVE compiler 7/6/2011 12 HIVE - A warehouse solution over Map Reduce Framework
Compiler Overview 13 Parser Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer
Compiler Overview 14 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator  Tree Logical Optimizer Operator  Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
Parser Hive QL AST INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); Hive QL TOK_QUERY   + TOK_FROM       + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“ AST   + TOK_INSERT       + TOK_DESTINATION           + TOK_TAB               + TOK_TABNAME                   + "access_log_temp2"       + TOK_SELECT           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "user"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "prono"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "maker"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "price" Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
Parser SQL AST INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); SQL TOK_QUERY   + TOK_FROM       + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“   + TOK_INSERT       + TOK_DESTINATION           + TOK_TAB               + TOK_TABNAME                   + "access_log_temp2"       + TOK_SELECT           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "user"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "prono"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "maker"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "price" AST 1 2 3 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
17 Semantic Analyzer (1/2) AST QB + TOK_FROM       + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“ AST 1 QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) ParseInfo Join Node + TOK_JOIN     + TOK_TABREF         …     + TOK_TABREF         …     + “=”         … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 17
18 Semantic Analyzer (2/2) AST QB       + TOK_DESTINATION           + TOK_TAB               + TOK_TABNAME                   + "access_log_temp2” AST 2 QB ParseInfo NameTo Destination Node + TOK_TAB     + TOK_TABNAME         +"access_log_temp2” Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 18 18
19 Semantic Analyzer (2/2) AST QB       + TOK_SELECT           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "user"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "prono"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "maker"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "price" AST QB ParseInfo 3 Name To Select Node + TOK_SELECT     + TOK_SELEXPR         …      + TOK_SELEXPR         …     + TOK_SELEXPR         …     + TOK_SELEXPR         … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 19 19
20 Logical Plan Generator (1/4) QB OP Tree QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) OP Tree TableScanOperator(“access_log_hbase”) TableScanOperator(“product_hbase”) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 20 20
21 Logical Plan Generator (2/4) QB OP Tree QB ParseInfo  + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“ ReduceSinkOperator(“access_log_hbase”) ReduceSinkOperator(“product_hbase”) OP Tree JoinOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
22 Logical Plan Generator (3/4) QB OP Tree QB ParseInfo Name To Select Node + TOK_SELECT     + TOK_SELEXPR         + "."              + TOK_TABLE_OR_COL                  + "a"              + "user"     + TOK_SELEXPR          + "."              + TOK_TABLE_OR_COL                  + "a"              + "prono"     + TOK_SELEXPR          + "."              + TOK_TABLE_OR_COL                  + "p"              + "maker"     + TOK_SELEXPR          + "."              + TOK_TABLE_OR_COL                  + "p"              + "price" OP Tree SelectOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
23 Logical Plan Generator (4/4) QB OP Tree QB MetaData Name To Destination Table Info “insclause-0”=     Table Info(“access_log_temp2”) OP Tree FileSinkOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
Logical Plan Generator (result) 24 LCF  OP Tree TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
Logical Optimizer Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 25 25 25
Logical Optimizer (Predicate Push Down) INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 26 26
Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 27 27
INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 28 28
Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); FilterOperator FIL_8 (maker = 'honda') ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 29 29
30 Physical Plan Generator OP Tree Task Tree MoveTask(Stage-0) Ope Tree LoadTableDesc TableScanOperator(TS_0) TableScanOperator(TS_1) ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) JoinOperator(JOIN_4) SelectOperator(SEL_5) FileSinkOperator(FS_6)  StatsTask(Stage-2) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 30 30
OP Tree Task Tree MapRedTask (Stage-1/root) TableScanOperator(TS_0) Physical Plan Generator (result) 31 LCF  Mapper TableScanOperator TS_1 TableScanOperator TS_0 TableScanOperator(TS_1) ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) Reducer JoinOperator JOIN_4 JoinOperator(JOIN_4) SelectOperator SEL_5 SelectOperator(SEL_5) FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 31 31 31
32 Physical Optimizer Task Tree Task Tree java/org/apache/hadoop/hive/ql/optimizer/physical/以下 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
33 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapRedTask (Stage-1) Mapper TableScanOperator TS_1 TableScanOperator TS_0 MapJoinOperator MAPJOIN_7 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 33
34 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapredLocalTask(Stage-7) MapRedTask (Stage-1) TableScanOperator TS_0 Mapper TableScanOperator TS_1 TableScanOperator TS_0 HashTableSinkOperator HASHTABLESINK_11 MapJoinOperator MAPJOIN_7 MapRedTask (Stage-1) SelectOperator SEL_8 Mapper TableScanOperator TS_1 SelectOperator SEL_5 MapJoinOperator MAPJOIN_7 FileSinkOperator FS_6 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 34
In the end 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 35 Client Hadoop Metastore Driver Compiler
In the end 36 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator  Tree Logical Optimizer Operator  Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
End 7/6/2011 37
Appendix: What does Explain show? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 38
Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2     >  SELECT a.user, a.prono, p.maker, p.price     >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE:   (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Alias -> Map Operator Tree:         a TableScan             alias: a             Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 0               value expressions: expr: user                     type: string expr: prono                     type: int         p TableScan             alias: p             Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 1               value expressions: expr: maker                     type: string expr: price                     type: int Reduce Operator Tree:         Join Operator           condition map:                Inner Join 0 to 1           condition expressions:             0 {VALUE._col0} {VALUE._col2}             1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7           Select Operator             expressions: expr: _col0                   type: string expr: _col2                   type: int expr: _col6                   type: string expr: _col7                   type: int outputColumnNames: _col0, _col1, _col2, _col3             File Output Operator               compressed: false GlobalTableId: 1               table:                   input format: org.apache.hadoop.mapred.TextInputFormat                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                   name: default.access_log_temp2   Stage: Stage-0     Move Operator       tables:           replace: true           table:               input format: org.apache.hadoop.mapred.TextInputFormat               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe               name: default.access_log_temp2   Stage: Stage-2     Stats-Aggr Operator Time taken: 0.1 seconds hive>
Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2     >  SELECT a.user, a.prono, p.maker, p.price     >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE:   (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Alias -> Map Operator Tree:         a TableScan             alias: a Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 0               value expressions: expr: user                     type: string expr: prono                     type: int         p TableScan             alias: p Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 1               value expressions: expr: maker                     type: string expr: price                     type: int ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Map Operator Tree: TableScan             Reduce Output Operator TableScan             Reduce Output Operator       Reduce Operator Tree:         Join Operator           Select Operator             File Output Operator   Stage: Stage-0     Move Operator   Stage: Stage-2     Stats-Aggr Operator Reduce Operator Tree:         Join Operator           condition map:                Inner Join 0 to 1           condition expressions:             0 {VALUE._col0} {VALUE._col2}             1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7           Select Operator             expressions: expr: _col0                   type: string expr: _col2                   type: int expr: _col6                   type: string expr: _col7                   type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator               compressed: false GlobalTableId: 1               table:                   input format: org.apache.hadoop.mapred.TextInputFormat                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                   name: default.access_log_temp2   Stage: Stage-0     Move Operator       tables:           replace: true           table:               input format: org.apache.hadoop.mapred.TextInputFormat               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe               name: default.access_log_temp2   Stage: Stage-2     Stats-Aggr Operator Time taken: 0.1 seconds hive>
Appendix: What does Explain show? ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Map Operator Tree: TableScan             Reduce Output Operator TableScan             Reduce Output Operator       Reduce Operator Tree:         Join Operator           Select Operator             File Output Operator   Stage: Stage-0     Move Operator   Stage: Stage-2     Stats-Aggr Operator MapRedTask (Stage-1/root) Mapper TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 Reducer JoinOperator JOIN_4 ≒ SelectOperator SEL_5 FileSinkOperator FS_6 MoveTask (Stage-0) Stats Task (Stage-2)

More Related Content

What's hot

Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoDatabricks
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BIDataWorks Summit
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Databricks
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File IntroductionOwen O'Malley
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomynzhang
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Simplilearn
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 

What's hot (20)

Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan Flonenko
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
 
NiFi 시작하기
NiFi 시작하기NiFi 시작하기
NiFi 시작하기
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File Introduction
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 

Similar to Internal Hive

Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinChad Cooper
 
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Ilya Grigorik
 
Python 3000
Python 3000Python 3000
Python 3000Bob Chao
 
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Roel Hartman
 
Computer science project work
Computer science project workComputer science project work
Computer science project workrahulchamp2345
 
Migration testing framework
Migration testing frameworkMigration testing framework
Migration testing frameworkIndicThreads
 
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеТанки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеYandex
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerConnor McDonald
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly LanguageMotaz Saad
 
JDBC Java Database Connectivity
JDBC Java Database ConnectivityJDBC Java Database Connectivity
JDBC Java Database ConnectivityRanjan Kumar
 
VoCamp Seoul2009 Sparql
VoCamp Seoul2009 SparqlVoCamp Seoul2009 Sparql
VoCamp Seoul2009 Sparqlkwangsub kim
 
What's new in Rails 2?
What's new in Rails 2?What's new in Rails 2?
What's new in Rails 2?brynary
 
TYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkTYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkChristian Trabold
 

Similar to Internal Hive (20)

Pdxpugday2010 pg90
Pdxpugday2010 pg90Pdxpugday2010 pg90
Pdxpugday2010 pg90
 
Hive_p
Hive_pHive_p
Hive_p
 
Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And Pythonwin
 
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
 
Python 3000
Python 3000Python 3000
Python 3000
 
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
 
Computer science project work
Computer science project workComputer science project work
Computer science project work
 
Migration testing framework
Migration testing frameworkMigration testing framework
Migration testing framework
 
Code Management
Code ManagementCode Management
Code Management
 
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеТанки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
 
Jquery mobile
Jquery mobileJquery mobile
Jquery mobile
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly Language
 
CloudKit
CloudKitCloudKit
CloudKit
 
JDBC Java Database Connectivity
JDBC Java Database ConnectivityJDBC Java Database Connectivity
JDBC Java Database Connectivity
 
VoCamp Seoul2009 Sparql
VoCamp Seoul2009 SparqlVoCamp Seoul2009 Sparql
VoCamp Seoul2009 Sparql
 
What's new in Rails 2?
What's new in Rails 2?What's new in Rails 2?
What's new in Rails 2?
 
Html5
Html5Html5
Html5
 
Php
PhpPhp
Php
 
TYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkTYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase framework
 

More from Recruit Technologies

新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場Recruit Technologies
 
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びカーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びRecruit Technologies
 
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Recruit Technologies
 
HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話Recruit Technologies
 
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所Recruit Technologies
 
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Recruit Technologies
 
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例Recruit Technologies
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントRecruit Technologies
 
ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後Recruit Technologies
 
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Recruit Technologies
 
EMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するEMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するRecruit Technologies
 
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントリクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントRecruit Technologies
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントRecruit Technologies
 
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルリクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルRecruit Technologies
 
「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~Recruit Technologies
 

More from Recruit Technologies (20)

新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場
 
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びカーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
 
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
 
Tableau活用4年の軌跡
Tableau活用4年の軌跡Tableau活用4年の軌跡
Tableau活用4年の軌跡
 
HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話
 
LT(自由)
LT(自由)LT(自由)
LT(自由)
 
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
 
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
 
リクルート式AIの活用法
リクルート式AIの活用法リクルート式AIの活用法
リクルート式AIの活用法
 
銀行ロビーアシスタント
銀行ロビーアシスタント銀行ロビーアシスタント
銀行ロビーアシスタント
 
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
 
ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後
 
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
 
EMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するEMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成する
 
RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)
 
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントリクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
 
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルリクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
 
「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Recently uploaded (20)

Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

Internal Hive

  • 1. Inside Hive (for beginners) 1 Takeshi NAKANO / Recruit Co. Ltd.
  • 2. Why? Hive is good tool for non-specialist! The number of M/R controls the Hive processing time. ↓ How can we reduce the number? What can we do for this on writing HiveQL? ↓ How does Hive convert HiveQLto M/R jobs? On this, what optimizing processes are adopted? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 2
  • 3. Don’t you have.. This fb’s paper has a lot of information! But this is a little old.. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 3
  • 4. Component Level Analysis 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 4
  • 5. Hive Architecture / Exec Flow 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 5 Client Hadoop Metastore Driver Compiler
  • 6. Client Hadoop Driver Compiler Hive Workflow Hive has the operators which are minimum processing units. The process of each operator is done with HDFS operation or M/R jobs. The compiler converts HiveQL to the sets of operators. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 6 Metastore
  • 7. Hive Workflow Operators 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 7
  • 8. Client Hadoop Metastore Driver Compiler Hive Workflow For M/R processing, Hiveuses ExecMaper and ExecReducer. On processing, we have 2 modes. Local processing mode Distributed processing mode 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 8
  • 9. Client Hadoop Metastore Driver Compiler Hive Workflow On 1(Local mode)Hive fork the process with hadoop command.The plan.xml is made just on 1 and the single node processes this. On 2(Distributed mode).Hive send the process to exsistingJobTracker.The information is housed on DistributedCacheand processed on multi nodes. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 9
  • 10. Compiler : How to Process HiveQL 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 10 Client Hadoop Metastore Driver Compiler
  • 11. “Plumbing” of HIVE compiler 7/6/2011 11 HIVE - A warehouse solution over Map Reduce Framework
  • 12. “Plumbing” of HIVE compiler 7/6/2011 12 HIVE - A warehouse solution over Map Reduce Framework
  • 13. Compiler Overview 13 Parser Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer
  • 14. Compiler Overview 14 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator Tree Logical Optimizer Operator Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
  • 15. Parser Hive QL AST INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); Hive QL TOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ AST + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 16. Parser SQL AST INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); SQL TOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" AST 1 2 3 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 17. 17 Semantic Analyzer (1/2) AST QB + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ AST 1 QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) ParseInfo Join Node + TOK_JOIN + TOK_TABREF … + TOK_TABREF … + “=” … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 17
  • 18. 18 Semantic Analyzer (2/2) AST QB + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2” AST 2 QB ParseInfo NameTo Destination Node + TOK_TAB + TOK_TABNAME +"access_log_temp2” Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 18 18
  • 19. 19 Semantic Analyzer (2/2) AST QB + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" AST QB ParseInfo 3 Name To Select Node + TOK_SELECT + TOK_SELEXPR … + TOK_SELEXPR … + TOK_SELEXPR … + TOK_SELEXPR … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 19 19
  • 20. 20 Logical Plan Generator (1/4) QB OP Tree QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) OP Tree TableScanOperator(“access_log_hbase”) TableScanOperator(“product_hbase”) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 20 20
  • 21. 21 Logical Plan Generator (2/4) QB OP Tree QB ParseInfo + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ ReduceSinkOperator(“access_log_hbase”) ReduceSinkOperator(“product_hbase”) OP Tree JoinOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 22. 22 Logical Plan Generator (3/4) QB OP Tree QB ParseInfo Name To Select Node + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" OP Tree SelectOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 23. 23 Logical Plan Generator (4/4) QB OP Tree QB MetaData Name To Destination Table Info “insclause-0”= Table Info(“access_log_temp2”) OP Tree FileSinkOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 24. Logical Plan Generator (result) 24 LCF OP Tree TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 25. Logical Optimizer Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 25 25 25
  • 26. Logical Optimizer (Predicate Push Down) INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 26 26
  • 27. Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 27 27
  • 28. INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 28 28
  • 29. Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); FilterOperator FIL_8 (maker = 'honda') ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 29 29
  • 30. 30 Physical Plan Generator OP Tree Task Tree MoveTask(Stage-0) Ope Tree LoadTableDesc TableScanOperator(TS_0) TableScanOperator(TS_1) ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) JoinOperator(JOIN_4) SelectOperator(SEL_5) FileSinkOperator(FS_6) StatsTask(Stage-2) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 30 30
  • 31. OP Tree Task Tree MapRedTask (Stage-1/root) TableScanOperator(TS_0) Physical Plan Generator (result) 31 LCF Mapper TableScanOperator TS_1 TableScanOperator TS_0 TableScanOperator(TS_1) ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) Reducer JoinOperator JOIN_4 JoinOperator(JOIN_4) SelectOperator SEL_5 SelectOperator(SEL_5) FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 31 31 31
  • 32. 32 Physical Optimizer Task Tree Task Tree java/org/apache/hadoop/hive/ql/optimizer/physical/以下 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 33. 33 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapRedTask (Stage-1) Mapper TableScanOperator TS_1 TableScanOperator TS_0 MapJoinOperator MAPJOIN_7 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 33
  • 34. 34 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapredLocalTask(Stage-7) MapRedTask (Stage-1) TableScanOperator TS_0 Mapper TableScanOperator TS_1 TableScanOperator TS_0 HashTableSinkOperator HASHTABLESINK_11 MapJoinOperator MAPJOIN_7 MapRedTask (Stage-1) SelectOperator SEL_8 Mapper TableScanOperator TS_1 SelectOperator SEL_5 MapJoinOperator MAPJOIN_7 FileSinkOperator FS_6 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 34
  • 35. In the end 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 35 Client Hadoop Metastore Driver Compiler
  • 36. In the end 36 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator Tree Logical Optimizer Operator Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
  • 38. Appendix: What does Explain show? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 38
  • 39. Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2 > SELECT a.user, a.prono, p.maker, p.price > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: a TableScan alias: a Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 0 value expressions: expr: user type: string expr: prono type: int p TableScan alias: p Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 1 value expressions: expr: maker type: string expr: price type: int Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col2} 1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7 Select Operator expressions: expr: _col0 type: string expr: _col2 type: int expr: _col6 type: string expr: _col7 type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator compressed: false GlobalTableId: 1 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-2 Stats-Aggr Operator Time taken: 0.1 seconds hive>
  • 40. Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2 > SELECT a.user, a.prono, p.maker, p.price > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: a TableScan alias: a Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 0 value expressions: expr: user type: string expr: prono type: int p TableScan alias: p Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 1 value expressions: expr: maker type: string expr: price type: int ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan Reduce Output Operator TableScan Reduce Output Operator Reduce Operator Tree: Join Operator Select Operator File Output Operator Stage: Stage-0 Move Operator Stage: Stage-2 Stats-Aggr Operator Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col2} 1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7 Select Operator expressions: expr: _col0 type: string expr: _col2 type: int expr: _col6 type: string expr: _col7 type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator compressed: false GlobalTableId: 1 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-2 Stats-Aggr Operator Time taken: 0.1 seconds hive>
  • 41. Appendix: What does Explain show? ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan Reduce Output Operator TableScan Reduce Output Operator Reduce Operator Tree: Join Operator Select Operator File Output Operator Stage: Stage-0 Move Operator Stage: Stage-2 Stats-Aggr Operator MapRedTask (Stage-1/root) Mapper TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 Reducer JoinOperator JOIN_4 ≒ SelectOperator SEL_5 FileSinkOperator FS_6 MoveTask (Stage-0) Stats Task (Stage-2)