SlideShare a Scribd company logo
Inside Hive (for beginners)1Takeshi NAKANO / Recruit Co. Ltd.
Why?Hive is good tool for non-specialist!The number of M/R controls the Hive processing time.↓How can we reduce the number?What can we do for this on writing HiveQL?↓How does Hive convert HiveQLto M/R jobs?On this, what optimizing processes are adopted?7/6/2011HIVE - A warehouse solution over Map Reduce Framework2
Don’t you have..This fb’s paper has a lot of information!But this is a little old..7/6/2011HIVE - A warehouse solution over Map Reduce Framework3
Component Level Analysis7/6/2011HIVE - A warehouse solution over Map Reduce Framework4
Hive Architecture / Exec Flow7/6/2011HIVE - A warehouse solution over Map Reduce Framework5ClientHadoopMetastoreDriverCompiler
ClientHadoopDriverCompilerHive WorkflowHive has the operators which are minimum processing units.The process of each operator is done with HDFS operation or M/R jobs.The compiler converts HiveQL to the sets of operators.7/6/2011HIVE - A warehouse solution over Map Reduce Framework6Metastore
Hive WorkflowOperators7/6/2011HIVE - A warehouse solution over Map Reduce Framework7
ClientHadoopMetastoreDriverCompilerHive WorkflowFor M/R processing, Hiveuses ExecMaper and ExecReducer.On processing, we have 2 modes.Local processing modeDistributed processing mode7/6/2011HIVE - A warehouse solution over Map Reduce Framework8
ClientHadoopMetastoreDriverCompilerHive WorkflowOn 1(Local mode)Hive fork the process with hadoop command.The plan.xml is made just on 1 and the single node processes this.On 2(Distributed mode).Hive send the process to exsistingJobTracker.The information is housed on DistributedCacheand processed on multi nodes.7/6/2011HIVE - A warehouse solution over Map Reduce Framework9
Compiler : How to Process HiveQL7/6/2011HIVE - A warehouse solution over Map Reduce Framework10ClientHadoopMetastoreDriverCompiler
“Plumbing” of HIVE compiler7/6/201111HIVE - A warehouse solution over Map Reduce Framework
“Plumbing” of HIVE compiler7/6/201112HIVE - A warehouse solution over Map Reduce Framework
Compiler Overview13ParserSemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizer
Compiler Overview14HiveQLParserASTSemanticAnalyzerQBLogicalPlan Gen.Operator TreeLogicalOptimizerOperator TreePhysicalPlan Gen.Task TreePhysicalOptimizerTask Tree
ParserHiveQLASTINSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);HiveQLTOK_QUERY  + TOK_FROM      + TOK_JOIN          + TOK_TABREF              + TOK_TABNAME                  + "access_log_hbase"              + a          + TOK_TABREF              + TOK_TABNAME                  + "product_hbase"              + "p"          + "="              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "access_log_hbase"              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "prono“AST  + TOK_INSERT      + TOK_DESTINATION          + TOK_TAB              + TOK_TABNAME                  + "access_log_temp2"      + TOK_SELECT          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "user"          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "prono"          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "maker"          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "price"SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
ParserSQLASTINSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);SQLTOK_QUERY  + TOK_FROM      + TOK_JOIN          + TOK_TABREF              + TOK_TABNAME                  + "access_log_hbase"              + a          + TOK_TABREF              + TOK_TABNAME                  + "product_hbase"              + "p"          + "="              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "access_log_hbase"              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "prono“  + TOK_INSERT      + TOK_DESTINATION          + TOK_TAB              + TOK_TABNAME                  + "access_log_temp2"      + TOK_SELECT          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "user"          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "prono"          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "maker"          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "price"AST123SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
17Semantic Analyzer (1/2)ASTQB+ TOK_FROM      + TOK_JOIN          + TOK_TABREF              + TOK_TABNAME                  + "access_log_hbase"              + a          + TOK_TABREF              + TOK_TABNAME                  + "product_hbase"              + "p"          + "="              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "access_log_hbase"              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "prono“AST1QBMetaDataAliasTo Table Info“a”=Table Info(“access_log_hbase”)“p”=Table Info(“product_hbase”)ParseInfoJoin Node+ TOK_JOIN    + TOK_TABREF        …    + TOK_TABREF        …    + “=”        …SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser17
18Semantic Analyzer (2/2)ASTQB      + TOK_DESTINATION          + TOK_TAB              + TOK_TABNAME                  + "access_log_temp2”AST2QBParseInfoNameTo Destination Node+ TOK_TAB    + TOK_TABNAME        +"access_log_temp2”SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser1818
19Semantic Analyzer (2/2)ASTQB      + TOK_SELECT          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "user"          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "prono"          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "maker"          + TOK_SELEXPR              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "price"ASTQBParseInfo3Name To Select Node+ TOK_SELECT    + TOK_SELEXPR        …     + TOK_SELEXPR        …    + TOK_SELEXPR        …    + TOK_SELEXPR        …SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser1919
20Logical Plan Generator (1/4)QBOPTreeQBMetaDataAliasTo Table Info“a”=Table Info(“access_log_hbase”)“p”=Table Info(“product_hbase”)OPTreeTableScanOperator(“access_log_hbase”)TableScanOperator(“product_hbase”)SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2020
21Logical Plan Generator (2/4)QBOPTreeQBParseInfo + TOK_JOIN          + TOK_TABREF              + TOK_TABNAME                  + "access_log_hbase"              + a          + TOK_TABREF              + TOK_TABNAME                  + "product_hbase"              + "p"          + "="              + "."                  + TOK_TABLE_OR_COL                      + "a"                  + "access_log_hbase"              + "."                  + TOK_TABLE_OR_COL                      + "p"                  + "prono“ReduceSinkOperator(“access_log_hbase”)ReduceSinkOperator(“product_hbase”)OPTreeJoinOperatorSemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
22Logical Plan Generator (3/4)QBOPTreeQBParseInfoName To Select Node+ TOK_SELECT    + TOK_SELEXPR        + "."             + TOK_TABLE_OR_COL                 + "a"             + "user"    + TOK_SELEXPR         + "."             + TOK_TABLE_OR_COL                 + "a"             + "prono"    + TOK_SELEXPR         + "."             + TOK_TABLE_OR_COL                 + "p"             + "maker"    + TOK_SELEXPR         + "."             + TOK_TABLE_OR_COL                 + "p"             + "price"OPTreeSelectOperatorSemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
23Logical Plan Generator (4/4)QBOPTreeQBMetaDataName To Destination Table Info“insclause-0”=    Table Info(“access_log_temp2”)OPTreeFileSinkOperatorSemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
Logical Plan Generator (result)24LCF OPTreeTableScanOperatorTS_1TableScanOperatorTS_0ReduceSinkOperatorRS_2ReduceSinkOperatorRS_3JoinOperatorJOIN_4SelectOperatorSEL_5FileSinkOperatorFS_6SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
Logical OptimizerSemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser252525
Logical Optimizer (Predicate Push Down)INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda';SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2626
Logical Optimizer (Predicate Push Down)TableScanOperatorTS_1TableScanOperatorTS_0INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);ReduceSinkOperatorRS_3ReduceSinkOperatorRS_2JoinOperatorJOIN_4INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda';SelectOperatorSEL_6FileSinkOperatorFS_7SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2727
INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda';Logical Optimizer (Predicate Push Down)TableScanOperatorTS_1TableScanOperatorTS_0ReduceSinkOperatorRS_3ReduceSinkOperatorRS_2JoinOperatorJOIN_4FilterOperatorFIL_5(_col8 = 'honda')SelectOperatorSEL_6FileSinkOperatorFS_7SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2828
Logical Optimizer (Predicate Push Down)TableScanOperatorTS_1TableScanOperatorTS_0INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);FilterOperatorFIL_8(maker = 'honda')ReduceSinkOperatorRS_2ReduceSinkOperatorRS_3JoinOperatorJOIN_4INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda';FilterOperatorFIL_5(_col8 = 'honda')SelectOperatorSEL_6FileSinkOperatorFS_7SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2929
30Physical Plan GeneratorOPTreeTaskTreeMoveTask(Stage-0)OpeTreeLoadTableDescTableScanOperator(TS_0)TableScanOperator(TS_1)ReduceSinkOperator(RS_2)MapRedTask(Stage-1/root)ReduceSinkOperator(RS_3)JoinOperator(JOIN_4)SelectOperator(SEL_5)FileSinkOperator(FS_6) StatsTask(Stage-2)SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser3030
OPTreeTaskTreeMapRedTask (Stage-1/root)TableScanOperator(TS_0)Physical Plan Generator (result)31LCF MapperTableScanOperatorTS_1TableScanOperatorTS_0TableScanOperator(TS_1)ReduceSinkOperatorRS_2ReduceSinkOperatorRS_3ReduceSinkOperator(RS_2)MapRedTask(Stage-1/root)ReduceSinkOperator(RS_3)ReducerJoinOperatorJOIN_4JoinOperator(JOIN_4)SelectOperatorSEL_5SelectOperator(SEL_5)FileSinkOperatorFS_6SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser313131
32Physical OptimizerTaskTreeTaskTreejava/org/apache/hadoop/hive/ql/optimizer/physical/以下SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
33Physical Optimizer (MapJoinResolver)TaskTreeTaskTreeMapRedTask (Stage-1)MapperTableScanOperatorTS_1TableScanOperatorTS_0MapJoinOperatorMAPJOIN_7SelectOperatorSEL_8SelectOperatorSEL_5FileSinkOperatorFS_6SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser33
34Physical Optimizer (MapJoinResolver)TaskTreeTaskTreeMapredLocalTask(Stage-7)MapRedTask (Stage-1)TableScanOperatorTS_0MapperTableScanOperatorTS_1TableScanOperatorTS_0HashTableSinkOperatorHASHTABLESINK_11MapJoinOperatorMAPJOIN_7MapRedTask (Stage-1)SelectOperatorSEL_8MapperTableScanOperatorTS_1SelectOperatorSEL_5MapJoinOperatorMAPJOIN_7FileSinkOperatorFS_6SelectOperatorSEL_8SelectOperatorSEL_5FileSinkOperatorFS_6SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser34
In the end7/6/2011HIVE - A warehouse solution over Map Reduce Framework35ClientHadoopMetastoreDriverCompiler
In the end36HiveQLParserASTSemanticAnalyzerQBLogicalPlan Gen.Operator TreeLogicalOptimizerOperator TreePhysicalPlan Gen.Task TreePhysicalOptimizerTask Tree
End7/6/201137
Appendix: What does Explain show?7/6/2011HIVE - A warehouse solution over Map Reduce Framework38
Appendix: What does Explain show?hive> explain INSERT OVERWRITE TABLE access_log_temp2    >  SELECT a.user, a.prono, p.maker, p.price    >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);OKABSTRACT SYNTAX TREE:  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))STAGE DEPENDENCIES:  Stage-1 is a root stage  Stage-0 depends on stages: Stage-1  Stage-2 depends on stages: Stage-0STAGE PLANS:  Stage: Stage-1    Map Reduce      Alias -> Map Operator Tree:        aTableScan            alias: a            Reduce Output Operator              key expressions:expr: prono                    type: int              sort order: +              Map-reduce partition columns:expr: prono                    type: int              tag: 0              value expressions:expr: user                    type: stringexpr: prono                    type: int        pTableScan            alias: p            Reduce Output Operator              key expressions:expr: prono                    type: int              sort order: +              Map-reduce partition columns:expr: prono                    type: int              tag: 1              value expressions:expr: maker                    type: stringexpr: price                    type: intReduce Operator Tree:        Join Operator          condition map:               Inner Join 0 to 1          condition expressions:            0 {VALUE._col0} {VALUE._col2}            1 {VALUE._col1} {VALUE._col2}handleSkewJoin: falseoutputColumnNames: _col0, _col2, _col6, _col7          Select Operator            expressions:expr: _col0                  type: stringexpr: _col2                  type: intexpr: _col6                  type: stringexpr: _col7                  type: intoutputColumnNames: _col0, _col1, _col2, _col3            File Output Operator              compressed: falseGlobalTableId: 1              table:                  input format: org.apache.hadoop.mapred.TextInputFormat                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatserde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                  name: default.access_log_temp2  Stage: Stage-0    Move Operator      tables:          replace: true          table:              input format: org.apache.hadoop.mapred.TextInputFormat              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatserde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe              name: default.access_log_temp2  Stage: Stage-2    Stats-Aggr OperatorTime taken: 0.1 secondshive>
Appendix: What does Explain show?hive> explain INSERT OVERWRITE TABLE access_log_temp2    >  SELECT a.user, a.prono, p.maker, p.price    >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);OKABSTRACT SYNTAX TREE:  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))STAGE DEPENDENCIES:  Stage-1 is a root stage  Stage-0 depends on stages: Stage-1  Stage-2 depends on stages: Stage-0STAGE PLANS:  Stage: Stage-1    Map Reduce      Alias -> Map Operator Tree:        aTableScan            alias: aReduce Output Operator              key expressions:expr: prono                    type: int              sort order: +              Map-reduce partition columns:expr: prono                    type: int              tag: 0              value expressions:expr: user                    type: stringexpr: prono                    type: int        pTableScan            alias: pReduce Output Operator              key expressions:expr: prono                    type: int              sort order: +              Map-reduce partition columns:expr: prono                    type: int              tag: 1              value expressions:expr: maker                    type: stringexpr: price                    type: intABSTRACT SYNTAX TREE:STAGE DEPENDENCIES:  Stage-1 is a root stage  Stage-0 depends on stages: Stage-1  Stage-2 depends on stages: Stage-0STAGE PLANS:  Stage: Stage-1    Map Reduce      Map Operator Tree:TableScan            Reduce Output OperatorTableScan            Reduce Output Operator      Reduce Operator Tree:        Join Operator          Select Operator            File Output Operator  Stage: Stage-0    Move Operator  Stage: Stage-2    Stats-Aggr OperatorReduce Operator Tree:        Join Operator          condition map:               Inner Join 0 to 1          condition expressions:            0 {VALUE._col0} {VALUE._col2}            1 {VALUE._col1} {VALUE._col2}handleSkewJoin: falseoutputColumnNames: _col0, _col2, _col6, _col7          Select Operator            expressions:expr: _col0                  type: stringexpr: _col2                  type: intexpr: _col6                  type: stringexpr: _col7                  type: intoutputColumnNames: _col0, _col1, _col2, _col3File Output Operator              compressed: falseGlobalTableId: 1              table:                  input format: org.apache.hadoop.mapred.TextInputFormat                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatserde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                  name: default.access_log_temp2  Stage: Stage-0    Move Operator      tables:          replace: true          table:              input format: org.apache.hadoop.mapred.TextInputFormat              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatserde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe              name: default.access_log_temp2  Stage: Stage-2    Stats-Aggr OperatorTime taken: 0.1 secondshive>
Appendix: What does Explain show?ABSTRACT SYNTAX TREE:STAGE DEPENDENCIES:  Stage-1 is a root stage  Stage-0 depends on stages: Stage-1  Stage-2 depends on stages: Stage-0STAGE PLANS:  Stage: Stage-1    Map Reduce      Map Operator Tree:TableScan            Reduce Output OperatorTableScan            Reduce Output Operator      Reduce Operator Tree:        Join Operator          Select Operator            File Output Operator  Stage: Stage-0    Move Operator  Stage: Stage-2    Stats-Aggr OperatorMapRedTask (Stage-1/root)MapperTableScanOperatorTS_1TableScanOperatorTS_0ReduceSinkOperatorRS_2ReduceSinkOperatorRS_3ReducerJoinOperatorJOIN_4≒SelectOperatorSEL_5FileSinkOperatorFS_6MoveTask (Stage-0)Stats Task (Stage-2)

More Related Content

What's hot

Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk
 
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
NTT DATA Technology & Innovation
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
DataWorks Summit
 
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
DataWorks Summit/Hadoop Summit
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
Don Demcsak
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
Databricks
 
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Edureka!
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
Databricks
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
alexbaranau
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Owen O'Malley
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 
6.hive
6.hive6.hive
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
Timothy Spann
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Edureka!
 
Hive tuning
Hive tuningHive tuning
Hive tuning
Michael Zhang
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
Isheeta Sanghi
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
Saurav Haloi
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
t3rmin4t0r
 

What's hot (20)

Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
6.hive
6.hive6.hive
6.hive
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 

Similar to Internal Hive

Pdxpugday2010 pg90
Pdxpugday2010 pg90Pdxpugday2010 pg90
Pdxpugday2010 pg90
Selena Deckelmann
 
Hive_p
Hive_pHive_p
Hive_p
Samchu Li
 
Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And Pythonwin
Chad Cooper
 
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Ilya Grigorik
 
Python 3000
Python 3000Python 3000
Python 3000
Bob Chao
 
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Roel Hartman
 
Computer science project work
Computer science project workComputer science project work
Computer science project work
rahulchamp2345
 
Migration testing framework
Migration testing frameworkMigration testing framework
Migration testing framework
IndicThreads
 
Code Management
Code ManagementCode Management
Code Management
Y. Thong Kuah
 
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеТанки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Yandex
 
Jquery mobile
Jquery mobileJquery mobile
Jquery mobile
Eric Turcotte
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Connor McDonald
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly Language
Motaz Saad
 
CloudKit
CloudKitCloudKit
CloudKit
Jon Crosby
 
JDBC Java Database Connectivity
JDBC Java Database ConnectivityJDBC Java Database Connectivity
JDBC Java Database Connectivity
Ranjan Kumar
 
VoCamp Seoul2009 Sparql
VoCamp Seoul2009 SparqlVoCamp Seoul2009 Sparql
VoCamp Seoul2009 Sparql
kwangsub kim
 
What's new in Rails 2?
What's new in Rails 2?What's new in Rails 2?
What's new in Rails 2?
brynary
 
Html5
Html5Html5
Php
PhpPhp
TYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkTYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase framework
Christian Trabold
 

Similar to Internal Hive (20)

Pdxpugday2010 pg90
Pdxpugday2010 pg90Pdxpugday2010 pg90
Pdxpugday2010 pg90
 
Hive_p
Hive_pHive_p
Hive_p
 
Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And Pythonwin
 
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
 
Python 3000
Python 3000Python 3000
Python 3000
 
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
 
Computer science project work
Computer science project workComputer science project work
Computer science project work
 
Migration testing framework
Migration testing frameworkMigration testing framework
Migration testing framework
 
Code Management
Code ManagementCode Management
Code Management
 
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеТанки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
 
Jquery mobile
Jquery mobileJquery mobile
Jquery mobile
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly Language
 
CloudKit
CloudKitCloudKit
CloudKit
 
JDBC Java Database Connectivity
JDBC Java Database ConnectivityJDBC Java Database Connectivity
JDBC Java Database Connectivity
 
VoCamp Seoul2009 Sparql
VoCamp Seoul2009 SparqlVoCamp Seoul2009 Sparql
VoCamp Seoul2009 Sparql
 
What's new in Rails 2?
What's new in Rails 2?What's new in Rails 2?
What's new in Rails 2?
 
Html5
Html5Html5
Html5
 
Php
PhpPhp
Php
 
TYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkTYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase framework
 

More from Recruit Technologies

新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場
Recruit Technologies
 
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びカーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
Recruit Technologies
 
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Recruit Technologies
 
Tableau活用4年の軌跡
Tableau活用4年の軌跡Tableau活用4年の軌跡
Tableau活用4年の軌跡
Recruit Technologies
 
HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話
Recruit Technologies
 
LT(自由)
LT(自由)LT(自由)
LT(自由)
Recruit Technologies
 
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
Recruit Technologies
 
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Recruit Technologies
 
リクルート式AIの活用法
リクルート式AIの活用法リクルート式AIの活用法
リクルート式AIの活用法
Recruit Technologies
 
銀行ロビーアシスタント
銀行ロビーアシスタント銀行ロビーアシスタント
銀行ロビーアシスタント
Recruit Technologies
 
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
Recruit Technologies
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
Recruit Technologies
 
ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後
Recruit Technologies
 
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Recruit Technologies
 
EMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するEMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成する
Recruit Technologies
 
RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)
Recruit Technologies
 
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントリクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
Recruit Technologies
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
Recruit Technologies
 
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルリクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
Recruit Technologies
 
「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~
Recruit Technologies
 

More from Recruit Technologies (20)

新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場
 
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びカーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
 
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
 
Tableau活用4年の軌跡
Tableau活用4年の軌跡Tableau活用4年の軌跡
Tableau活用4年の軌跡
 
HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話
 
LT(自由)
LT(自由)LT(自由)
LT(自由)
 
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
 
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
 
リクルート式AIの活用法
リクルート式AIの活用法リクルート式AIの活用法
リクルート式AIの活用法
 
銀行ロビーアシスタント
銀行ロビーアシスタント銀行ロビーアシスタント
銀行ロビーアシスタント
 
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
 
ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後
 
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
 
EMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するEMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成する
 
RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)
 
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントリクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
 
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルリクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
 
「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~
 

Recently uploaded

High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
aslasdfmkhan4750
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
313mohammedarshad
 
Types of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technologyTypes of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technology
ldtexsolbl
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptxAcumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx
BrainSell Technologies
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
ankush9927
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
shanihomely
 
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
Priyanka Aash
 
Patch Tuesday de julio
Patch Tuesday de julioPatch Tuesday de julio
Patch Tuesday de julio
Ivanti
 
Tailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer InsightsTailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer Insights
SynapseIndia
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
Shiv Technolabs
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
SubhamMandal40
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
SAI KAILASH R
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes..."Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
Anant Gupta
 
Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for SuccessMastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
David Wilson
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
Brian Pichman
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
Zilliz
 

Recently uploaded (20)

High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
 
Types of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technologyTypes of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technology
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptxAcumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
 
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
 
Patch Tuesday de julio
Patch Tuesday de julioPatch Tuesday de julio
Patch Tuesday de julio
 
Tailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer InsightsTailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer Insights
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes..."Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
 
Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for SuccessMastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
 

Internal Hive

  • 1. Inside Hive (for beginners)1Takeshi NAKANO / Recruit Co. Ltd.
  • 2. Why?Hive is good tool for non-specialist!The number of M/R controls the Hive processing time.↓How can we reduce the number?What can we do for this on writing HiveQL?↓How does Hive convert HiveQLto M/R jobs?On this, what optimizing processes are adopted?7/6/2011HIVE - A warehouse solution over Map Reduce Framework2
  • 3. Don’t you have..This fb’s paper has a lot of information!But this is a little old..7/6/2011HIVE - A warehouse solution over Map Reduce Framework3
  • 4. Component Level Analysis7/6/2011HIVE - A warehouse solution over Map Reduce Framework4
  • 5. Hive Architecture / Exec Flow7/6/2011HIVE - A warehouse solution over Map Reduce Framework5ClientHadoopMetastoreDriverCompiler
  • 6. ClientHadoopDriverCompilerHive WorkflowHive has the operators which are minimum processing units.The process of each operator is done with HDFS operation or M/R jobs.The compiler converts HiveQL to the sets of operators.7/6/2011HIVE - A warehouse solution over Map Reduce Framework6Metastore
  • 7. Hive WorkflowOperators7/6/2011HIVE - A warehouse solution over Map Reduce Framework7
  • 8. ClientHadoopMetastoreDriverCompilerHive WorkflowFor M/R processing, Hiveuses ExecMaper and ExecReducer.On processing, we have 2 modes.Local processing modeDistributed processing mode7/6/2011HIVE - A warehouse solution over Map Reduce Framework8
  • 9. ClientHadoopMetastoreDriverCompilerHive WorkflowOn 1(Local mode)Hive fork the process with hadoop command.The plan.xml is made just on 1 and the single node processes this.On 2(Distributed mode).Hive send the process to exsistingJobTracker.The information is housed on DistributedCacheand processed on multi nodes.7/6/2011HIVE - A warehouse solution over Map Reduce Framework9
  • 10. Compiler : How to Process HiveQL7/6/2011HIVE - A warehouse solution over Map Reduce Framework10ClientHadoopMetastoreDriverCompiler
  • 11. “Plumbing” of HIVE compiler7/6/201111HIVE - A warehouse solution over Map Reduce Framework
  • 12. “Plumbing” of HIVE compiler7/6/201112HIVE - A warehouse solution over Map Reduce Framework
  • 14. Compiler Overview14HiveQLParserASTSemanticAnalyzerQBLogicalPlan Gen.Operator TreeLogicalOptimizerOperator TreePhysicalPlan Gen.Task TreePhysicalOptimizerTask Tree
  • 15. ParserHiveQLASTINSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);HiveQLTOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“AST + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price"SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
  • 16. ParserSQLASTINSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);SQLTOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price"AST123SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
  • 17. 17Semantic Analyzer (1/2)ASTQB+ TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“AST1QBMetaDataAliasTo Table Info“a”=Table Info(“access_log_hbase”)“p”=Table Info(“product_hbase”)ParseInfoJoin Node+ TOK_JOIN + TOK_TABREF … + TOK_TABREF … + “=” …SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser17
  • 18. 18Semantic Analyzer (2/2)ASTQB + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2”AST2QBParseInfoNameTo Destination Node+ TOK_TAB + TOK_TABNAME +"access_log_temp2”SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser1818
  • 19. 19Semantic Analyzer (2/2)ASTQB + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price"ASTQBParseInfo3Name To Select Node+ TOK_SELECT + TOK_SELEXPR … + TOK_SELEXPR … + TOK_SELEXPR … + TOK_SELEXPR …SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser1919
  • 20. 20Logical Plan Generator (1/4)QBOPTreeQBMetaDataAliasTo Table Info“a”=Table Info(“access_log_hbase”)“p”=Table Info(“product_hbase”)OPTreeTableScanOperator(“access_log_hbase”)TableScanOperator(“product_hbase”)SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2020
  • 21. 21Logical Plan Generator (2/4)QBOPTreeQBParseInfo + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ReduceSinkOperator(“access_log_hbase”)ReduceSinkOperator(“product_hbase”)OPTreeJoinOperatorSemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
  • 22. 22Logical Plan Generator (3/4)QBOPTreeQBParseInfoName To Select Node+ TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price"OPTreeSelectOperatorSemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
  • 23. 23Logical Plan Generator (4/4)QBOPTreeQBMetaDataName To Destination Table Info“insclause-0”= Table Info(“access_log_temp2”)OPTreeFileSinkOperatorSemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
  • 24. Logical Plan Generator (result)24LCF OPTreeTableScanOperatorTS_1TableScanOperatorTS_0ReduceSinkOperatorRS_2ReduceSinkOperatorRS_3JoinOperatorJOIN_4SelectOperatorSEL_5FileSinkOperatorFS_6SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser
  • 26. Logical Optimizer (Predicate Push Down)INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda';SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2626
  • 27. Logical Optimizer (Predicate Push Down)TableScanOperatorTS_1TableScanOperatorTS_0INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);ReduceSinkOperatorRS_3ReduceSinkOperatorRS_2JoinOperatorJOIN_4INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda';SelectOperatorSEL_6FileSinkOperatorFS_7SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2727
  • 28. INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda';Logical Optimizer (Predicate Push Down)TableScanOperatorTS_1TableScanOperatorTS_0ReduceSinkOperatorRS_3ReduceSinkOperatorRS_2JoinOperatorJOIN_4FilterOperatorFIL_5(_col8 = 'honda')SelectOperatorSEL_6FileSinkOperatorFS_7SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2828
  • 29. Logical Optimizer (Predicate Push Down)TableScanOperatorTS_1TableScanOperatorTS_0INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);FilterOperatorFIL_8(maker = 'honda')ReduceSinkOperatorRS_2ReduceSinkOperatorRS_3JoinOperatorJOIN_4INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda';FilterOperatorFIL_5(_col8 = 'honda')SelectOperatorSEL_6FileSinkOperatorFS_7SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser2929
  • 31. OPTreeTaskTreeMapRedTask (Stage-1/root)TableScanOperator(TS_0)Physical Plan Generator (result)31LCF MapperTableScanOperatorTS_1TableScanOperatorTS_0TableScanOperator(TS_1)ReduceSinkOperatorRS_2ReduceSinkOperatorRS_3ReduceSinkOperator(RS_2)MapRedTask(Stage-1/root)ReduceSinkOperator(RS_3)ReducerJoinOperatorJOIN_4JoinOperator(JOIN_4)SelectOperatorSEL_5SelectOperator(SEL_5)FileSinkOperatorFS_6SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser313131
  • 33. 33Physical Optimizer (MapJoinResolver)TaskTreeTaskTreeMapRedTask (Stage-1)MapperTableScanOperatorTS_1TableScanOperatorTS_0MapJoinOperatorMAPJOIN_7SelectOperatorSEL_8SelectOperatorSEL_5FileSinkOperatorFS_6SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser33
  • 34. 34Physical Optimizer (MapJoinResolver)TaskTreeTaskTreeMapredLocalTask(Stage-7)MapRedTask (Stage-1)TableScanOperatorTS_0MapperTableScanOperatorTS_1TableScanOperatorTS_0HashTableSinkOperatorHASHTABLESINK_11MapJoinOperatorMAPJOIN_7MapRedTask (Stage-1)SelectOperatorSEL_8MapperTableScanOperatorTS_1SelectOperatorSEL_5MapJoinOperatorMAPJOIN_7FileSinkOperatorFS_6SelectOperatorSEL_8SelectOperatorSEL_5FileSinkOperatorFS_6SemanticAnalyzerLogicalPlan Gen.LogicalOptimizerPhysicalPlan Gen.PhysicalOptimizerParser34
  • 35. In the end7/6/2011HIVE - A warehouse solution over Map Reduce Framework35ClientHadoopMetastoreDriverCompiler
  • 36. In the end36HiveQLParserASTSemanticAnalyzerQBLogicalPlan Gen.Operator TreeLogicalOptimizerOperator TreePhysicalPlan Gen.Task TreePhysicalOptimizerTask Tree
  • 38. Appendix: What does Explain show?7/6/2011HIVE - A warehouse solution over Map Reduce Framework38
  • 39. Appendix: What does Explain show?hive> explain INSERT OVERWRITE TABLE access_log_temp2 > SELECT a.user, a.prono, p.maker, p.price > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);OKABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: aTableScan alias: a Reduce Output Operator key expressions:expr: prono type: int sort order: + Map-reduce partition columns:expr: prono type: int tag: 0 value expressions:expr: user type: stringexpr: prono type: int pTableScan alias: p Reduce Output Operator key expressions:expr: prono type: int sort order: + Map-reduce partition columns:expr: prono type: int tag: 1 value expressions:expr: maker type: stringexpr: price type: intReduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col2} 1 {VALUE._col1} {VALUE._col2}handleSkewJoin: falseoutputColumnNames: _col0, _col2, _col6, _col7 Select Operator expressions:expr: _col0 type: stringexpr: _col2 type: intexpr: _col6 type: stringexpr: _col7 type: intoutputColumnNames: _col0, _col1, _col2, _col3 File Output Operator compressed: falseGlobalTableId: 1 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatserde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatserde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-2 Stats-Aggr OperatorTime taken: 0.1 secondshive>
  • 40. Appendix: What does Explain show?hive> explain INSERT OVERWRITE TABLE access_log_temp2 > SELECT a.user, a.prono, p.maker, p.price > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);OKABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: aTableScan alias: aReduce Output Operator key expressions:expr: prono type: int sort order: + Map-reduce partition columns:expr: prono type: int tag: 0 value expressions:expr: user type: stringexpr: prono type: int pTableScan alias: pReduce Output Operator key expressions:expr: prono type: int sort order: + Map-reduce partition columns:expr: prono type: int tag: 1 value expressions:expr: maker type: stringexpr: price type: intABSTRACT SYNTAX TREE:STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree:TableScan Reduce Output OperatorTableScan Reduce Output Operator Reduce Operator Tree: Join Operator Select Operator File Output Operator Stage: Stage-0 Move Operator Stage: Stage-2 Stats-Aggr OperatorReduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col2} 1 {VALUE._col1} {VALUE._col2}handleSkewJoin: falseoutputColumnNames: _col0, _col2, _col6, _col7 Select Operator expressions:expr: _col0 type: stringexpr: _col2 type: intexpr: _col6 type: stringexpr: _col7 type: intoutputColumnNames: _col0, _col1, _col2, _col3File Output Operator compressed: falseGlobalTableId: 1 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatserde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatserde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-2 Stats-Aggr OperatorTime taken: 0.1 secondshive>
  • 41. Appendix: What does Explain show?ABSTRACT SYNTAX TREE:STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree:TableScan Reduce Output OperatorTableScan Reduce Output Operator Reduce Operator Tree: Join Operator Select Operator File Output Operator Stage: Stage-0 Move Operator Stage: Stage-2 Stats-Aggr OperatorMapRedTask (Stage-1/root)MapperTableScanOperatorTS_1TableScanOperatorTS_0ReduceSinkOperatorRS_2ReduceSinkOperatorRS_3ReducerJoinOperatorJOIN_4≒SelectOperatorSEL_5FileSinkOperatorFS_6MoveTask (Stage-0)Stats Task (Stage-2)