NOT IN Clause: NOT IN SubQuery: SELECT * from Employee e WHERE e.DeptNo NOT IN(SELECT ...
JOIN Operator: JOIN: SELECT * FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id...
Oracle Function
Functions: Math Function: round, ceil, mod, power, sqrt, sin/cos ...
Custom UDF Function • Condition Function • DECODE, GREATEST ...
Oracle Analytic Function
Analytic Function: RANK: SELECT name,dept,salary,RANK() OVER (PARTITION BY dept ORDER BY salary ...
Analytic Aggregation Function: MIN: SELECT dept, MIN(salary) OVER (PARTITION BY dept) FROM ...
Hive Internal
Merge Join Tree Bug • select * from a join b on a.v1 = b.v1 join c on a.v1 =...
Merge Join Tree Bug Fix • SemanticAnalyzer: private void mergeJoinTree(QB qb...
New HQL Syntax: INSERT INTO: INSERT INTO table VALUES(col1 ... coln) SELECT ... FROM tmp ... • INSER...
Tuning • Hadoop Tuning • mapred.job.reuse.jvm.num.tasks • mapred.chil...
Wrap-Up: Oracle 2 Hive
  Gain insight into the data flow and model
  Modify Oracle SQL to Hive Qu...
Questions?
Replacing Telco DB/DW to Hadoop and Hive

How to migrate an Oracle data warehouse to Hive.

  1. Replacing Telco DB/DW to Hadoop and Hive. JunHo Cho, Data Analysis Platform Team. Friday, July 1, 2011
  2. • Cloud Computing Platform - Xen
     • Cloud Storage Platform - hadoop
     • Massive Email Archiving Solution - hadoop, lucene
     • HIVE : social network analysis using email
     • Log Archiving Solution - hadoop
     • Data Analysis - data mining, machine learning, statistics
     • Data Platform - hadoop, lucene, hive
     • Cloud Architecture - KT Cloud
  3-10. Telco Data
  19-29. OpenSource: Storage & Computing, Collection, Search, Analysis, Coordination
  33. Hive Internal
  34-39. Hive Architecture: UI, Driver, Compiler, MetaStore, Execution Engine, Hadoop. Diagram labels: DDL, HQL, ORM, Works, Result. Example query: select col1 from tab1 where ... Example result rows: a 123344, b 121211, c 342434
  40-41. Hive Internal (diagram): Web UI, Hive CLI, JDBC (Browse, Query, DDL); Hive QL (Parser, Plan, Optimizer, Task); MetaStore (Thrift API); Map Reduce (ExecMapper/ExecReducer; TSOperator, SELOperator, FSOperator; User Script; UDF/UDAF: substr, sum, average); SerDe; Input/OutputFormat; HDFS, StorageHandler, RCFile, DB, ..., HBase
  42-48. Parser: Select col1,col2 From tab1 Where col3 > 5
     AST:
       TOK_QUERY
         TOK_FROM
           TOK_TABNAME
         TOK_INSERT
           TOK_DESTINATION
             TOK_DIR
               TOK_TMP_FILE
           TOK_SELECT
             TOK_SELEXPR
               TOK_TABLE_OR_COL
             TOK_SELEXPR
               TOK_TABLE_OR_COL
           TOK_WHERE
             >
               TOK_TABLE_OR_COL
               5
     QB (filled in while walking the AST): tab1, insclause-0, col1, col2
  51-59. Plan: Select col1,col2 From tab1 Where col3 > 5. Each QB clause maps to an operator: TOK_FROM → TableScanOperator, TOK_WHERE → FilterOperator, TOK_SELECT → SelectOperator, TOK_DESTINATION → FileSinkOperator
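The clause-to-operator mapping on the Plan slides can be sketched roughly as follows. This is only an illustration in Python, not Hive's actual planner code: the operator and token names come from the slides, while `CLAUSE_TO_OPERATOR`, `build_plan`, and the dictionary layout of the query block are hypothetical.

```python
# Hypothetical sketch: turn the clauses of a query block (QB) into a linear
# operator chain. Operator and token names are taken from the slides; the
# data structures are assumptions made for illustration only.

CLAUSE_TO_OPERATOR = [
    ("TOK_FROM", "TableScanOperator"),
    ("TOK_WHERE", "FilterOperator"),
    ("TOK_SELECT", "SelectOperator"),
    ("TOK_DESTINATION", "FileSinkOperator"),
]


def build_plan(qb):
    """Return (operator, argument) pairs for the clauses present in the QB."""
    plan = []
    for clause, operator in CLAUSE_TO_OPERATOR:
        if clause in qb:
            plan.append((operator, qb[clause]))
    return plan


# Query block for: Select col1,col2 From tab1 Where col3 > 5
qb = {
    "TOK_FROM": "tab1",
    "TOK_WHERE": "col3 > 5",
    "TOK_SELECT": ["col1", "col2"],
    "TOK_DESTINATION": "TOK_TMP_FILE",
}

plan = build_plan(qb)
for operator, argument in plan:
    print(operator, argument)
```

A query without a WHERE clause would simply skip the FilterOperator step in this sketch.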
  62-74. Optimizer: Select col1,col2 From tab1 Where col3 > 5, with tab1 {col1, col2, col3, col4, col5, col6, col7}. Operator tree: TableScanOperator, FilterOperator, SelectOperator, FileSinkOperator. ColumnPruner walks the tree (TS, FIL, SEL) with a Context: SEL needs col1, col2; FIL needs col1, col2, col3; TS reads col1, col2, col3. The last slide shows FilterOperator twice in the tree.
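The column-pruning walk shown on the Optimizer slides can be approximated with a short sketch. It assumes a linear operator chain and a hypothetical `prune_columns` helper; Hive's real ColumnPruner works on an operator graph with rule dispatch, which this does not attempt to reproduce.

```python
# Hypothetical sketch of column pruning over a linear operator chain.
# Each operator lists the columns it references itself; walking from the
# sink back to the scan accumulates the columns every upstream operator
# must still provide.

def prune_columns(operators, table_columns):
    """Return (columns required at each operator, columns the scan must read)."""
    needed = set()
    required = {}
    for name, used in reversed(operators):
        needed |= set(used)
        required[name] = sorted(needed)
    # Keep the table's own column order for the scan.
    scan_columns = [c for c in table_columns if c in needed]
    return required, scan_columns


# Select col1,col2 From tab1 Where col3 > 5
operators = [
    ("TS", []),                # table scan references nothing by itself
    ("FIL", ["col3"]),         # predicate col3 > 5
    ("SEL", ["col1", "col2"]), # projected columns
    ("FS", []),                # file sink
]
table_columns = ["col1", "col2", "col3", "col4", "col5", "col6", "col7"]

required, scan_columns = prune_columns(operators, table_columns)
print(required["SEL"])  # columns needed at SEL
print(required["FIL"])  # columns needed at FIL
print(scan_columns)     # columns the table scan has to read
```

The point of the walk is that the scan ends up reading three of the seven columns, which matters most for columnar formats such as RCFile.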
  77-87. Task: Select col1,col2 From tab1 Where col3 > 5. TaskFactory and QB; rules: TS - GenMRTableScan1, FS - GenMRFileSink1; tasks: FetchTask, MapRedTask; operators: TableScanOperator, FilterOperator, FilterOperator, SelectOperator, FileSinkOperator
  88-89. Hive Internal (diagram): Web UI, Hive CLI, JDBC (Browse, Query, DDL); Hive QL (Parser, Plan, Optimizer, Task); MetaStore (Thrift API); Map Reduce (ExecMapper/ExecReducer; TSOperator, FILOperator, FILOperator, SELOperator, FSOperator; User Script; UDF); SerDe; Input/OutputFormat; HDFS, StorageHandler, RCFile, DB, ..., HBase
  90. Oracle Migration to Hive
  94. Understand Oracle SQL
     • More than 3,000 ETL SQL statements
     • Understand the data flow
     • Group similar SQL patterns
     • Investigate which Oracle functions are used
  95. Oracle SQL
  102. Data Model Convert (Oracle → Hive) • Table → Table • Partition → Partition • Sampling → Bucket
  111. DataType Convert (Oracle → Hive) • NUMBER(n) → TINYINT, INT/BIGINT • NUMBER(n,m) → FLOAT/DOUBLE • VARCHAR2 → STRING • DATE → STRING in "yyyy-MM-dd HH:mm:ss" format
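The type mapping above can be sketched as a small helper. This is only an illustration: the class and method names are hypothetical, and the precision cut-offs are assumptions chosen so that every value of that precision fits the target Hive type (the slide gives the target types but not the thresholds).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of Hive or Oracle.
final class OracleToHiveTypes {

    // Maps NUMBER(precision, scale) to a Hive type name.
    // Cut-offs are assumptions: 2 digits always fit TINYINT (max 127),
    // 9 digits fit INT (max 2,147,483,647), 18 digits fit BIGINT.
    static String hiveTypeForNumber(int precision, int scale) {
        if (scale > 0) {
            // FLOAT reliably holds about 6 decimal digits, DOUBLE about 15.
            return precision <= 6 ? "FLOAT" : "DOUBLE";
        }
        if (precision <= 2) return "TINYINT";
        if (precision <= 9) return "INT";
        if (precision <= 18) return "BIGINT";
        // Wider integers do not fit BIGINT and need a case-by-case decision.
        throw new IllegalArgumentException("NUMBER(" + precision + ") exceeds BIGINT");
    }

    // Renders an Oracle DATE value as the STRING layout named on the slide.
    // UTC is an assumption made here so the output is deterministic.
    static String formatDate(Date value) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(value);
    }
}
```

For example, `hiveTypeForNumber(9, 0)` yields `INT`, and `formatDate(new Date(0L))` yields `1970-01-01 00:00:00`.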
  112. Hive DML • Hive supports an ANSI-SQL-like syntax • Subqueries are supported only in the FROM clause • Join queries: equi-join/inner join, outer join, self-join
  116. IN Clause: IN subquery • Oracle: SELECT * FROM Employee e WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d) • Hive: SELECT * FROM Employee e LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo)
  120. NOT IN Clause: NOT IN subquery • Oracle: SELECT * FROM Employee e WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d) • Hive: SELECT e.* FROM Employee e LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) WHERE d.DeptNo IS NULL
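The IN and NOT IN rewrites correspond to a semi-join and an anti-join. The sketch below models both on plain lists of DeptNo keys to show the intended row semantics; the class and method names are hypothetical and nothing here is Hive code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the two rewrites, keyed on DeptNo.
final class SemiAntiJoin {

    // LEFT SEMI JOIN: a left row is kept when at least one right key matches.
    // Each left row appears at most once, even if the right side has
    // duplicate keys, which is why it can stand in for IN (subquery).
    static List<Integer> leftSemiJoin(List<Integer> left, List<Integer> right) {
        Set<Integer> keys = new HashSet<>(right);
        List<Integer> out = new ArrayList<>();
        for (Integer k : left) {
            if (k != null && keys.contains(k)) out.add(k);
        }
        return out;
    }

    // LEFT OUTER JOIN ... WHERE d.DeptNo IS NULL: a left row is kept when
    // no right key matches. A NULL left key never matches, so it is kept.
    static List<Integer> antiJoin(List<Integer> left, List<Integer> right) {
        Set<Integer> keys = new HashSet<>(right);
        List<Integer> out = new ArrayList<>();
        for (Integer k : left) {
            if (k == null || !keys.contains(k)) out.add(k);
        }
        return out;
    }
}
```

With left keys 10, 20, 20, 30 and right keys 20, 20, 40, the semi-join returns 20, 20 and the anti-join returns 10, 30. One point worth checking on real data: when DeptNo can be NULL on either side, Oracle's NOT IN and the outer-join rewrite can return different rows.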
  124. JOIN Operator: JOIN • Oracle: SELECT * FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id • Hive: SELECT * FROM Employee e1 JOIN Dept d1 ON (e1.ID = d1.Id)
  125. Oracle Function
  133. Functions (Oracle → Hive) • Math functions: round, ceil, mod, power, sqrt, sin/cos → round, ceil, pmod, power, sqrt, sin/cos • Character functions: substr, trim, lpad/rpad, ltrim/rtrim, replace → substr, trim, lpad/rpad, ltrim/rtrim, regexp_replace • NULL functions: coalesce, nvl, nvl2 → coalesce (no NVL, NVL2)
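Replacing mod with pmod is not a pure rename: the two agree for non-negative operands but differ when the dividend is negative. The sketch below models both in plain Java (hypothetical class and method names, divisor assumed non-zero) so that migrated ETL SQL with negative values can be checked.

```java
// Hypothetical model of the two remainder functions; divisor must not be 0.
final class ModVsPmod {

    // Oracle MOD: the result takes the sign of the dividend,
    // which matches Java's % operator.
    static int oracleMod(int a, int b) {
        return a % b;
    }

    // Hive pmod: positive modulus, computed as ((a % b) + b) % b.
    static int hivePmod(int a, int b) {
        return ((a % b) + b) % b;
    }
}
```

For example, `oracleMod(-7, 3)` is -1 while `hivePmod(-7, 3)` is 2; for 7 and 3 both return 1.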
  134. Custom UDF Function • Condition functions: DECODE, GREATEST • Null comparison functions: NVL / NVL2 • Type conversion: TO_NUMBER, TO_CHAR, TO_DATE • INSTR4 • DATE_FORMAT • LAST_DAY
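The core logic of several of these functions is small. The following is a minimal plain-Java sketch of NVL, NVL2, DECODE and GREATEST as Oracle defines them; the class name is hypothetical, DECODE is restricted to strings for brevity, and the wrapping into Hive's UDF interface is deliberately left out, so this is not the deck's implementation.

```java
import java.util.Objects;

// Hypothetical plain-Java versions of the Oracle functions' logic.
final class OracleCompat {

    // NVL(value, fallback): fallback when value is NULL.
    static <T> T nvl(T value, T fallback) {
        return value != null ? value : fallback;
    }

    // NVL2(value, ifNotNull, ifNull).
    static <T> T nvl2(Object value, T ifNotNull, T ifNull) {
        return value != null ? ifNotNull : ifNull;
    }

    // DECODE(expr, search1, result1, ..., [default]).
    // As in Oracle, a NULL expr matches a NULL search value.
    static String decode(String expr, String... args) {
        int i = 0;
        for (; i + 1 < args.length; i += 2) {
            if (Objects.equals(expr, args[i])) return args[i + 1];
        }
        // A trailing unpaired argument is the default; otherwise NULL.
        return i < args.length ? args[i] : null;
    }

    // GREATEST(a, b, ...): NULL if any argument is NULL, as in Oracle.
    @SafeVarargs
    static <T extends Comparable<T>> T greatest(T first, T... rest) {
        if (first == null) return null;
        T best = first;
        for (T v : rest) {
            if (v == null) return null;
            if (v.compareTo(best) > 0) best = v;
        }
        return best;
    }
}
```

For example, `decode("B", "A", "1", "B", "2", "0")` returns `"2"`, and `decode("Z", "A", "1", "0")` falls through to the default `"0"`.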
  135. Oracle Analytic Function
  140. Analytic Function: RANK • Oracle: SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM emp • Hive, with RANK(arg1, arg2) as a custom UDF: SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary) FROM (SELECT name, dept, salary FROM emp DISTRIBUTE BY dept SORT BY dept, salary DESC) e
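The slide does not show how RANK(arg1, arg2) is implemented. One plausible shape, sketched here as an assumption, is a stateful counter that relies on rows arriving grouped by the first argument and sorted by the second, which is what DISTRIBUTE BY plus SORT BY provide within each reducer. The class name is hypothetical and the Hive UDF wrapper is omitted.

```java
import java.util.Objects;

// Hypothetical stateful rank counter; call rank() once per row, in order.
final class RankCounter {
    private Object lastKey;
    private Object lastValue;
    private long rowsInPartition; // rows seen so far for the current key
    private long currentRank;

    // key is the partition column (dept), value the ordering column (salary).
    long rank(Object key, Object value) {
        if (rowsInPartition == 0 || !Objects.equals(key, lastKey)) {
            // First row overall or a new partition: restart at 1.
            rowsInPartition = 1;
            currentRank = 1;
        } else {
            rowsInPartition++;
            // Ties keep the previous rank; otherwise jump to the row
            // position, which leaves gaps after ties as RANK() does.
            if (!Objects.equals(value, lastValue)) {
                currentRank = rowsInPartition;
            }
        }
        lastKey = key;
        lastValue = value;
        return currentRank;
    }
}
```

Feeding the rows (A, 300), (A, 300), (A, 200), (B, 500), (B, 400) in that order produces ranks 1, 1, 3, 1, 2. The result is only meaningful if the input order guarantee holds, so the inner query's DISTRIBUTE BY and SORT BY are part of the contract.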
  145. Analytic Aggregation Function: MIN • Oracle: SELECT dept, MIN(salary) OVER (PARTITION BY dept) FROM emp • Hive (aggregation + JOIN): SELECT emp.dept, tmp.m FROM emp JOIN (SELECT dept, MIN(salary) m FROM emp GROUP BY dept) tmp ON (emp.dept = tmp.dept)
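The aggregation + JOIN rewrite is a two-pass computation: first aggregate per group, then attach the aggregate back to every row. A minimal in-memory sketch of that idea, with hypothetical names and parallel lists standing in for the emp table:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of MIN(salary) OVER (PARTITION BY dept).
final class MinOverPartition {

    // depts.get(i) and salaries.get(i) describe row i of emp.
    static List<Integer> minSalaryPerRow(List<String> depts, List<Integer> salaries) {
        // Pass 1: the GROUP BY subquery, MIN(salary) per dept.
        Map<String, Integer> minByDept = new HashMap<>();
        for (int i = 0; i < depts.size(); i++) {
            minByDept.merge(depts.get(i), salaries.get(i), Integer::min);
        }
        // Pass 2: the JOIN, one output value per original row.
        List<Integer> out = new ArrayList<>();
        for (String dept : depts) {
            out.add(minByDept.get(dept));
        }
        return out;
    }
}
```

For rows (A, 300), (A, 200), (B, 500) the result is 200, 200, 500: the row count is preserved, which is the difference between an analytic MIN and a plain GROUP BY.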
  146. Hive Internal
  149. Merge Join Tree Bug • select * from a join b on a.v1 = b.v1 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join e on a.v2 = e.v2 (MapReduce #3) • select * from a join e on a.v2 = e.v2 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join b on a.v1 = b.v1 (MapReduce #2)
  150. Merge Join Tree Bug Fix • SemanticAnalyzer (before)
      private void mergeJoinTree(QB qb) {
        QBJoinTree root = qb.getQbJoinTree();
        QBJoinTree parent = null;
        while (root != null) {
          boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());
          if (parent == null) {
            if (merged) {
              root = qb.getQbJoinTree();
            } else {
              parent = root;
              root = root.getJoinSrc();
            }
          } else {
            parent = parent.getJoinSrc();
            root = parent.getJoinSrc();
          }
        }
      }
  151. Merge Join Tree Bug Fix • SemanticAnalyzer (after: the outer else branch also checks merged)
      private void mergeJoinTree(QB qb) {
        QBJoinTree root = qb.getQbJoinTree();
        QBJoinTree parent = null;
        while (root != null) {
          boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());
          if (parent == null) {
            if (merged) {
              root = qb.getQbJoinTree();
            } else {
              parent = root;
              root = root.getJoinSrc();
            }
          } else {
            if (merged) {
              root = qb.getQbJoinTree();
            } else {
              parent = parent.getJoinSrc();
              root = parent.getJoinSrc();
            }
          }
        }
      }
  155. New HQL Syntax: INSERT INTO • INSERT INTO table VALUES(col1 ... coln) SELECT ... FROM tmp ... • INSERT [OVERWRITE] destination: grammar, modify FileSinkPlan • New feature (HIVE-306): INSERT INTO destination
  164. Tuning • Hadoop tuning: mapred.job.reuse.jvm.num.tasks, mapred.child.java.opts, mapred.min.split.size / mapred.max.split.size, dfs.block.size • Hive tuning: hive.input.format = CombineHiveInputFormat; query tuning (reduce the number of MapReduce jobs, guided by the HQL plan)
  174. Wrap-Up: Oracle 2 Hive • Gain insight into the data flow and model • Convert Oracle SQL to Hive query syntax • Use built-in functions • Develop custom UDF/UDAF/UDTF • Support analytic functions: distribute by + sort by + UDF; join + UDF (aggregation) • Modify Hive internals • Hadoop + Hive tuning
  177. Questions?