User-Defined Table Generating Functions
Published in: Technology, Business

Transcript

  • 1. User-Defined Table-Generating Functions (UDTF) By Paul Yang (pyang@facebook.com)
  • 2. Outline
      – UDTF Description and Usage
      – Execution Phase
      – Compile Phase
  • 3. UDF vs UDAF vs UDTF
      User-Defined Functions (UDF)
        – One-to-one mapping
        – concat("foo", "bar")
      User-Defined Aggregate Functions (UDAF)
        – Many-to-one mapping
        – sum(num_ads)
      User-Defined Table-Generating Functions (UDTF)
        – One-to-many mapping
        – explode([1,2,3])
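The three mapping shapes on this slide can be sketched in plain Java. This is only an illustration of the row counts going in and coming out: the class and method names below are hypothetical stand-ins, not Hive classes, and the methods work on in-memory lists rather than Hive rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: plain Java stand-ins for the three mapping shapes.
// None of these names are Hive classes.
class MappingShapes {
    // UDF shape: one row in, one value out (like concat).
    static String concat(String a, String b) {
        return a + b;
    }

    // UDAF shape: many rows in, one value out (like sum).
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // UDTF shape: one row in, zero or more rows out (like explode).
    static <T> List<List<T>> explode(List<T> array) {
        List<List<T>> rows = new ArrayList<>();
        for (T element : array) {
            // One single-column row per array element.
            rows.add(Collections.singletonList(element));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(concat("foo", "bar"));              // one value
        System.out.println(sum(Arrays.asList(1L, 2L, 3L)));    // one value from many
        System.out.println(explode(Arrays.asList(1, 2, 3)));   // many rows from one
    }
}
```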
  • 4. UDTF Example (Transform)
      explode(Array<?> arg)
        – Converts an array into multiple rows, with one element per row
      Transform-like syntax
        – SELECT udtf(col0, col1, …) AS colAlias FROM srcTable
  • 5. UDTF Example (Transform)
      SELECT explode(group_ids) AS group_id FROM src
      Table src        Output
      group_ids        group_id
      [1]              1
      [2, 3]           2
                       3
  • 6. UDTF (Lateral View)
      Transform syntax is limited to a single expression
        – SELECT pageid, explode(adid_list)… is not supported
      Use a lateral view
        – Creates a virtual table using a UDTF
      Lateral view syntax
        – …FROM baseTable LATERAL VIEW udtf(col0, col1…) tableAlias AS colAlias0, colAlias1…
  • 7. UDTF (Lateral View) Example Query
      SELECT src.*, myTable.* FROM src LATERAL VIEW explode(group_ids) myTable AS group_id
      Table src
      user_id    group_ids
      100        [1]
      101        [2,3]
  • 8. UDTF (Lateral View)
      src                       explode(group_ids) myTable AS group_id
      user_id    group_ids      group_id
      100        [1]            1
      101        [2,3]          2
                                3
      Join input rows to output rows
      Result
      user_id    group_ids    group_id
      100        [1]          1
      101        [2,3]        2
      101        [2,3]        3
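The join of input rows to generated rows can be simulated in a few lines of plain Java. This is a rough sketch rather than Hive code: `LateralViewDemo`, `SrcRow`, and `lateralViewExplode` are hypothetical names, and result rows are rendered as tab-separated strings for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative simulation of LATERAL VIEW explode(group_ids); not Hive code.
class LateralViewDemo {
    // One row of the example table src.
    static class SrcRow {
        final int userId;
        final List<Integer> groupIds;

        SrcRow(int userId, List<Integer> groupIds) {
            this.userId = userId;
            this.groupIds = groupIds;
        }
    }

    // Joins each input row to every row that explode() generates from it.
    static List<String> lateralViewExplode(List<SrcRow> src) {
        List<String> result = new ArrayList<>();
        for (SrcRow row : src) {
            for (Integer groupId : row.groupIds) {
                // Columns: user_id, group_ids, group_id
                result.add(row.userId + "\t" + row.groupIds + "\t" + groupId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<SrcRow> src = Arrays.asList(
                new SrcRow(100, Arrays.asList(1)),
                new SrcRow(101, Arrays.asList(2, 3)));
        for (String line : lateralViewExplode(src)) {
            System.out.println(line);
        }
    }
}
```

With the two example rows, the sketch yields three result rows, matching the Result table on the slide. In this sketch an input row whose array is empty contributes no result rows.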
  • 9. Outline
      – UDTF Description and Usage
      – Execution Phase
      – Compile Phase
  • 10. Execution Phase
      Two new operators
        – UDTFOperator
        – LateralViewJoinOperator
      Different operator DAGs
        – For SELECT udtf(…)
        – For … FROM … LATERAL VIEW udtf(…)
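  • 11. Execution Phase - UDTFOperator
      [Diagram: UDTFOperator.processOp() passes the input row [1,2] to GenericUDTF.process(); each generated row (here 1) goes to Collector.collect(), which hands it back to the operator through forwardUDTFOutput().]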
  • 12. GenericUDTF Interface
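The callback chain between the operator, the table-generating function, and the collector can be sketched as follows. The types here are simplified, hypothetical stand-ins written for this illustration (`SimpleUdtf`, `UdtfOperator`, and the one-method `Collector`), not the actual Hive classes, which also deal with object inspectors, initialization, and checked exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical stand-ins for the classes named on the slides.
class UdtfFlowSketch {
    // Receives each row the table-generating function produces.
    interface Collector {
        void collect(Object row);
    }

    // Minimal table-generating function contract assumed for this sketch.
    static abstract class SimpleUdtf {
        private Collector collector;

        void setCollector(Collector collector) {
            this.collector = collector;
        }

        // Called once per input row.
        abstract void process(Object[] args);

        // Hands one output row to the collector.
        protected void forward(Object row) {
            collector.collect(row);
        }
    }

    // explode(): forwards one row per array element.
    static class Explode extends SimpleUdtf {
        @Override
        void process(Object[] args) {
            for (Object element : (List<?>) args[0]) {
                forward(element);
            }
        }
    }

    // Plays the role of the UDTF operator: feeds rows in, passes output on.
    static class UdtfOperator {
        private final SimpleUdtf udtf;
        final List<Object> downstream = new ArrayList<>();

        UdtfOperator(SimpleUdtf udtf) {
            this.udtf = udtf;
            // The collector calls back into the operator for every output row.
            udtf.setCollector(this::forwardUdtfOutput);
        }

        void processOp(Object[] row) {
            udtf.process(row);
        }

        void forwardUdtfOutput(Object row) {
            downstream.add(row); // stands in for forwarding to child operators
        }
    }

    public static void main(String[] args) {
        UdtfOperator op = new UdtfOperator(new Explode());
        op.processOp(new Object[] {Arrays.asList(1, 2)});
        System.out.println(op.downstream);
    }
}
```

The point of the indirection is that the function itself never needs to know about the operator tree: it only forwards rows, and the collector decides where they go.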
  • 13. Execution Phase - Transform
      SELECT explode(group_ids) AS group_id FROM src
      Operator DAG:
        – TS: gets the rows of src
        – SELECT: selects group_ids
        – UDTF: calls explode
        – More operators
  • 14. Execution Phase - LateralViewJoinOperator
      [Diagram: LVJ receives the input row foo and the generated rows bar and baz, and outputs the joined rows (foo, bar) and (foo, baz).]
  • 15. Execution Phase (Lateral View)
      FROM src LATERAL VIEW explode(group_ids) myTable AS group_id
      Operator DAG:
        – TS: gets the rows of src
        – SELECT (first branch): selects other needed columns (user_id, group_ids)
        – SELECT (second branch): selects arg. cols for UDTF (group_ids)
        – UDTF: calls explode()
        – LVJ: joins input and output rows
        – More operators
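A miniature of the two-branch lateral view DAG might look like the sketch below. It assumes that the join operator sees the row from the first SELECT branch before the rows generated from the second branch; the class and method names (`LateralViewDagSketch`, `fromSelect`, `fromUdtf`) are hypothetical and do not correspond to Hive's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature of the lateral view operator DAG; not Hive code.
class LateralViewDagSketch {
    // LVJ stand-in: remembers the current input row and joins each
    // generated row to it.
    static class LateralViewJoin {
        final List<List<Object>> output = new ArrayList<>();
        private List<Object> currentInput;

        // Row arriving from the first SELECT branch.
        void fromSelect(List<Object> inputCols) {
            currentInput = inputCols;
        }

        // Row arriving from the UDTF branch.
        void fromUdtf(Object generated) {
            List<Object> joined = new ArrayList<>(currentInput);
            joined.add(generated);
            output.add(joined);
        }
    }

    // Pushes one src row down both branches, first branch first (assumption).
    static void processRow(int userId, List<Integer> groupIds, LateralViewJoin lvj) {
        // Branch 1: SELECT the columns the query needs from src.
        lvj.fromSelect(Arrays.<Object>asList(userId, groupIds));
        // Branch 2: SELECT the UDTF argument, then explode it.
        for (Integer groupId : groupIds) {
            lvj.fromUdtf(groupId);
        }
    }

    public static void main(String[] args) {
        LateralViewJoin lvj = new LateralViewJoin();
        processRow(100, Arrays.asList(1), lvj);
        processRow(101, Arrays.asList(2, 3), lvj);
        System.out.println(lvj.output);
    }
}
```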
  • 16. Outline
      – UDTF Description and Usage
      – Execution Phase
      – Compile Phase
  • 17. Compile Phase - Transform
      SELECT explode(group_id_array) AS group_id FROM src
  • 18. Compile Time - Transform
      UDTF detected in genSelectPlan()
      SemanticAnalyzer::genSelectPlan()
        ...
        if (udtfExpr.getType() == HiveParser.TOK_FUNCTION) {
          String funcName = TypeCheckProcFactory.DefaultExprProcessor
              .getFunctionText(udtfExpr, true);
          FunctionInfo fi = FunctionRegistry.getFunctionInfo(funcName);
          if (fi != null) {
            genericUDTF = fi.getGenericUDTF();
          }
          isUDTF = (genericUDTF != null);
        }
        ...
  • 19. Compile Time - Transform
      If a UDTF is present
        – Generate a select operator to get the columns needed for the UDTF
        – Attach UDTFOperator to SelectOperator
      SemanticAnalyzer::genSelectPlan()
        ...
        if (isInTransform) {
          output = genScriptPlan(trfm, qb, output);
        }
        if (isUDTF) {
          output = genUDTFPlan(genericUDTF, udtfTableAlias, udtfColAliases,
              qb, output);
        }
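The detection step amounts to a registry lookup followed by a null-safe check. The sketch below imitates that shape with a hypothetical in-memory registry; `UdtfDetectionSketch`, `Kind`, and the plan strings are invented for illustration and are much simpler than Hive's FunctionRegistry and operator plans.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of a function registry; not Hive's FunctionRegistry.
class UdtfDetectionSketch {
    enum Kind { UDF, UDAF, UDTF }

    static final Map<String, Kind> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("concat", Kind.UDF);
        REGISTRY.put("sum", Kind.UDAF);
        REGISTRY.put("explode", Kind.UDTF);
    }

    // Like the slide's null checks: an unknown function is simply not a UDTF.
    static boolean isUdtf(String funcName) {
        Kind kind = REGISTRY.get(funcName); // null when not registered
        return kind == Kind.UDTF;
    }

    // Chooses which plan fragment to build for the select expression.
    static String planFor(String funcName) {
        return isUdtf(funcName) ? "SELECT -> UDTF" : "SELECT";
    }

    public static void main(String[] args) {
        System.out.println(planFor("explode"));
        System.out.println(planFor("concat"));
    }
}
```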
  • 20. Compile Phase - Lateral View
      SELECT * FROM src LATERAL VIEW explode(group_ids) myTable AS group_id
      Partial AST for above query: [diagram]
  • 21. Compile Phase - Lateral View
      In SemanticAnalyzer::doPhase1()
        – Make a mapping from the source table alias ("src") to TOK_LATERAL_VIEW
  • 22. Compile Phase - Lateral View
      In SemanticAnalyzer::genPlan()
        – Iterate through Map<String, Operator> aliasToOpInfo
        – Attach SELECT/UDTF operator DAG with genLateralViewPlans()
      [Diagram: TS, followed by the operators attached by genLateralViewPlans()]
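The two compile steps for lateral views (record which alias has a lateral view, then walk the alias-to-operator map and attach the extra operators) can be sketched roughly as below. Operators are represented as plain strings, and `LateralViewAttachSketch` and `attach` are hypothetical names; the real genLateralViewPlans() builds operator objects, which this sketch does not attempt to reproduce.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: operators are plain strings, not Hive Operator objects.
class LateralViewAttachSketch {
    // For every alias with a registered lateral view, append the
    // SELECT/UDTF branches and the join to that alias's operator.
    static Map<String, String> attach(Map<String, String> aliasToOp,
                                      Map<String, String> aliasToLateralView) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : aliasToOp.entrySet()) {
            String udtf = aliasToLateralView.get(entry.getKey());
            if (udtf == null) {
                // No lateral view on this alias: keep the operator as-is.
                result.put(entry.getKey(), entry.getValue());
            } else {
                result.put(entry.getKey(), entry.getValue()
                        + " -> {SELECT, SELECT -> UDTF(" + udtf + ")} -> LVJ");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> aliasToOp = new LinkedHashMap<>();
        aliasToOp.put("src", "TS");
        Map<String, String> aliasToLateralView = new HashMap<>();
        aliasToLateralView.put("src", "explode"); // recorded in the first phase
        System.out.println(attach(aliasToOp, aliasToLateralView));
    }
}
```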