Drill / SQL / Optiq
Julian Hyde, Apache Drill User Group, 2013-03-13

Julian Hyde updates the Drill User Group on progress adding a SQL interface to Apache Drill using the Optiq framework.

Speaker note: It's much more efficient if we push filters and aggregations to Splunk, but the user writing SQL shouldn't have to worry about that. This is not about processing data; it is about processing expressions, that is, reformulating the question. The question is the parse tree of a query, and the parse tree is a data flow. In Splunk, a data flow looks like a pipeline of Linux commands. SQL systems have pipelines too (sometimes they are dataflow trees), built up from the basic relational operators: think of the SQL SELECT, WHERE, JOIN, GROUP BY and ORDER BY clauses.

    1. Drill / SQL / Optiq
       Julian Hyde
       Apache Drill User Group, 2013-03-13
    2. SQL
    3. SQL: Pros & cons
       Fact: SQL is older than Macaulay Culkin.
       Less interesting but more relevant:
       − Can be written by (lots of) humans
       − Can be written by machines
       − Requires query optimization
       − Allows query optimization
       − Based on “flat” relations and basic relational operations
    4. Quick intro to Optiq
    5. Introducing Optiq
       Framework
       Derived from LucidDB
       Minimal query mediator:
       − No storage
       − No runtime
       − No metadata
       − Query planning engine
       − Core operators & rewrite rules
       − Optional SQL parser/validator
    6. Expression tree
       SELECT p."product_name", COUNT(*) AS c
       FROM "splunk"."splunk" AS s
         JOIN "mysql"."products" AS p
           ON s."product_id" = p."product_id"
       WHERE s."action" = 'purchase'
       GROUP BY p."product_name"
       ORDER BY c DESC
       [Diagram, bottom-up: scan (Splunk, table: splunk) and scan (MySQL, table: products) → join (key: product_id) → filter (condition: action = 'purchase') → group (key: product_name, agg: count) → sort (key: c DESC)]
    7. Expression tree (optimized)
       Same query as slide 6.
       [Diagram, bottom-up: scan (Splunk, table: splunk) → filter (condition: action = 'purchase'); that result joins scan (MySQL, table: products) (key: product_id) → group (key: product_name, agg: count) → sort (key: c DESC)]
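
Slides 6 and 7 differ only in where the filter sits: the optimizer moves `action = 'purchase'` below the join so that it runs against the Splunk scan (and from there could be pushed into Splunk itself). The Java sketch below shows that rewrite on a toy tree. The records `Scan`, `Filter`, `Join` and the class `PushFilterThroughJoin` are hypothetical stand-ins, not Optiq's `RelNode` classes, and the filter is simplified to a single `column = value` test.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy plan nodes (not Optiq's RelNode classes).
interface Node {
    Set<String> columns(); // columns this node produces
}

record Scan(String table, Set<String> columns) implements Node {}

// Simplified filter: keeps rows where column = value.
record Filter(Node input, String column, String value) implements Node {
    public Set<String> columns() { return input.columns(); }
}

record Join(Node left, Node right, String key) implements Node {
    public Set<String> columns() {
        Set<String> all = new HashSet<>(left.columns());
        all.addAll(right.columns());
        return all;
    }
}

// Hypothetical rule: Filter(Join(L, R)) becomes Join(Filter(L), R) or
// Join(L, Filter(R)) when the filtered column comes from only one input.
final class PushFilterThroughJoin {
    static Node apply(Node node) {
        if (node instanceof Filter f && f.input() instanceof Join j) {
            boolean inLeft = j.left().columns().contains(f.column());
            boolean inRight = j.right().columns().contains(f.column());
            if (inLeft && !inRight) {
                return new Join(new Filter(j.left(), f.column(), f.value()), j.right(), j.key());
            }
            if (inRight && !inLeft) {
                return new Join(j.left(), new Filter(j.right(), f.column(), f.value()), j.key());
            }
        }
        return node; // rule does not apply: plan unchanged
    }
}

class PushdownDemo {
    public static void main(String[] args) {
        Node splunk = new Scan("splunk", Set.of("product_id", "action"));
        Node products = new Scan("products", Set.of("product_id", "product_name"));
        Node before = new Filter(new Join(splunk, products, "product_id"), "action", "purchase");
        // The filter ends up directly above the Splunk scan, below the join.
        System.out.println(PushFilterThroughJoin.apply(before));
    }
}
```

A filter on `product_id` stays where it is, because that column comes from both inputs; deciding that correctly is what a real rule's pattern match and column analysis are for.
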
    8. Metadata SPI
       interface Table
       − RelDataType getRowType()
       interface TableFunction
       − List<Parameter> getParameters()
       − Table apply(List arguments)
       − e.g. ViewTableFunction
       interface Schema
       − Map<String, List<TableFunction>> getTableFunctions()
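
To make the SPI above concrete, here is a hypothetical Java sketch of how a name could resolve through it: a `Schema` maps each name to a list of `TableFunction`s (a list, so a name can be overloaded), and applying one yields a `Table` with a row type. These are simplified stand-ins, not Optiq's actual interfaces: `RelDataType` is replaced by a list of column names, `List<Parameter>` by a list of parameter names, and `SimpleTable`, `FixedTableFunction` and `MapSchema` are invented for illustration.

```java
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-ins for the slide's interfaces (not Optiq's code).
// A row type is a list of column names instead of a RelDataType.
interface Table {
    List<String> getRowType();
}

interface TableFunction {
    List<String> getParameters();          // parameter names instead of List<Parameter>
    Table apply(List<Object> arguments);
}

interface Schema {
    Map<String, List<TableFunction>> getTableFunctions(); // a list per name allows overloading
}

// Hypothetical: a table with a fixed row type.
final class SimpleTable implements Table {
    private final List<String> rowType;
    SimpleTable(List<String> rowType) { this.rowType = List.copyOf(rowType); }
    public List<String> getRowType() { return rowType; }
}

// Hypothetical: a zero-argument function that always returns the same table.
final class FixedTableFunction implements TableFunction {
    private final Table table;
    FixedTableFunction(Table table) { this.table = table; }
    public List<String> getParameters() { return List.of(); }
    public Table apply(List<Object> arguments) {
        if (!arguments.isEmpty()) throw new IllegalArgumentException("expects no arguments");
        return table;
    }
}

// Hypothetical schema backed by a map; resolve() picks the overload whose
// parameter count matches the number of arguments.
final class MapSchema implements Schema {
    private final Map<String, List<TableFunction>> functions;
    MapSchema(Map<String, List<TableFunction>> functions) { this.functions = Map.copyOf(functions); }
    public Map<String, List<TableFunction>> getTableFunctions() { return functions; }

    Table resolve(String name, List<Object> arguments) {
        for (TableFunction f : functions.getOrDefault(name, List.of())) {
            if (f.getParameters().size() == arguments.size()) return f.apply(arguments);
        }
        throw new IllegalArgumentException("no table function " + name + "/" + arguments.size());
    }

    public static void main(String[] args) {
        List<TableFunction> overloads =
            List.of(new FixedTableFunction(new SimpleTable(List.of("product_id", "product_name"))));
        MapSchema mysql = new MapSchema(Map.of("products", overloads));
        System.out.println(mysql.resolve("products", List.of()).getRowType()); // [product_id, product_name]
    }
}
```
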
    9. Operators and rules
       Rule: interface RelOptRule
       Operator: interface RelNode
       Core operators: TableAccess, Project, Filter, Join, Aggregate, Order, Union, Intersect, Minus, Values
       Some rules: MergeFilterRule, PushAggregateThroughUnionRule, RemoveCorrelationForScalarProjectRule, + 100 more
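
As an illustration of what a rule such as MergeFilterRule does, the Java sketch below collapses a filter sitting directly on another filter into a single filter whose condition is the AND of both. The `Rel`, `ToyScan`, `ToyFilter`, `Rule` and `MergeFilter` types are hypothetical; Optiq's real rules implement `RelOptRule` and match operand patterns, and conditions here are plain strings rather than expressions.

```java
import java.util.Optional;

// Hypothetical toy operators; Optiq's operators implement RelNode.
interface Rel {}
record ToyScan(String table) implements Rel {}
record ToyFilter(Rel input, String condition) implements Rel {}

// Hypothetical rule interface: return an equivalent replacement, or empty if no match.
interface Rule {
    Optional<Rel> apply(Rel rel);
}

// In the spirit of MergeFilterRule: Filter(Filter(x, inner), outer) => Filter(x, (inner) AND (outer)).
final class MergeFilter implements Rule {
    public Optional<Rel> apply(Rel rel) {
        if (rel instanceof ToyFilter outer && outer.input() instanceof ToyFilter inner) {
            String merged = "(" + inner.condition() + ") AND (" + outer.condition() + ")";
            return Optional.of(new ToyFilter(inner.input(), merged));
        }
        return Optional.empty();
    }
}

class MergeFilterDemo {
    public static void main(String[] args) {
        Rel plan = new ToyFilter(new ToyFilter(new ToyScan("donuts"), "ppu > 0.6"), "name IS NOT NULL");
        Rel merged = new MergeFilter().apply(plan).orElse(plan);
        System.out.println(((ToyFilter) merged).condition()); // (ppu > 0.6) AND (name IS NOT NULL)
    }
}
```
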
    10. Planning algorithm
        − Start with a logical plan and a set of rewrite rules
        − Form a list of rewrite rules that are applicable (based on pattern-matching)
        − Fire the rule that is likely to do the most good
        − Rule generates an expression that is equivalent (and hopefully cheaper)
        − Queue up new rule matches
        − Repeat until cheap enough
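
The loop above can be sketched as a best-first search over equivalent plans. This is a hypothetical, greedy simplification in Java, not Optiq's planner: it orders candidate plans by estimated cost instead of ranking individual rule matches, and it has none of the equivalence sets, traits or calling conventions listed on slide 11. The toy plan (a bottom-up list of operator names), the cost model (each operator touches its input rows; a filter keeps 10%) and the rule are all invented for the demo.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

// Hypothetical, greedy sketch of the slide's loop; not Optiq's planner.
final class ToyPlanner<P> {
    private final List<Function<P, List<P>>> rules; // each rule returns equivalent rewrites (empty if no match)
    private final ToDoubleFunction<P> cost;

    ToyPlanner(List<Function<P, List<P>>> rules, ToDoubleFunction<P> cost) {
        this.rules = rules;
        this.cost = cost;
    }

    P optimize(P start, double cheapEnough, int maxSteps) {
        Comparator<P> byCost = Comparator.comparingDouble(cost);
        PriorityQueue<P> queue = new PriorityQueue<>(byCost); // most promising candidate first
        Set<P> seen = new HashSet<>(List.of(start));
        queue.add(start);
        P best = start;
        for (int step = 0; step < maxSteps && !queue.isEmpty(); step++) {
            P current = queue.poll();
            if (byCost.compare(current, best) < 0) best = current;
            if (cost.applyAsDouble(best) <= cheapEnough) break;   // "repeat until cheap enough"
            for (Function<P, List<P>> rule : rules) {              // pattern-match each rule
                for (P rewritten : rule.apply(current)) {          // an equivalent expression
                    if (seen.add(rewritten)) queue.add(rewritten); // queue up new matches
                }
            }
        }
        return best;
    }
}

class PlannerDemo {
    // Toy cost: every operator touches the rows flowing into it; a filter keeps 10%.
    static double cost(List<String> ops) {
        double rows = 1000, total = 0;
        for (String op : ops) {
            total += rows;
            if (op.equals("filter")) rows /= 10;
        }
        return total;
    }

    // Toy rule: a filter directly above a join may move below it.
    static List<List<String>> pushFilterBelowJoin(List<String> ops) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i + 1 < ops.size(); i++) {
            if (ops.get(i).equals("join") && ops.get(i + 1).equals("filter")) {
                List<String> copy = new ArrayList<>(ops);
                Collections.swap(copy, i, i + 1);
                out.add(copy);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Function<List<String>, List<List<String>>> pushDown = PlannerDemo::pushFilterBelowJoin;
        ToyPlanner<List<String>> planner = new ToyPlanner<>(List.of(pushDown), PlannerDemo::cost);
        List<String> best = planner.optimize(List.of("scan", "join", "filter"), 2500, 100);
        System.out.println(best + " cost=" + cost(best)); // [scan, filter, join] cost=2100.0
    }
}
```

Starting from `[scan, join, filter]` (cost 3000), the single rule produces `[scan, filter, join]` (cost 2100), which is under the `cheapEnough` threshold of 2500, so the loop stops there. The `seen` set keeps the search from revisiting a plan it has already queued.
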
    11. Concepts
        − Cost
        − Equivalence sets
        − Calling convention
        − Logical vs Physical
        − Traits
        − Implementation
    12. Outside the kernel
        [Architecture diagram. At the center, the core query planner. Around it: JDBC client, JDBC driver and JDBC server; optional SQL parser/validator; metadata SPI; SQL function library (validation + code generation); pluggable rules; 3rd-party ops and 3rd-party data. Adapters: Lingual (Cascading adapter), Splunk adapter, Drill adapter.]
    13. Optiq roadmap
        Building blocks for analytic DB:
        − In-memory tables in a distributed cache
        − Materialized views
        − Partitioned tables
        Faster planning
        Easier rule development
        ODBC driver
        Adapters for XXX, YYY
    14. Applying Optiq to Drill
        1. Enhance SQL
        2. Query translation
    15. Drill vs Traditional SQL
        SQL:
        − Flat data
        − Schema up front
        Drill:
        − Nested data (list & map)
        − No schema
        We'd like to write:
        − SELECT name, toppings[2] FROM donuts WHERE ppu > 0.6
        Solution: ARRAY, MAP, ANY types
    16. ARRAY & MAP SQL types
        ARRAY is like java.util.List
        MAP is like java.util.LinkedHashMap
        Examples:
          VALUES ARRAY ['a', 'b', 'c']
          VALUES MAP ['Washington', 1, 'Obama', 44]
          SELECT name, address[1], address[2], state FROM Employee
          SELECT * FROM Donuts WHERE CAST(donuts['ppu'] AS DOUBLE) > 0.6
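
The analogy above has two practical consequences worth spelling out: SQL array subscripts start at 1 while `java.util.List` indexes from 0, and a MAP built from alternating keys and values keeps insertion order, as `LinkedHashMap` does. A minimal Java sketch of both, using hypothetical helpers `mapOf` and `item` (the planner log on slide 22 calls the subscript operator ITEM). Returning null for a missing element is an assumption of this sketch, and the sample address and donut values are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical runtime helpers mirroring the slide's analogy:
// SQL ARRAY ~ java.util.List, SQL MAP ~ java.util.LinkedHashMap.
final class SqlCollections {
    // MAP [k1, v1, k2, v2, ...]: alternating keys and values; insertion order is kept.
    static Map<Object, Object> mapOf(Object... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("MAP needs an even number of arguments");
        }
        Map<Object, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            map.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return map;
    }

    // collection[key]: SQL arrays are 1-based, Java lists are 0-based.
    // Assumption: a missing element yields null (SQL NULL) rather than an error.
    static Object item(Object collection, Object key) {
        if (collection instanceof List<?> list && key instanceof Integer i) {
            return (i >= 1 && i <= list.size()) ? list.get(i - 1) : null;
        }
        if (collection instanceof Map<?, ?> map) {
            return map.get(key);
        }
        throw new IllegalArgumentException("cannot subscript " + collection);
    }

    public static void main(String[] args) {
        Map<Object, Object> presidents = mapOf("Washington", 1, "Obama", 44);
        System.out.println(presidents.keySet());                       // [Washington, Obama]
        List<String> address = List.of("1 Main St", "Springfield");    // made-up sample
        System.out.println(item(address, 1));                          // 1 Main St
        Map<Object, Object> donut = mapOf("name", "Cake", "ppu", 0.55); // made-up sample
        // CAST(donuts['ppu'] AS DOUBLE) > 0.6
        System.out.println(((Number) item(donut, "ppu")).doubleValue() > 0.6); // false
    }
}
```
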
    17. ANY SQL type
        ANY means “type to be determined at runtime”
        Validator narrows down possible type based on operators used
        Similar to converting Java's type system into JavaScript's. (Not easy.)
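
One way to picture "narrowing down" an ANY expression: keep the set of types it could still have, and intersect that set with what each operator accepts. The Java sketch below does exactly that. The `SqlType` enum, the `AnyType` class and the operator signatures in `Operators` are assumptions for illustration, not how Optiq's validator is implemented. If the set shrinks to one type, the type is known at validation time; if several remain, the check has to happen at runtime, which is the hard part the slide alludes to.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; not Optiq's validator.
enum SqlType { BOOLEAN, INTEGER, DOUBLE, VARCHAR, ARRAY, MAP }

// Assumed operator signatures, for illustration only.
final class Operators {
    static final Set<SqlType> COMPARE_WITH_NUMBER = EnumSet.of(SqlType.INTEGER, SqlType.DOUBLE); // x > 0.6
    static final Set<SqlType> SUBSCRIPT_BY_INT = EnumSet.of(SqlType.ARRAY, SqlType.MAP);        // x[1]
    static final Set<SqlType> SUBSCRIPT_BY_STRING = EnumSet.of(SqlType.MAP);                    // x['ppu']
}

// The set of types an ANY expression could still have.
final class AnyType {
    private final EnumSet<SqlType> candidates = EnumSet.allOf(SqlType.class);

    // Each use of the expression keeps only the types that operator accepts.
    AnyType usedWith(Set<SqlType> accepted) {
        candidates.retainAll(accepted);
        if (candidates.isEmpty()) throw new IllegalStateException("no type satisfies every use");
        return this;
    }

    Set<SqlType> candidates() { return EnumSet.copyOf(candidates); }

    // One candidate left: known at validation time; otherwise check at runtime.
    boolean resolved() { return candidates.size() == 1; }
}

class AnyDemo {
    public static void main(String[] args) {
        AnyType c = new AnyType().usedWith(Operators.SUBSCRIPT_BY_STRING);      // c['ppu']
        System.out.println(c.candidates() + " resolved=" + c.resolved());       // [MAP] resolved=true
        AnyType ppu = new AnyType().usedWith(Operators.COMPARE_WITH_NUMBER);    // c['ppu'] > 0.6
        System.out.println(ppu.candidates() + " resolved=" + ppu.resolved());   // [INTEGER, DOUBLE] resolved=false
        AnyType toppings = new AnyType().usedWith(Operators.SUBSCRIPT_BY_INT);  // c['toppings'][1]
        System.out.println(toppings.candidates());                               // [ARRAY, MAP]
    }
}
```
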
    18. Sugaring the donut
        Query:
          SELECT c['ppu'], c['toppings'][1] FROM Donuts
        Additional syntactic sugar:
          c.x means c['x']
        So:
          CREATE TABLE Donuts(c ANY)
          SELECT c.ppu, c.toppings[1] FROM Donuts
        Better:
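
The `c.x` shorthand is a purely syntactic rewrite. Here is a hypothetical Java sketch of it, working on a single access path whose root is an ANY or MAP column: each `.name` outside a quoted string becomes `['name']`. It assumes identifiers follow Java identifier rules and that the path does not start with a table alias (`d.c.ppu` would be rewritten wrongly); a real implementation would do this in the parser or validator, where the column's type is known.

```java
// Hypothetical sketch of the c.x => c['x'] sugar; not Optiq's parser.
final class Desugar {
    // Assumes `path` is one access path rooted at an ANY/MAP column (no table alias).
    static String desugar(String path) {
        StringBuilder out = new StringBuilder();
        boolean inQuote = false;
        int i = 0;
        while (i < path.length()) {
            char ch = path.charAt(i);
            if (ch == '\'') inQuote = !inQuote; // leave dots inside 'string keys' alone
            boolean field = !inQuote && ch == '.'
                && i + 1 < path.length()
                && Character.isJavaIdentifierStart(path.charAt(i + 1)); // so "0.6" is untouched
            if (field) {
                int end = i + 1;
                while (end < path.length() && Character.isJavaIdentifierPart(path.charAt(end))) end++;
                out.append("['").append(path, i + 1, end).append("']");
                i = end;
            } else {
                out.append(ch);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(desugar("c.ppu"));         // c['ppu']
        System.out.println(desugar("c.toppings[1]")); // c['toppings'][1]
    }
}
```
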
    19. UNNEST
        Employees nested inside departments:
          CREATE TYPE employee (empno INT, name VARCHAR(30));
          CREATE TABLE dept (deptno INT, name VARCHAR(30), employees EMPLOYEE ARRAY);
        Unnest:
          SELECT d.deptno, d.name, e.empno, e.name
          FROM dept AS d CROSS JOIN UNNEST(d.employees) AS e
        SQL standard provides other operations on collections:
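
CROSS JOIN UNNEST pairs each department row with each element of its own `employees` array. The Java sketch below mirrors the slide's types as records (`Employee`, `Dept`, plus a hypothetical output record `Row`) and expresses the query as a `flatMap`. It assumes the arrays are never NULL, and the sample departments and employees are made up; a department with an empty array produces no rows, as with any cross join.

```java
import java.util.List;
import java.util.stream.Collectors;

// Java mirror of the slide's types; Row is a hypothetical output record.
record Employee(int empno, String name) {}
record Dept(int deptno, String name, List<Employee> employees) {}
record Row(int deptno, String deptName, int empno, String empName) {}

final class Unnest {
    // SELECT d.deptno, d.name, e.empno, e.name
    // FROM dept AS d CROSS JOIN UNNEST(d.employees) AS e
    // One output row per (department, employee) pair. Assumes employees is never null.
    static List<Row> crossJoinUnnest(List<Dept> depts) {
        return depts.stream()
            .flatMap(d -> d.employees().stream()
                .map(e -> new Row(d.deptno(), d.name(), e.empno(), e.name())))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Dept> depts = List.of(                       // made-up sample data
            new Dept(10, "Sales", List.of(new Employee(100, "Fred"), new Employee(110, "Eric"))),
            new Dept(20, "Marketing", List.of()));        // empty array: contributes no rows
        crossJoinUnnest(depts).forEach(System.out::println);
    }
}
```
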
    20. Applying Optiq to Drill
        1. Enhance SQL
        2. Query translation
    21. Query translation
        SQL:
          select d['name'] as name, d['xx'] as xx
          from (
            select _MAP['donuts'] as d from donuts)
          where cast(d['ppu'] as double) > 0.6
        Drill:
          { head: { … }, storage: { … },
            query: [ { op: "sequence",
              do: [ { op: "scan", … selection: { path:
    22. Planner log
        Original rel:
        AbstractConverter(subset=[rel#14:Subset#3.ARRAY], convention=[ARRAY])
          ProjectRel(subset=[rel#10:Subset#3.NONE], NAME=[ITEM($0, name)], XX=[ITEM($0, xx)])
            FilterRel(subset=[rel#8:Subset#2.NONE], condition=[>(CAST(ITEM($0, ppu)):DOUBLE NOT NULL, 0.6)])
              ProjectRel(subset=[rel#6:Subset#1.NONE], D=[ITEM($0, donuts)])
                DrillScan(subset=[rel#4:Subset#0.DRILL],
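
Reading the planner log from the bottom up gives the data flow of the translated query: scan, project the `donuts` map out of column 0, filter on `ppu`, then project `name` and `xx`. The Java sketch below evaluates that pipeline over in-memory rows with streams. It illustrates the plan's semantics only, not Drill's execution engine; the row representation (a list of column values, so `$0` is index 0), the `DonutPlan` name and the sample donuts are assumptions, and a missing or non-numeric `ppu` is treated as failing the filter rather than being cast.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory evaluation of the logged plan, bottom-up.
// Rows are lists of column values, so $0 is index 0. Not Drill's execution engine.
final class DonutPlan {
    // ITEM(value, key) on a MAP value; null if the value is not a map or the key is absent.
    static Object item(Object value, String key) {
        return value instanceof Map<?, ?> map ? map.get(key) : null;
    }

    static List<List<Object>> run(List<List<Object>> scanRows) {
        return scanRows.stream()
            // ProjectRel: D = ITEM($0, donuts)
            .map(row -> Arrays.asList(item(row.get(0), "donuts")))
            // FilterRel: CAST(ITEM($0, ppu)) > 0.6
            // (assumption: only numeric ppu values pass; no string-to-double cast here)
            .filter(row -> item(row.get(0), "ppu") instanceof Number ppu && ppu.doubleValue() > 0.6)
            // ProjectRel: NAME = ITEM($0, name), XX = ITEM($0, xx)
            .map(row -> Arrays.asList(item(row.get(0), "name"), item(row.get(0), "xx")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data: each scanned record holds a 'donuts' map.
        Map<String, Object> glazed = Map.of("name", "Glazed", "ppu", 0.55, "xx", "a");
        Map<String, Object> filled = Map.of("name", "Filled", "ppu", 0.69, "xx", "b");
        List<List<Object>> scan = List.of(
            List.<Object>of(Map.of("donuts", glazed)),
            List.<Object>of(Map.of("donuts", filled)));
        System.out.println(run(scan)); // [[Filled, b]]
    }
}
```
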
    23. Next
        − Translate join, aggregate, sort, set ops
        − Operator overloading with ANY
        − Mondrian on Drill
    24. Thank you!
        https://github.com/julianhyde/share/tree/master/slides
        https://github.com/julianhyde/incubator-drill
        https://github.com/julianhyde/optiq
        http://incubator.apache.org/drill
        @julianhyde