Phoenix
We put the SQL back in the NoSQL
James Taylor
jtaylor@salesforce.com
Agenda
Phoenix Overview
Phoenix Implementation
Performance Analysis
Phoenix Roadmap
Demo
Phoenix Overview
SQL layer on top of HBase
Delivered as an embedded JDBC driver
Targeting low latency queries over HBase data
Columns modeled as multi-part row key and key values
Query engine transforms SQL into series of scans
Using native HBase APIs and capabilities
    Coprocessors for aggregation
    Custom filters for expression evaluation
    Transaction isolation through scan time range
    Optionally client-controlled timestamps
Open sourcing soon
100% Java
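
The "transaction isolation through scan time range" point can be illustrated with a small model. This is only a sketch in plain Python, not Phoenix or HBase code: the `Cell` type, the `scan` function, and the sample data are hypothetical. It assumes the HBase convention that a scan time range is half-open, so a reader that fixes its upper bound when the query starts does not see cells written later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One versioned value, loosely modeled on an HBase key value (hypothetical type)."""
    row: str
    column: str
    timestamp: int
    value: int


def scan(cells, min_ts, max_ts):
    """Return the newest value per (row, column) with min_ts <= timestamp < max_ts."""
    visible = {}
    for cell in cells:
        if min_ts <= cell.timestamp < max_ts:
            key = (cell.row, cell.column)
            # Keep only the most recent version inside the time range.
            if key not in visible or cell.timestamp > visible[key].timestamp:
                visible[key] = cell
    return {key: cell.value for key, cell in visible.items()}


cells = [
    Cell("org1", "TXNS", 100, 5),
    Cell("org1", "TXNS", 200, 7),  # written after the reader's upper bound
    Cell("org2", "TXNS", 150, 3),
]

# A reader whose upper bound is 151 sees a stable snapshot even though
# a newer version of org1/TXNS exists at timestamp 200.
snapshot = scan(cells, 0, 151)
```

A client-controlled timestamp corresponds to choosing `max_ts` (and the write timestamp) explicitly rather than taking the server's current time.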
Phoenix SQL Support
           SELECT <expression>…
           FROM <table>
           WHERE <expression>
           GROUP BY <expression>…
           HAVING <aggregate expression>
           ORDER BY <aggregate expression>…
           LIMIT <value>
Aggregation Functions
       MIN, MAX, AVG, SUM, COUNT
Built-in Functions
       SUBSTR, ROUND, TRUNC, TO_CHAR, TO_DATE
Operators
       =,!=,<>,<,<=,>,>=, LIKE
       AND, OR, NOT
Bind Parameters
       ?, :#
CASE WHEN
IN (<value>…)
DDL/DML (in progress)
       CREATE/DROP <table>
       DELETE FROM <table> WHERE <expression>
       UPSERT INTO <table> [(<column>…)]
          VALUES (<value>…)
Sample Queries
SELECT host, TRUNC(dateTime, 'DAY'),
  AVG(cache_hit), MIN(cache_hit), MAX(cache_hit)
FROM server_metrics
WHERE host LIKE 'cs11-%'
AND dateTime > TO_DATE('2012-04-01')
AND dateTime < TO_DATE('2012-07-01')
GROUP BY host, TRUNC(dateTime, 'DAY')
HAVING MIN(cache_hit) < 90
ORDER BY host, AVG(cache_hit)

SELECT product_number, product_name,
   CASE
    WHEN list_price = 0 THEN 'Mfg item - not for resale'
    WHEN list_price < 50 THEN 'Under $50'
    WHEN list_price >= 50 and list_price < 250 THEN 'Under $250'
    WHEN list_price >= 250 and list_price < 1000 THEN 'Under $1000'
    ELSE 'Over $1000'
   END as price_category
FROM product_catalogue
WHERE product_category IN ('Camping', 'Hiking')
AND (product_name LIKE '%Pack' OR product_name LIKE '% Cots %')
Query Processing
Product Metrics HTable
    Row Key: ORG_ID, DATE, FEATURE
    Key Values: TXNS, IO_TIME, RESPONSE_TIME

SELECT feature, SUM(txns)
FROM product_metrics
WHERE org_id = :1
AND date >= :2
AND date <= :3
AND io_time > 100
GROUP BY feature

Scan
    Start key: ORG_ID (:1) + DATE (:2)
    End key: ORG_ID (:1) + DATE (:3)
Filter
    Filter: IO_TIME > 100
Aggregation
    Intercepts scan on region server
    Builds map of distinct FEATURE values
    Returns one row per distinct group
    Client does final merge
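
The scan, filter, and aggregation steps for this query can be sketched as a small simulation. It is a minimal model in plain Python under stated assumptions, not Phoenix's coprocessor code: `REGIONS`, `region_aggregate`, and `client_merge` are hypothetical names, the sample rows are invented, and each "region" is just a list of rows held in memory.

```python
from collections import defaultdict

# Hypothetical data: each region holds rows of
# ((org_id, date, feature), {"txns": ..., "io_time": ...}).
REGIONS = [
    [
        (("org1", "2012-04-01", "login"), {"txns": 10, "io_time": 150}),
        (("org1", "2012-04-02", "search"), {"txns": 4, "io_time": 90}),
    ],
    [
        (("org1", "2012-04-03", "login"), {"txns": 6, "io_time": 120}),
        (("org1", "2012-04-03", "search"), {"txns": 8, "io_time": 300}),
        (("org2", "2012-04-01", "login"), {"txns": 99, "io_time": 500}),
    ],
]


def region_aggregate(rows, org_id, start_date, end_date, min_io_time):
    """Region-side work: key range check, filter, then a map of partial sums."""
    partial = defaultdict(int)
    for (row_org, row_date, feature), values in rows:
        # Scan: only rows whose key falls between the start and end key.
        if row_org != org_id or not (start_date <= row_date <= end_date):
            continue
        # Filter: IO_TIME > 100.
        if values["io_time"] <= min_io_time:
            continue
        # Aggregation: one entry per distinct FEATURE value.
        partial[feature] += values["txns"]
    return dict(partial)


def client_merge(partials):
    """Client-side final merge of the per-region partial sums."""
    total = defaultdict(int)
    for partial in partials:
        for feature, txns in partial.items():
            total[feature] += txns
    return dict(total)


partials = [
    region_aggregate(rows, "org1", "2012-04-01", "2012-04-03", 100)
    for rows in REGIONS
]
result = client_merge(partials)
```

Each region returns at most one row per distinct group, so the client merges small maps rather than raw rows.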
Phoenix Query Optimizations
Start/stop key of scan based on AND-ed columns
    Through SUBSTR, ROUND, TRUNC, LIKE
Parallelized on client by chunking over start/stop key of scan
Aggregation on region-servers through coprocessor
    Inline for GROUP BY over row key ordered columns
    In memory map per group otherwise
WHERE clause executed through custom filters
    Incremental evaluation with early termination
    Evaluated through byte pointers
IN and OR over same column (in progress)
    Becomes batched get or filter with next row hint
Top N queries (future)
    Through coprocessor keeping top N rows
TABLESAMPLE (future)
    Becomes filter with next row hint
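
Two of the optimizations above, deriving a start/stop key from a LIKE prefix and chunking the scan for client-side parallelism, can be sketched as follows. This is an illustrative model in plain Python, not Phoenix's implementation: `prefix_range` and `chunk_range` are hypothetical helpers, the split points are invented, and it assumes row keys compare as raw byte strings with an exclusive stop key.

```python
def prefix_range(prefix: bytes):
    """Return (start, stop) such that start <= key < stop matches keys with the prefix.

    The stop key is the prefix with its last non-0xFF byte incremented.
    A stop of None means "no upper bound" (the prefix was empty or all 0xFF bytes).
    """
    stop = bytearray(prefix)
    while stop and stop[-1] == 0xFF:
        stop.pop()
    if not stop:
        return prefix, None
    stop[-1] += 1
    return prefix, bytes(stop)


def chunk_range(start: bytes, stop: bytes, split_points):
    """Split [start, stop) at the split points that fall strictly inside it."""
    inner = [p for p in sorted(split_points) if start < p < stop]
    bounds = [start] + inner + [stop]
    return list(zip(bounds, bounds[1:]))


# host LIKE 'cs11-%' becomes a scan over [b"cs11-", b"cs11.").
start, stop = prefix_range(b"cs11-")

# Hypothetical split points; the last one lies outside the scan range.
chunks = chunk_range(start, stop, [b"cs11-050", b"cs11-100", b"cs12"])
```

Each `(start, stop)` pair in `chunks` could then be issued as its own scan and run concurrently, with the client combining the results.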
Phoenix Performance
Phoenix Roadmap
Increase breadth of SQL support
    DML/DDL (in progress)
    Derived tables (SELECT * FROM (SELECT foo FROM bar))
    More built-in functions: COALESCE, UPPER, TRIM
    More operators: ||, IS NULL, *,/,+,-
Secondary indexes
    Multiple projections for immutable data
       Reordered columns in row key
       Different levels of aggregation
    Incrementally maintained for mutable data
TABLESAMPLE for sampling
Improve multi-byte support
Joins
    Hash join
OLAP extensions
    OVER
    PARTITION BY
Demo
Time-series database charting
http://goo.gl/61WRs
Thank you!
Questions/comments?

Phoenix HBase Meetup

