Phoenix HBase Meetup

Phoenix
We put the SQL back in the NoSQL
James Taylor
jtaylor@salesforce.com

Agenda
Phoenix Overview
Phoenix Implementation
Performance Analysis
Phoenix Roadmap
Demo

Phoenix Overview
SQL layer on top of HBase
Delivered as an embedded JDBC driver
Targeting low latency queries over HBase data
Columns modeled as multi-part row key and key values
Query engine transforms SQL into series of scans
Using native HBase APIs and capabilities
Coprocessors for aggregation
Custom filters for expression evaluation
Transaction isolation through scan time range
Optionally client-controlled timestamps
Open sourcing soon
100% Java
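
Because Phoenix is delivered as a JDBC driver, client code should look like ordinary java.sql usage. The sketch below is only an illustration under assumptions: the URL scheme "jdbc:phoenix:<ZooKeeper quorum>" and the class and method names are hypothetical here, and the query is the product_metrics example from the Query Processing slide, written with ? bind parameters.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class PhoenixJdbcSketch {
    // Assumption: the driver accepts URLs of the form jdbc:phoenix:<zookeeper quorum>.
    static String jdbcUrl(String zkQuorum) {
        return "jdbc:phoenix:" + zkQuorum;
    }

    // Query from the Query Processing slide, using ? bind parameters.
    static final String QUERY =
        "SELECT feature, SUM(txns) FROM product_metrics "
        + "WHERE org_id = ? AND date >= ? AND date <= ? AND io_time > 100 "
        + "GROUP BY feature";

    // Runs the query on an already opened connection and prints one line per feature.
    static void printTxnsByFeature(Connection conn, String orgId, Date from, Date to)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            stmt.setString(1, orgId);
            stmt.setDate(2, from);
            stmt.setDate(3, to);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}
```

A caller would obtain the connection with DriverManager.getConnection(PhoenixJdbcSketch.jdbcUrl("localhost")), which only works once the Phoenix driver is on the classpath.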

Phoenix SQL Support
SELECT <expression>…
FROM <table>
WHERE <expression>
GROUP BY <expression>…
HAVING <aggregate expression>
ORDER BY <aggregate expression>…
LIMIT <value>
Aggregation Functions
 MIN, MAX, AVG, SUM, COUNT
Built-in Functions
 SUBSTR, ROUND, TRUNC, TO_CHAR, TO_DATE
Operators
 =,!=,<>,<,<=,>,>=, LIKE
 AND, OR, NOT
Bind Parameters
 ?, :#
CASE WHEN
IN (<value>…)
DDL/DML (in progress)
 CREATE/DROP <table>
 DELETE FROM <table> WHERE <expression>
 UPSERT INTO <table> [(<column>…)]
VALUES (<value>…)

Sample Queries
SELECT host, TRUNC(dateTime, 'DAY'),
AVG(cache_hit), MIN(cache_hit), MAX(cache_hit)
FROM server_metrics
WHERE host LIKE 'cs11-%'
AND dateTime > TO_DATE('2012-04-01')
AND dateTime < TO_DATE('2012-07-01')
GROUP BY host, TRUNC(dateTime, 'DAY')
HAVING MIN(cache_hit) < 90
ORDER BY host, AVG(cache_hit)
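
The Query Optimizations slide says that start/stop keys of a scan can be derived through LIKE. A minimal sketch of how a predicate such as host LIKE 'cs11-%' could become a key range is shown below, assuming host is the leading row key column and is stored as plain UTF-8 bytes. The class and method names are hypothetical, LIKE escape characters are ignored, and this is not Phoenix's actual code.

```java
import java.util.Arrays;

class LikePrefixRangeSketch {
    // Returns the literal text before the first LIKE wildcard (% or _).
    static String literalPrefix(String likePattern) {
        int i = 0;
        while (i < likePattern.length()
                && likePattern.charAt(i) != '%'
                && likePattern.charAt(i) != '_') {
            i++;
        }
        return likePattern.substring(0, i);
    }

    // Exclusive stop key: the smallest key greater than every key that starts
    // with the prefix. Trailing 0xFF bytes are dropped, then the last byte is
    // incremented. An empty result means "scan to the end of the table".
    static byte[] stopKeyFor(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0];
    }
}
```

For 'cs11-%' the start key is the bytes of "cs11-" and the exclusive stop key is the bytes of "cs11.", so only rows for matching hosts are scanned.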

SELECT product_number, product_name,
CASE
WHEN list_price = 0 THEN 'Mfg item - not for resale'
WHEN list_price < 50 THEN 'Under $50'
WHEN list_price >= 50 and list_price < 250 THEN 'Under $250'
WHEN list_price >= 250 and list_price < 1000 THEN 'Under $1000'
ELSE 'Over $1000'
END as price_category
FROM product_catalogue
WHERE product_category IN ('Camping', 'Hiking')
AND (product_name LIKE '%Pack' OR product_name LIKE '% Cots %')

Query Processing
Product Metrics HTable
 Row Key: ORG_ID, DATE, FEATURE
 Key Values: TXNS, IO_TIME, RESPONSE_TIME

SELECT feature, SUM(txns)
FROM product_metrics
WHERE org_id = :1
AND date >= :2
AND date <= :3
AND io_time > 100
GROUP BY feature

Scan
 Start key: ORG_ID (:1) + DATE (:2)
 End key: ORG_ID (:1) + DATE (:3)
Filter
 IO_TIME > 100
Aggregation
 Intercepts scan on region server
 Builds map of distinct FEATURE values
 Returns one row per distinct group
 Client does final merge
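
The scan, filter, and aggregation steps of this slide can be imitated in plain Java to show the data flow. This is only a simulation under assumptions: each region is modeled as a sorted in-memory map, row keys are strings of the form ORG_ID|DATE|FEATURE, and all class and method names are hypothetical. Real Phoenix works on byte arrays inside HBase coprocessors.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByMergeSketch {
    // Key values of one row; FEATURE is repeated here for convenience.
    static final class Row {
        final String feature;
        final long txns;
        final long ioTime;
        Row(String feature, long txns, long ioTime) {
            this.feature = feature;
            this.txns = txns;
            this.ioTime = ioTime;
        }
    }

    // Stores a row under the composite row key ORG_ID|DATE|FEATURE.
    static void put(TreeMap<String, Row> region, String orgId, String date,
                    String feature, long txns, long ioTime) {
        region.put(orgId + "|" + date + "|" + feature, new Row(feature, txns, ioTime));
    }

    // Region-server side: scan the key range, apply the IO_TIME filter,
    // and build a partial SUM(txns) per distinct FEATURE.
    static Map<String, Long> regionAggregate(TreeMap<String, Row> region,
                                             String orgId, String from, String to) {
        String startKey = orgId + "|" + from + "|";
        String endKey = orgId + "|" + to + "|\uffff"; // covers every feature on the last day
        Map<String, Long> partial = new HashMap<>();
        for (Row row : region.subMap(startKey, true, endKey, true).values()) {
            if (row.ioTime > 100) {
                partial.merge(row.feature, row.txns, Long::sum);
            }
        }
        return partial;
    }

    // Client side: merge the per-region partial sums into the final result.
    static Map<String, Long> clientMerge(List<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((feature, sum) -> total.merge(feature, sum, Long::sum));
        }
        return total;
    }
}
```

Each region returns at most one entry per distinct feature, so the client merges a handful of partial sums instead of reading every matching row.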

Phoenix Query Optimizations
Start/stop key of scan based on AND-ed columns
Through SUBSTR, ROUND, TRUNC, LIKE
Parallelized on client by chunking over start/stop key of scan
Aggregation on region-servers through coprocessor
Inline for GROUP BY over row key ordered columns
In memory map per group otherwise
WHERE clause executed through custom filters
Incremental evaluation with early termination
Evaluated through byte pointers
IN and OR over same column (in progress)
Becomes batched get or filter with next row hint
Top N queries (future)
Through coprocessor keeping top N rows
TABLESAMPLE (future)
Becomes filter with next row hint
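
The "Top N queries" item above describes a coprocessor that keeps only the top N rows. One common way to do that is a bounded min-heap, sketched below. The slides do not say how Phoenix will implement this, so the approach and all names here are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSketch {
    // Keeps at most n rows in memory while iterating over a scan's rows.
    // The heap head is always the smallest of the rows kept so far, so a new
    // row only enters the heap if it beats that smallest row.
    static <T> List<T> topN(Iterable<T> rows, int n, Comparator<T> order) {
        List<T> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        PriorityQueue<T> heap = new PriorityQueue<>(n, order);
        for (T row : rows) {
            if (heap.size() < n) {
                heap.add(row);
            } else if (order.compare(row, heap.peek()) > 0) {
                heap.poll();
                heap.add(row);
            }
        }
        result.addAll(heap);
        result.sort(order.reversed()); // largest first
        return result;
    }
}
```

If every region server returned its own top N this way, the client could produce the overall result by running the same routine over the combined per-region lists.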

Phoenix Performance


Phoenix Roadmap
Increase breadth of SQL support
DML/DDL (in progress)
Derived tables (SELECT * FROM (SELECT foo FROM bar))
More built-in functions: COALESCE, UPPER, TRIM
More operators: ||, IS NULL, *,/,+,-
Secondary indexes
Multiple projections for immutable data
Reordered columns in row key
Different levels of aggregation
Incrementally maintained for mutable data
TABLESAMPLE for sampling
Improve multi-byte support
Joins
Hash join
OLAP extensions
OVER
PARTITION BY

Demo

Time-series database charting
http://goo.gl/61WRs

Thank you!
Questions/comments?
