
Phoenix: How (and why) we put the SQL back into the NoSQL

Phoenix is an open source project from Salesforce.com that puts a SQL layer on top of HBase, a NoSQL store. This talk will focus on answering two questions: 1) why put a SQL layer on top of a NoSQL store? and 2) how does Phoenix marry the SQL paradigm back together with the NoSQL world? Salesforce.com uses relational database technology extensively, so a big part of the "why" for us is to provide developers with a familiar API to read and write their data. Another reason is to provide a higher-level abstraction such that performance optimizations can be done without impacting the client API. Two good examples are filtering data and aggregating data. Both operations can be executed on the client or on the server side with equivalent functional results. However, the performance characteristics of these two choices are dramatically different. This leads to the second question, of "how" to implement a SQL layer on top of a key/value store. Here we'll demonstrate the openness of the HBase API and how it provides enough flexibility for a higher-level language such as SQL to be surfaced, while at the same time allowing us to obtain an order of magnitude performance improvement over prior attempts to do the same. We'll focus on how HBase enables us to follow the principle of "bringing the computation to the data" to achieve these dramatic performance improvements.

Published in: Technology

  1. 1. Phoenix James Taylor @JamesPlusPlus http://phoenix-hbase.blogspot.com/ We put the SQL back in NoSQL https://github.com/forcedotcom/phoenix
  2. 2. Agenda • What/why HBase?
  3. 3. Agenda • What/why HBase? • What/why Phoenix?
  4. 4. Agenda • What/why HBase? • What/why Phoenix? • How does Phoenix work?
  5. 5. Agenda • What/why HBase? • What/why Phoenix? • How does Phoenix work? • Demo
  6. 6. Agenda • What/why HBase? • What/why Phoenix? • How does Phoenix work? • Demo • Roadmap
  7. 7. Agenda • What/why HBase? • What/why Phoenix? • How does Phoenix work? • Demo • Roadmap • Q&A
  8. 8. What is HBase? • Developed as part of Apache Hadoop
  9. 9. What is HBase? • Developed as part of Apache Hadoop • Runs on top of HDFS
  10. 10. What is HBase? • Developed as part of Apache Hadoop • Runs on top of HDFS • Key/value store
  11. 11. What is HBase? • Developed as part of Apache Hadoop • Runs on top of HDFS • Key/value store Map
  12. 12. What is HBase? • Developed as part of Apache Hadoop • Runs on top of HDFS • Key/value store Map Distributed
  13. 13. What is HBase? • Developed as part of Apache Hadoop • Runs on top of HDFS • Key/value store Map Distributed Sparse
  14. 14. What is HBase? • Developed as part of Apache Hadoop • Runs on top of HDFS • Key/value store Map Sorted Distributed Sparse
  15. 15. What is HBase? • Developed as part of Apache Hadoop • Runs on top of HDFS • Key/value store Map Sorted Distributed Consistent Sparse
  16. 16. What is HBase? • Developed as part of Apache Hadoop • Runs on top of HDFS • Key/value store Map Sorted Distributed Consistent Sparse Multidimensional
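
The "sorted, sparse, multidimensional map" description on the slides above can be pictured with a small in-memory model. This is only a toy sketch in Python, not the HBase API; the class name `ToyTable` and its methods are hypothetical, and distribution and consistency are left out entirely.

```python
import bisect


class ToyTable:
    """Toy model: row key -> column family -> qualifier -> timestamp -> value."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> nested dict of cells

    def put(self, row, family, qualifier, ts, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        cell = self._rows[row].setdefault(family, {}).setdefault(qualifier, {})
        cell[ts] = value  # several versions of a cell, keyed by timestamp

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if the cell is absent (sparse)."""
        versions = self._rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Yield row keys in sorted order with start <= key < stop."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        yield from self._keys[lo:hi]


table = ToyTable()
table.put("row2", "A", "q1", 1, "old")
table.put("row2", "A", "q1", 2, "new")
table.put("row1", "B", "q3", 1, "x")
```

Because the row keys stay sorted, a range scan is a pair of binary searches followed by a sequential read, which is the property the later query-processing slides rely on.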
  17. 17. Cluster Architecture
  18. 18. Sharding
  19. 19. Why Use HBase? • If you have lots of data
  20. 20. Why Use HBase? • If you have lots of data • Scales linearly
  21. 21. Why Use HBase? • If you have lots of data • Scales linearly • Shards automatically
  22. 22. Why Use HBase? • If you have lots of data • Scales linearly • Shards automatically • If you can live without transactions
  23. 23. Why Use HBase? • If you have lots of data • Scales linearly • Shards automatically • If you can live without transactions • If your data changes
  24. 24. Why Use HBase? • If you have lots of data • Scales linearly • Shards automatically • If you can live without transactions • If your data changes • If you need strict consistency
  25. 25. What is Phoenix?
  26. 26. What is Phoenix? • SQL skin for HBase
  27. 27. What is Phoenix? • SQL skin for HBase • Alternate client API
  28. 28. What is Phoenix? • SQL skin for HBase • Alternate client API • Embedded JDBC driver
  29. 29. What is Phoenix? • SQL skin for HBase • Alternate client API • Embedded JDBC driver • Runs at HBase native speed
  30. 30. What is Phoenix? • SQL skin for HBase • Alternate client API • Embedded JDBC driver • Runs at HBase native speed • Compiles SQL into native HBase calls
  31. 31. What is Phoenix? • SQL skin for HBase • Alternate client API • Embedded JDBC driver • Runs at HBase native speed • Compiles SQL into native HBase calls • So you don’t have to!
  32. 32. Cluster Architecture
  33. 33. Cluster Architecture Phoenix
  34. 34. Cluster Architecture Phoenix Phoenix
  35. 35. Phoenix Performance
  36. 36. Why Use Phoenix?
  37. 37. Why Use Phoenix? • Give folks an API they already know
  38. 38. Why Use Phoenix? • Give folks an API they already know • Reduce the amount of code needed
  39. 39. Why Use Phoenix? • Give folks an API they already know • Reduce the amount of code needed SELECT TRUNC(date,'DAY'), AVG(cpu) FROM web_stat WHERE domain LIKE 'Salesforce%' GROUP BY TRUNC(date,'DAY')
  40. 40. Why Use Phoenix? • Give folks an API they already know • Reduce the amount of code needed • Perform optimizations transparently
  41. 41. Why Use Phoenix? • Give folks an API they already know • Reduce the amount of code needed • Perform optimizations transparently • Aggregation • Skip Scan • Secondary indexing (soon!)
  42. 42. Why Use Phoenix? • Give folks an API they already know • Reduce the amount of code needed • Perform optimizations transparently • Leverage existing tooling
  43. 43. Why Use Phoenix? • Give folks an API they already know • Reduce the amount of code needed • Perform optimizations transparently • Leverage existing tooling • SQL client/terminal • OLAP engine
  44. 44. How Does Phoenix Work? • Overlays on top of HBase Data Model • Keeps Versioned Schema Repository • Query Processor
  45. 45. Phoenix Data Model HBase Table Phoenix maps HBase data model to the relational world
  46. 46. Phoenix Data Model HBase Table Column Family A Column Family B Phoenix maps HBase data model to the relational world
  47. 47. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Phoenix maps HBase data model to the relational world
  48. 48. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Phoenix maps HBase data model to the relational world
  49. 49. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Phoenix maps HBase data model to the relational world
  50. 50. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world
  51. 51. HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world
  52. 52. HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world
  53. 53. HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world Multiple Versions
  54. 54. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world Phoenix Table
  55. 55. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world Phoenix Table Key Value Columns
  56. 56. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world Phoenix Table Key Value Columns Row Key Columns
  57. 57. Phoenix Metadata • Stored in a Phoenix HBase table
  58. 58. Phoenix Metadata • Stored in a Phoenix HBase table • SYSTEM.TABLE
  59. 59. Phoenix Metadata • Stored in a Phoenix HBase table • Updated through DDL commands
  60. 60. Phoenix Metadata • Stored in a Phoenix HBase table • Updated through DDL commands • CREATE TABLE • ALTER TABLE • DROP TABLE • CREATE INDEX • DROP INDEX
  61. 61. Phoenix Metadata • Stored in a Phoenix HBase table • Updated through DDL commands • Keeps older versions as schema evolves
  62. 62. Phoenix Metadata • Stored in a Phoenix HBase table • Updated through DDL commands • Keeps older versions as schema evolves • Correlates timestamps between schema and data
  63. 63. Phoenix Metadata • Stored in a Phoenix HBase table • Updated through DDL commands • Keeps older versions as schema evolves • Correlates timestamps between schema and data • Flashback queries use schema that was in place then
  64. 64. Phoenix Metadata • Stored in a Phoenix HBase table • Updated through DDL commands • Keeps older versions as schema evolves • Correlates timestamps between schema and data • Accessible via JDBC metadata APIs
  65. 65. Phoenix Metadata • Stored in a Phoenix HBase table • Updated through DDL commands • Keeps older versions as schema evolves • Correlates timestamps between schema and data • Accessible via JDBC metadata APIs • java.sql.DatabaseMetaData • Through Phoenix queries!
  66. 66. Example Row Key SERVER METRICS HOST VARCHAR DATE DATE RESPONSE_TIME INTEGER GC_TIME INTEGER CPU_TIME INTEGER IO_TIME INTEGER … Over metrics data for clusters of servers with a schema like this:
  67. 67. Example Over metrics data for clusters of servers with a schema like this: Key Values SERVER METRICS HOST VARCHAR DATE DATE RESPONSE_TIME INTEGER GC_TIME INTEGER CPU_TIME INTEGER IO_TIME INTEGER …
  68. 68. Example With 90 days of data that looks like this: SERVER METRICS HOST DATE RESPONSE_TIME GC_TIME sf1.s1 Jun 5 10:10:10.234 1234 sf1.s1 Jun 5 11:18:28.456 8012 … sf3.s1 Jun 5 10:10:10.234 2345 sf3.s1 Jun 6 12:46:19.123 2340 sf7.s9 Jun 4 08:23:23.456 5002 1234 …
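
In the example schema above, HOST and DATE are the row key columns. One way to picture how two columns become a single sortable key is sketched below. The byte layout (UTF-8 host, a zero-byte separator, then big-endian milliseconds) is an assumption made for illustration, the function name `row_key` is hypothetical, and the year 2013 is invented because the slides show only month and day.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def row_key(host: str, when: datetime) -> bytes:
    """Concatenate HOST and DATE so that byte order matches (host, date) order.

    Assumed layout: UTF-8 host, a zero-byte separator, then the timestamp as
    big-endian milliseconds since the epoch (non-negative values only).
    """
    millis = (when - EPOCH) // timedelta(milliseconds=1)
    return host.encode("utf-8") + b"\x00" + struct.pack(">q", millis)


# Sample rows modelled on the slide; the year is a made-up placeholder.
k1 = row_key("sf1.s1", datetime(2013, 6, 5, 10, 10, 10, 234000, tzinfo=timezone.utc))
k2 = row_key("sf1.s1", datetime(2013, 6, 5, 11, 18, 28, 456000, tzinfo=timezone.utc))
k3 = row_key("sf3.s1", datetime(2013, 6, 5, 10, 10, 10, 234000, tzinfo=timezone.utc))
```

With a layout like this, all rows for one host are adjacent and ordered by time, and all hosts of one cluster (`sf1.*`) share a key prefix, which is what makes the range and skip-scan steps in the scenarios possible.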
  69. 69. Example Walk through query processing for three scenarios
  70. 70. Example Walk through query processing for three scenarios 1. Chart Response Time Per Cluster
  71. 71. Example Walk through query processing for three scenarios 1. Chart Response Time Per Cluster 2. Identify 5 Longest GC Times
  72. 72. Example Walk through query processing for three scenarios 1. Chart Response Time Per Cluster 2. Identify 5 Longest GC Times 3. Identify 5 Longest GC Times again and again
  73. 73. Scenario 1 Chart Response Time Per Cluster SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY')
  74. 74. Scenario 1 Chart Response Time Per Cluster SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY')
  75. 75. Scenario 1 Chart Response Time Per Cluster SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY')
  76. 76. Scenario 1 Chart Response Time Per Cluster SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY')
  77. 77. Scenario 1 Chart Response Time Per Cluster SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY')
  78. 78. Step 1: Client Identify Row Key Ranges from Query SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY') Row Key Ranges HOST DATE
  79. 79. Step 1: Client Identify Row Key Ranges from Query SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY') Row Key Ranges HOST DATE
  80. 80. Step 1: Client Identify Row Key Ranges from Query SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY') Row Key Ranges HOST DATE
  81. 81. Step 1: Client Identify Row Key Ranges from Query SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY') Row Key Ranges HOST DATE sf1
  82. 82. Step 1: Client Identify Row Key Ranges from Query SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY') Row Key Ranges HOST DATE sf1 sf3
  83. 83. Step 1: Client Identify Row Key Ranges from Query SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY') Row Key Ranges HOST DATE sf1 sf3 sf7
  84. 84. Step 1: Client Identify Row Key Ranges from Query SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') GROUP BY substr(host, 1, 3), trunc(date,'DAY') Row Key Ranges HOST DATE sf1 t1 - * sf3 sf7
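
Step 1 can be sketched as turning each cluster prefix from the IN list into a half-open range over the leading HOST part of the row key. The helper `prefix_range` below is hypothetical and assumes a non-empty prefix whose last byte is below 0xFF; it illustrates the idea and is not Phoenix's actual query compiler.

```python
def prefix_range(prefix: bytes):
    """Return the half-open key range [start, stop) covering every key that begins with prefix.

    Assumes prefix is non-empty and its last byte is less than 0xFF.
    """
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, stop


# Prefixes taken from the IN ('sf1', 'sf3', 'sf7') clause of the query.
ranges = [prefix_range(p) for p in sorted([b"sf1", b"sf3", b"sf7"])]
```

The date condition (`t1 - *` on the slide) cannot be folded into one contiguous range because HOST leads the key, so it is left for the server-side filtering step.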
  85. 85. Step 2: Client Overlay Row Key Ranges with Regions R1 R2 R3 R4 sf1 sf4 sf6 sf1 sf3 sf7
  86. 86. Step 3: Client Execute Parallel Scans R1 R2 R3 R4 sf1 sf4 sf6 sf1 sf3 sf7 scan1 scan3 scan2
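
Steps 2 and 3 can be sketched as clipping each key range to the regions it overlaps and then running the resulting scans concurrently. The region boundaries below are a guess at what the diagram shows, and `overlay` and `run_scan` are hypothetical names; `run_scan` only stands in for a real region scan.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical region boundaries: each region owns [start, stop); None means unbounded.
REGIONS = {
    "R1": (None, b"sf1"),
    "R2": (b"sf1", b"sf4"),
    "R3": (b"sf4", b"sf6"),
    "R4": (b"sf6", None),
}


def overlay(key_ranges, regions):
    """Clip each key range to every region it overlaps; return (region, start, stop) scans."""
    scans = []
    for start, stop in key_ranges:
        for name, (r_start, r_stop) in regions.items():
            lo = start if r_start is None else max(start, r_start)
            hi = stop if r_stop is None else min(stop, r_stop)
            if lo < hi:
                scans.append((name, lo, hi))
    return scans


def run_scan(scan):
    """Stand-in for a region scan; a real client would issue an HBase scan here."""
    name, lo, hi = scan
    return f"{name}:{lo.decode()}..{hi.decode()}"


key_ranges = [(b"sf1", b"sf2"), (b"sf3", b"sf4"), (b"sf7", b"sf8")]
scans = overlay(key_ranges, REGIONS)
with ThreadPoolExecutor(max_workers=len(scans)) as pool:
    results = list(pool.map(run_scan, scans))
```

Regions that hold no matching keys (R1 and R3 under these assumed boundaries) receive no scan at all.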
  87. 87. Step 4: Server Filter using Skip Scan sf1.s1 t0 SKIP
  88. 88. Step 4: Server Filter using Skip Scan sf1.s1 t1 INCLUDE
  89. 89. Step 4: Server Filter using Skip Scan sf1.s2 t0 SKIP
  90. 90. Step 4: Server Filter using Skip Scan sf1.s2 t1 INCLUDE
  91. 91. Step 4: Server Filter using Skip Scan sf1.s3 t0 SKIP
  92. 92. Step 4: Server Filter using Skip Scan sf1.s3 t1 INCLUDE
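
The SKIP/INCLUDE sequence in Step 4 can be sketched over a sorted list of `(host, time)` keys: when a row is too old, the scan seeks directly to the first key for that host at or after the cutoff instead of reading every row in between. The function `skip_scan` and the integer timestamps are hypothetical stand-ins; the slides do not show Phoenix's actual filter code.

```python
import bisect


def skip_scan(rows, min_t):
    """rows: sorted list of (host, t) keys. Return (keys with t >= min_t, number of seeks)."""
    included, seeks = [], 0
    i = 0
    while i < len(rows):
        host, t = rows[i]
        if t >= min_t:
            included.append(rows[i])  # INCLUDE
            i += 1
        else:
            # SKIP: jump straight to the first key for this host at or after min_t.
            i = bisect.bisect_left(rows, (host, min_t), i)
            seeks += 1
    return included, seeks


rows = [
    ("sf1.s1", 0), ("sf1.s1", 1), ("sf1.s1", 5),
    ("sf1.s2", 0), ("sf1.s2", 2), ("sf1.s2", 6),
    ("sf1.s3", 3), ("sf1.s3", 7),
]
included, seeks = skip_scan(rows, 5)
```

Each seek replaces a run of row-by-row reads, so the benefit grows with the number of old rows stored per host.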
  93. 93. SERVER METRICS HOST DATE sf1.s1 Jun 2 10:10:10.234 sf1.s2 Jun 3 23:05:44.975 sf1.s2 Jun 9 08:10:32.147 sf1.s3 Jun 1 11:18:28.456 sf1.s3 Jun 3 22:03:22.142 sf1.s4 Jun 1 10:29:58.950 sf1.s4 Jun 2 14:55:34.104 sf1.s4 Jun 3 12:46:19.123 sf1.s5 Jun 8 08:23:23.456 sf1.s6 Jun 1 10:31:10.234 Step 5: Server Intercept Scan in Coprocessor SERVER METRICS HOST DATE AGG sf1 Jun 1 … sf1 Jun 2 … sf1 Jun 3 … sf1 Jun 8 … sf1 Jun 9 …
  94. 94. Step 6: Client Perform Final Merge Sort R1 R2 R3 R4 scan1 scan3 scan2 SERVER METRICS HOST DATE AGG sf1 Jun 5 … sf1 Jun 9 … sf3 Jun 1 … sf3 Jun 2 … sf7 Jun 1 … sf7 Jun 8 …
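
Steps 5 and 6 can be sketched as a two-phase aggregation: each scan returns sorted partial results per (cluster, day) group, and the client merges the sorted streams and finishes the average. The function names and sample rows are hypothetical; carrying a (sum, count) pair per group is an assumption about how an AVG could be combined correctly, since averaging per-scan averages would give wrong results when scans see different row counts.

```python
import heapq
from itertools import groupby


def partial_aggregate(rows):
    """Server side: rows are (host, day, response_time).

    Returns a sorted list of ((cluster, day), (sum, count)) pairs.
    """
    acc = {}
    for host, day, rt in rows:
        key = (host[:3], day)  # substr(host, 1, 3) and the truncated date
        s, c = acc.get(key, (0, 0))
        acc[key] = (s + rt, c + 1)
    return sorted(acc.items())


def merge_partials(*partials):
    """Client side: merge sorted partial results and finish AVG per group."""
    merged = heapq.merge(*partials, key=lambda kv: kv[0])
    out = []
    for key, group in groupby(merged, key=lambda kv: kv[0]):
        total = count = 0
        for _, (s, c) in group:
            total += s
            count += c
        out.append((key, total / count))
    return out


scan1 = partial_aggregate([("sf1.s1", "Jun 1", 100), ("sf1.s2", "Jun 1", 300)])
scan2 = partial_aggregate([("sf1.s3", "Jun 1", 200), ("sf3.s1", "Jun 2", 50)])
result = merge_partials(scan1, scan2)
```

Only one small row per group crosses the network from each scan, rather than every matching metric row.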
  95. 95. Scenario 2 Find 5 Longest GC Times SELECT host, date, gc_time FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') ORDER BY gc_time DESC LIMIT 5
  96. 96. Scenario 2 Find 5 Longest GC Times • Same client parallelization and server skip scan filtering
  97. 97. Scenario 2 Find 5 Longest GC Times • Same client parallelization and server skip scan filtering • Server holds 5 longest GC_TIME values for each scan R1 SERVER METRICS HOST DATE GC_TIME sf1.s1 Jun 2 10:10:10.234 22123 sf1.s1 Jun 3 23:05:44.975 19876 sf1.s1 Jun 9 08:10:32.147 11345 sf1.s2 Jun 1 11:18:28.456 10234 sf1.s2 Jun 3 22:03:22.142 10111
  98. 98. SERVER METRICS HOST DATE GC_TIME sf1.s1 Jun 2 10:10:10.234 22123 sf1.s1 Jun 3 23:05:44.975 19876 sf1.s1 Jun 9 08:10:32.147 11345 sf1.s2 Jun 1 11:18:28.456 10234 sf1.s2 Jun 3 22:03:22.142 10111 Scenario 2 Find 5 Longest GC Times • Same client parallelization and server skip scan filtering • Server holds 5 longest GC_TIME values for each scan • Client performs final merge sort among parallel scans Scan1 Scan2 Scan3
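
The top-N processing in Scenario 2 can be sketched as follows: each scan keeps only its own N largest GC_TIME rows, and the client merges those short, already-sorted lists and stops after N rows. The function names and sample rows are hypothetical, and N is set to 2 to keep the data small.

```python
import heapq
from itertools import islice


def server_top_n(rows, n=5):
    """Per scan: keep only the n rows with the largest gc_time (third field), largest first."""
    return heapq.nlargest(n, rows, key=lambda r: r[2])


def client_top_n(scan_results, n=5):
    """Client: merge the descending per-scan lists and take the first n rows."""
    merged = heapq.merge(*scan_results, key=lambda r: r[2], reverse=True)
    return list(islice(merged, n))


scan1 = server_top_n([("sf1.s1", "d1", 10), ("sf1.s1", "d2", 30), ("sf1.s2", "d1", 20)], n=2)
scan2 = server_top_n([("sf3.s1", "d1", 25), ("sf3.s1", "d2", 5)], n=2)
top = client_top_n([scan1, scan2], n=2)
```

No scan ever returns more than N rows, so the client-side merge stays cheap regardless of how many rows each region holds.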
  99. 99. Scenario 3 Find 5 Longest GC Times CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)
  100. 100. Scenario 3 Find 5 Longest GC Times CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)
  101. 101. Scenario 3 Find 5 Longest GC Times CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)
  102. 102. Scenario 3 Find 5 Longest GC Times CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time) Row Key GC_TIME_INDEX GC_TIME INTEGER DATE DATE HOST VARCHAR RESPONSE_TIME INTEGER
  103. 103. Scenario 3 Find 5 Longest GC Times CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time) Key Value GC_TIME_INDEX GC_TIME INTEGER DATE DATE HOST VARCHAR RESPONSE_TIME INTEGER
  104. 104. Scenario 3 Find 5 Longest GC Times SELECT host, date, gc_time FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7') ORDER BY gc_time DESC LIMIT 5
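
The effect of the index in Scenario 3 can be sketched with a second copy of the data ordered by `gc_time DESC, date DESC`: the same query then reads rows in the order it wants them and can stop as soon as five rows pass the filter. The names `build_index` and `top_gc`, the integer day numbers, and the sample rows are hypothetical; secondary indexing is listed on the roadmap slide, so this reflects the idea rather than a shipped implementation.

```python
from itertools import islice


def build_index(rows):
    """Order rows as the index would: gc_time descending, then date descending."""
    return sorted(rows, key=lambda r: (-r["gc_time"], -r["date"]))


def top_gc(index, since, clusters, limit=5):
    """Walk the index in order, apply the WHERE conditions, and stop after limit matches."""
    matches = (r for r in index if r["date"] > since and r["host"][:3] in clusters)
    return list(islice(matches, limit))


rows = [
    {"host": "sf1.s1", "date": 9, "gc_time": 11345, "response_time": 10},
    {"host": "sf1.s1", "date": 2, "gc_time": 22123, "response_time": 12},
    {"host": "sf4.s1", "date": 8, "gc_time": 30000, "response_time": 15},
    {"host": "sf3.s2", "date": 8, "gc_time": 19876, "response_time": 11},
]
index = build_index(rows)
top = top_gc(index, since=1, clusters={"sf1", "sf3", "sf7"}, limit=2)
```

Because HOST and RESPONSE_TIME are carried in the index through the INCLUDE clause, the query can be answered from the index rows alone without going back to the data table.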
  105. 105. Demo • Phoenix Stock Analyzer • Fortune 500 companies • 10 years of historical stock prices • Demonstrates Skip Scan in action • Running locally on my single node laptop cluster
  106. 106. Phoenix Roadmap • Secondary Indexing • Count distinct and percentile • Derived tables • Hash Joins • Apache Drill integration • Cost-based query optimizer • OLAP extensions • Transactions
  107. 107. Thank you! Questions/comments?
