SlideShare a Scribd company logo
Phoenix
James Taylor
@JamesPlusPlus
http://phoenix-hbase.blogspot.com/
We put the SQL back in NoSQL
https://github.com/forcedotcom/phoenix
Agenda
Completed
 What/why HBase?
Agenda
Completed
 What/why HBase?
 What/why Phoenix?
Agenda
Completed
 What/why HBase?
 What/why Phoenix?
 How does Phoenix work?
Agenda
Completed
 What/why HBase?
 What/why Phoenix?
 How does Phoenix work?
 Demo
Agenda
Completed
 What/why HBase?
 What/why Phoenix?
 How does Phoenix work?
 Demo
 Roadmap
Agenda
Completed
 What/why HBase?
 What/why Phoenix?
 How does Phoenix work?
 Demo
 Roadmap
 Q&A
What is HBase?
Completed
 Developed as part of Apache Hadoop
What is HBase?
Completed
 Developed as part of Apache Hadoop
 Runs on top of HDFS
What is HBase?
Completed
 Developed as part of Apache Hadoop
 Runs on top of HDFS
 Key/value store
What is HBase?
Completed
 Developed as part of Apache Hadoop
 Runs on top of HDFS
 Key/value store
Map
What is HBase?
Completed
 Developed as part of Apache Hadoop
 Runs on top of HDFS
 Key/value store
Map
Distributed
What is HBase?
Completed
 Developed as part of Apache Hadoop
 Runs on top of HDFS
 Key/value store
Map
Distributed
Sparse
What is HBase?
Completed
 Developed as part of Apache Hadoop
 Runs on top of HDFS
 Key/value store
Map Sorted
Distributed
Sparse
What is HBase?
Completed
 Developed as part of Apache Hadoop
 Runs on top of HDFS
 Key/value store
Map Sorted
Distributed Consistent
Sparse
What is HBase?
Completed
 Developed as part of Apache Hadoop
 Runs on top of HDFS
 Key/value store
Map Sorted
Distributed Consistent
Sparse Multidimensional
Cluster Architecture
Sharding
Why Use HBase?
Completed
 If you have lots of data
Why Use HBase?
Completed
 If you have lots of data
 Scales linearly
Why Use HBase?
Completed
 If you have lots of data
 Scales linearly
 Shards automatically
Why Use HBase?
Completed
 If you have lots of data
 Scales linearly
 Shards automatically
 If you can live without transactions
Why Use HBase?
Completed
 If you have lots of data
 Scales linearly
 Shards automatically
 If you can live without transactions
 If your data changes
Why Use HBase?
Completed
 If you have lots of data
 Scales linearly
 Shards automatically
 If you can live without transactions
 If your data changes
 If you need strict consistency
What is Phoenix?
Completed
What is Phoenix?
Completed
 SQL skin for HBase
What is Phoenix?
Completed
 SQL skin for HBase
 Alternate client API
What is Phoenix?
Completed
 SQL skin for HBase
 Alternate client API
 Embedded JDBC driver
What is Phoenix?
Completed
 SQL skin for HBase
 Alternate client API
 Embedded JDBC driver
 Runs at HBase native speed
What is Phoenix?
Completed
 SQL skin for HBase
 Alternate client API
 Embedded JDBC driver
 Runs at HBase native speed
 Compiles SQL into native HBase calls
What is Phoenix?
Completed
 SQL skin for HBase
 Alternate client API
 Embedded JDBC driver
 Runs at HBase native speed
 Compiles SQL into native HBase calls
 So you don’t have to!
Cluster Architecture
Cluster Architecture
Phoenix
Cluster Architecture
Phoenix
Phoenix
Phoenix Performance
Why Use Phoenix?
Why Use Phoenix?
Completed
 Give folks an API they already know
Why Use Phoenix?
Completed
 Give folks an API they already know
 Reduce the amount of code needed
Why Use Phoenix?
Completed
 Give folks an API they already know
 Reduce the amount of code needed
SELECT TRUNC(date,'DAY’), AVG(cpu)
FROM web_stat
WHERE domain LIKE 'Salesforce%’
GROUP BY TRUNC(date,'DAY’)
Why Use Phoenix?
Completed
 Give folks an API they already know
 Reduce the amount of code needed
 Perform optimizations transparently
Why Use Phoenix?
Completed
 Give folks an API they already know
 Reduce the amount of code needed
 Perform optimizations transparently
 Aggregation
 Skip Scan
 Secondary indexing (soon!)
Why Use Phoenix?
Completed
 Give folks an API they already know
 Reduce the amount of code needed
 Perform optimizations transparently
 Leverage existing tooling
Why Use Phoenix?
Completed
 Give folks an API they already know
 Reduce the amount of code needed
 Perform optimizations transparently
 Leverage existing tooling
 SQL client/terminal
 OLAP engine
How Does Phoenix Work?
Completed
 Overlays on top of HBase Data Model
 Keeps Versioned Schema Respository
 Query Processor
Phoenix Data Model
HBase Table
Phoenix maps HBase data model to the relational world
Phoenix Data Model
HBase Table
Column Family A Column Family B
Phoenix maps HBase data model to the relational world
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Phoenix maps HBase data model to the relational world
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Phoenix maps HBase data model to the relational world
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Phoenix maps HBase data model to the relational world
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix maps HBase data model to the relational world
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix maps HBase data model to the relational world
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix maps HBase data model to the relational world
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix maps HBase data model to the relational world
Multiple Versions
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix maps HBase data model to the relational world
Phoenix Table
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix maps HBase data model to the relational world
Phoenix Table
Key Value Columns
Phoenix Data Model
HBase Table
Column Family A Column Family B
Qualifier 1 Qualifier 2 Qualifier 3
Row Key 1 Value
Row Key 2 Value Value
Row Key 3 Value
Phoenix maps HBase data model to the relational world
Phoenix Table
Key Value ColumnsRow Key Columns
Phoenix Metadata
Completed
 Stored in a Phoenix HBase table
Phoenix Metadata
Completed
 Stored in a Phoenix HBase table
 SYSTEM.TABLE
Phoenix Metadata
Completed
 Stored in a Phoenix HBase table
 Updated through DDL commands
Phoenix Metadata
Completed
 Stored in a Phoenix HBase table
 Updated through DDL commands
 CREATE TABLE
 ALTER TABLE
 DROP TABLE
 CREATE INDEX
 DROP INDEX
Phoenix Metadata
Completed
 Stored in a Phoenix HBase table
 Updated through DDL commands
 Keeps older versions as schema evolves
Phoenix Metadata
Completed
 Stored in a Phoenix HBase table
 Updated through DDL commands
 Keeps older versions as schema evolves
 Correlates timestamps between schema and data
Phoenix Metadata
Completed
 Stored in a Phoenix HBase table
 Updated through DDL commands
 Keeps older versions as schema evolves
 Correlates timestamps between schema and data
 Flashback queries use schema that was in-place then
Phoenix Metadata
Completed
 Stored in a Phoenix HBase table
 Updated through DDL commands
 Keeps older versions as schema evolves
 Correlates timestamps between schema and data
 Accessible via JDBC metadata APIs
Phoenix Metadata
Completed
 Stored in a Phoenix HBase table
 Updated through DDL commands
 Keeps older versions as schema evolves
 Correlates timestamps between schema and data
 Accessible via JDBC metadata APIs
 java.sql.DatabaseMetaData
 Through Phoenix queries!
Example
Row Key
SERVER METRICS
HOST VARCHAR
DATE DATE
RESPONSE_TIME INTEGER
GC_TIME INTEGER
CPU_TIME INTEGER
IO_TIME INTEGER
…
Over metrics data for clusters of servers with a schema like this:
Example
Over metrics data for clusters of servers with a schema like this:
Key Values
SERVER METRICS
HOST VARCHAR
DATE DATE
RESPONSE_TIME INTEGER
GC_TIME INTEGER
CPU_TIME INTEGER
IO_TIME INTEGER
…
With 90 days of data that looks like this:
SERVER METRICS
HOST DATE RESPONSE_TIME GC_TIME
sf1.s1 Jun5 10:10:10.234 1234
sf1.s1 Jun 5 11:18:28.456 8012
…
sf3.s1 Jun5 10:10:10.234 2345
sf3.s1 Jun 6 12:46:19.123 2340
sf7.s9 Jun 4 08:23:23.456 5002 1234
…
Example
Example
Walk through query processing for three scenarios
Example
Walk through query processing for three scenarios
1. Chart Response Time Per Cluster
Example
Walk through query processing for three scenarios
1. Chart Response Time Per Cluster
2. Identify 5 Longest GC Times
Example
Walk through query processing for three scenarios
1. Chart Response Time Per Cluster
2. Identify 5 Longest GC Times
3. Identify 5 Longest GC Times again and again
Scenario 1
Chart Response Time Per Cluster
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Scenario 1
Chart Response Time Per Cluster
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Scenario 1
Chart Response Time Per Cluster
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Scenario 1
Chart Response Time Per Cluster
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Scenario 1
Chart Response Time Per Cluster
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Step 1: Client
Identify Row Key Ranges from Query
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Row Key Ranges
HOST DATE
Step 1: Client
Identify Row Key Ranges from Query
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Row Key Ranges
HOST DATE
Step 1: Client
Identify Row Key Ranges from Query
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Row Key Ranges
HOST DATE
Step 1: Client
Identify Row Key Ranges from Query
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Row Key Ranges
HOST DATE
sf1
Step 1: Client
Identify Row Key Ranges from Query
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Row Key Ranges
HOST DATE
sf1
sf3
Step 1: Client
Identify Row Key Ranges from Query
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Row Key Ranges
HOST DATE
sf1
sf3
sf7
Step 1: Client
Identify Row Key Ranges from Query
Completed
SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time)
FROM server_metrics
WHERE date >CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’)
GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Row Key Ranges
HOST DATE
sf1 t1 - *
sf3
sf7
Step 2: Client
Overlay Row Key Ranges with Regions
Completed
R1
R2
R3
R4
sf1
sf4
sf6
sf1
sf3
sf7
Step 3: Client
Execute Parallel Scans
Completed
R1
R2
R3
R4
sf1
sf4
sf6
sf1
sf3
sf7
scan1
scan3
scan2
Step 4: Server
Filter using Skip Scan
Completed
sf1.s1 t0SKIP
Step 4: Server
Filter using Skip Scan
Completed
sf1.s1 t1INCLUDE
Step 4: Server
Filter using Skip Scan
Completed
sf1.s2 t0SKIP
Step 4: Server
Filter using Skip Scan
Completed
sf1.s2 t1
INCLUDE
Step 4: Server
Filter using Skip Scan
sf1.s3 t0SKIP
Step 4: Server
Filter using Skip Scan
sf1.s3 t1INCLUDE
SERVER METRICS
HOST DATE
sf1.s1 Jun 2 10:10:10.234
sf1.s2 Jun 3 23:05:44.975
sf1.s2 Jun 9 08:10:32.147
sf1.s3 Jun 1 11:18:28.456
sf1.s3 Jun 3 22:03:22.142
sf1.s4 Jun 1 10:29:58.950
sf1.s4 Jun 2 14:55:34.104
sf1.s4 Jun 3 12:46:19.123
sf1.s5 Jun 8 08:23:23.456
sf1.s6 Jun 1 10:31:10.234
Step 5: Server
Intercept Scan in Coprocessor
SERVER METRICS
HOST DATE AGG
sf1 Jun 1 …
sf1 Jun 2 …
sf1 Jun 3 …
sf1 Jun 8 …
sf1 Jun 9 …
Step 6: Client
Perform Final Merge Sort
Completed
R1
R2
R3
R4
scan1
scan3
scan2
SERVER METRICS
HOST DATE AGG
sf1 Jun5 …
sf1 Jun 9 …
sf3 Jun 1 …
sf3 Jun 2 …
sf7 Jun 1 …
sf7 Jun 8 …
Scenario 2
Find 5 Longest GC Times
Completed
SELECT host, date, gc_time
FROMserver_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’)
ORDER BY gc_time DESC
LIMIT 5
Scenario 2
Find 5 Longest GC Times
• Same client parallelization and server skip scan filtering
Scenario 2
Find 5 Longest GC Times
Completed
• Same client parallelization and server skip scan filtering
• Server holds 5 longest GC_TIME value for each scan
R1
SERVER METRICS
HOST DATE GC_TIME
sf1.s1 Jun 2 10:10:10.234 22123
sf1.s1 Jun 3 23:05:44.975 19876
sf1.s1 Jun 9 08:10:32.147 11345
sf1.s2 Jun 1 11:18:28.456 10234
sf1.s2 Jun 3 22:03:22.142 10111
SERVER METRICS
HOST DATE GC_TIME
sf1.s1 Jun 2 10:10:10.234 22123
sf1.s1 Jun 3 23:05:44.975 19876
sf1.s1 Jun 9 08:10:32.147 11345
sf1.s2 Jun 1 11:18:28.456 10234
sf1.s2 Jun 3 22:03:22.142 10111
Scenario 2
Find 5 Longest GC Times
• Same client parallelization and server skip scan filtering
• Server holds 5 longest GC_TIME value for each scan
• Client performs final merge sort among parallel scans
Scan1
Scan2
Scan3
Scenario 3
Find 5 Longest GC Times
Completed
CREATE INDEX gc_time_index
ON server_metrics(gc_time DESC, date DESC)
INCLUDE (host, response_time)
Scenario 3
Find 5 Longest GC Times
Completed
CREATE INDEX gc_time_index
ON server_metrics (gc_time DESC, date DESC)
INCLUDE (host, response_time)
Scenario 3
Find 5 Longest GC Times
Completed
CREATE INDEX gc_time_index
ON server_metrics (gc_time DESC, date DESC)
INCLUDE (host, response_time)
Scenario 3
Find 5 Longest GC Times
Completed
CREATE INDEX gc_time_index
ON server_metrics (gc_time DESC, date DESC)
INCLUDE (host, response_time)
Row Key
GC_TIME_INDEX
GC_TIME INTEGER
DATE DATE
HOST VARCHAR
RESPONSE_TIME INTEGER
Scenario 3
Find 5 Longest GC Times
Completed
CREATE INDEX gc_time_index
ON server_metrics (gc_time DESC, date DESC)
INCLUDE (host, response_time)
Key Value
GC_TIME_INDEX
GC_TIME INTEGER
DATE DATE
HOST VARCHAR
RESPONSE_TIME INTEGER
Scenario 3
Find 5 Longest GC Times
Completed
SELECT host, date, gc_time
FROMserver_metrics
WHERE date > CURRENT_DATE() – 7
AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’)
ORDER BY gc_time DESC
LIMIT 5
Demo
Completed
 Phoenix Stock Analyzer
 Fortune 500 companies
 10 years of historical stock prices
 Demonstrates Skip Scan in action
 Running locally on my single node
laptop cluster
Phoenix Roadmap
Completed
 Secondary Indexing
 Count distinct and percentile
 Derived tables
 Hash Joins
 Apache Drill integration
 Cost-based query optimizer
 OLAP extensions
 Transactions
Thank you!
Questions/comments?

More Related Content

What's hot

More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedInMore Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
Celia Kung
 
Apache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineApache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing Engine
Aljoscha Krettek
 
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
Splunk
 
phoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetupphoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetupMaryann Xue
 
Splunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search Dojo
Splunk
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event Presentation
Chartio
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Julian Hyde
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
 
URL Design with Lasso
URL Design with LassoURL Design with Lasso
URL Design with Lasso
macsolve
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Yahoo Developer Network
 
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
Severalnines
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
Databricks
 
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Metosin Oy
 
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and toolsWebinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
Severalnines
 
Optimizing Apache Spark UDFs
Optimizing Apache Spark UDFsOptimizing Apache Spark UDFs
Optimizing Apache Spark UDFs
Databricks
 
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep diveWebinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
Severalnines
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
Josh Elser
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos Christine
Spark Summit
 
Hive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingHive & HBase For Transaction Processing
Hive & HBase For Transaction Processing
DataWorks Summit
 

What's hot (20)

More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedInMore Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
 
Apache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineApache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing Engine
 
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
 
phoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetupphoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetup
 
Splunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search Dojo
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event Presentation
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
URL Design with Lasso
URL Design with LassoURL Design with Lasso
URL Design with Lasso
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
 
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
 
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and toolsWebinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
 
Optimizing Apache Spark UDFs
Optimizing Apache Spark UDFsOptimizing Apache Spark UDFs
Optimizing Apache Spark UDFs
 
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep diveWebinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos Christine
 
Hive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingHive & HBase For Transaction Processing
Hive & HBase For Transaction Processing
 

Similar to Phoenix: How (and why) we put the SQL back into the NoSQL

Hbase mhug 2015
Hbase mhug 2015Hbase mhug 2015
Hbase mhug 2015
Joseph Niemiec
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
DataWorks Summit
 
2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase
Rohit Jain
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
DataWorks Summit
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
Simplilearn
 
HBaseCon2016-final
HBaseCon2016-finalHBaseCon2016-final
HBaseCon2016-finalMaryann Xue
 
Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
Shravan (Sean) Pabba
 
MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 Steps
Scott Cinnamond
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
HBaseCon
 
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
Vijaykumar Vangapandu
 
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Technical Overview on Cloudera Impala
Technical Overview on Cloudera ImpalaTechnical Overview on Cloudera Impala
Technical Overview on Cloudera Impala
Praneeth Krishna
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
Michael Stack
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks
 
HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-finalMaryann Xue
 
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentCan Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
HostedbyConfluent
 
HiveWarehouseConnector
HiveWarehouseConnectorHiveWarehouseConnector
HiveWarehouseConnector
Eric Wohlstadter
 
SAP S/4 HANA ONLINE TRAINING
SAP S/4 HANA ONLINE TRAININGSAP S/4 HANA ONLINE TRAINING
SAP S/4 HANA ONLINE TRAINING
Glory IT Technologies
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
Michael Stack
 
SAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
SAP HANA Integrated Online Course Training in Hyderabad | Imagine LifeSAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
SAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
Imagine life
 

Similar to Phoenix: How (and why) we put the SQL back into the NoSQL (20)

Hbase mhug 2015
Hbase mhug 2015Hbase mhug 2015
Hbase mhug 2015
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
HBaseCon2016-final
HBaseCon2016-finalHBaseCon2016-final
HBaseCon2016-final
 
Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
 
MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 Steps
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
 
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
 
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
 
Technical Overview on Cloudera Impala
Technical Overview on Cloudera ImpalaTechnical Overview on Cloudera Impala
Technical Overview on Cloudera Impala
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-final
 
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentCan Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
 
HiveWarehouseConnector
HiveWarehouseConnectorHiveWarehouseConnector
HiveWarehouseConnector
 
SAP S/4 HANA ONLINE TRAINING
SAP S/4 HANA ONLINE TRAININGSAP S/4 HANA ONLINE TRAINING
SAP S/4 HANA ONLINE TRAINING
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
 
SAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
SAP HANA Integrated Online Course Training in Hyderabad | Imagine LifeSAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
SAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
UiPathCommunity
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 

Recently uploaded (20)

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 

Phoenix: How (and why) we put the SQL back into the NoSQL

  • 1. Phoenix James Taylor @JamesPlusPlus http://phoenix-hbase.blogspot.com/ We put the SQL back in NoSQL https://github.com/forcedotcom/phoenix
  • 4. Agenda Completed  What/why HBase?  What/why Phoenix?  How does Phoenix work?
  • 5. Agenda Completed  What/why HBase?  What/why Phoenix?  How does Phoenix work?  Demo
  • 6. Agenda Completed  What/why HBase?  What/why Phoenix?  How does Phoenix work?  Demo  Roadmap
  • 7. Agenda Completed  What/why HBase?  What/why Phoenix?  How does Phoenix work?  Demo  Roadmap  Q&A
  • 8. What is HBase? Completed  Developed as part of Apache Hadoop
  • 9. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS
  • 10. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store
  • 11. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map
  • 12. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Distributed
  • 13. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Distributed Sparse
  • 14. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Sorted Distributed Sparse
  • 15. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Sorted Distributed Consistent Sparse
  • 16. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Sorted Distributed Consistent Sparse Multidimensional
  • 19. Why Use HBase? Completed  If you have lots of data
  • 20. Why Use HBase? Completed  If you have lots of data  Scales linearly
  • 21. Why Use HBase? Completed  If you have lots of data  Scales linearly  Shards automatically
  • 22. Why Use HBase? Completed  If you have lots of data  Scales linearly  Shards automatically  If you can live without transactions
  • 23. Why Use HBase? Completed  If you have lots of data  Scales linearly  Shards automatically  If you can live without transactions  If your data changes
  • 24. Why Use HBase? Completed  If you have lots of data  Scales linearly  Shards automatically  If you can live without transactions  If your data changes  If you need strict consistency
  • 26. What is Phoenix? Completed  SQL skin for HBase
  • 27. What is Phoenix? Completed  SQL skin for HBase  Alternate client API
  • 28. What is Phoenix? Completed  SQL skin for HBase  Alternate client API  Embedded JDBC driver
  • 29. What is Phoenix? Completed  SQL skin for HBase  Alternate client API  Embedded JDBC driver  Runs at HBase native speed
  • 30. What is Phoenix? Completed  SQL skin for HBase  Alternate client API  Embedded JDBC driver  Runs at HBase native speed  Compiles SQL into native HBase calls
  • 31. What is Phoenix? Completed  SQL skin for HBase  Alternate client API  Embedded JDBC driver  Runs at HBase native speed  Compiles SQL into native HBase calls  So you don’t have to!
  • 37. Why Use Phoenix? Completed  Give folks an API they already know
  • 38. Why Use Phoenix? Completed  Give folks an API they already know  Reduce the amount of code needed
  • 39. Why Use Phoenix? Completed  Give folks an API they already know  Reduce the amount of code needed SELECT TRUNC(date,'DAY’), AVG(cpu) FROM web_stat WHERE domain LIKE 'Salesforce%’ GROUP BY TRUNC(date,'DAY’)
  • 40. Why Use Phoenix? Completed  Give folks an API they already know  Reduce the amount of code needed  Perform optimizations transparently
  • 41. Why Use Phoenix? Completed  Give folks an API they already know  Reduce the amount of code needed  Perform optimizations transparently  Aggregation  Skip Scan  Secondary indexing (soon!)
  • 42. Why Use Phoenix? Completed  Give folks an API they already know  Reduce the amount of code needed  Perform optimizations transparently  Leverage existing tooling
  • 43. Why Use Phoenix? Completed  Give folks an API they already know  Reduce the amount of code needed  Perform optimizations transparently  Leverage existing tooling  SQL client/terminal  OLAP engine
  • 44. How Does Phoenix Work? Completed  Overlays on top of HBase Data Model  Keeps Versioned Schema Respository  Query Processor
  • 45. Phoenix Data Model HBase Table Phoenix maps HBase data model to the relational world
  • 46. Phoenix Data Model HBase Table Column Family A Column Family B Phoenix maps HBase data model to the relational world
  • 47. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Phoenix maps HBase data model to the relational world
  • 48. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Phoenix maps HBase data model to the relational world
  • 49. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Phoenix maps HBase data model to the relational world
  • 50. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world
  • 51. HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world
  • 52. HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world
  • 53. HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world Multiple Versions
  • 54. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world Phoenix Table
  • 55. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world Phoenix Table Key Value Columns
  • 56. Phoenix Data Model HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value Row Key 2 Value Value Row Key 3 Value Phoenix maps HBase data model to the relational world Phoenix Table Key Value ColumnsRow Key Columns
  • 57. Phoenix Metadata Completed  Stored in a Phoenix HBase table
  • 58. Phoenix Metadata Completed  Stored in a Phoenix HBase table  SYSTEM.TABLE
  • 59. Phoenix Metadata Completed  Stored in a Phoenix HBase table  Updated through DDL commands
  • 60. Phoenix Metadata Completed  Stored in a Phoenix HBase table  Updated through DDL commands  CREATE TABLE  ALTER TABLE  DROP TABLE  CREATE INDEX  DROP INDEX
  • 61. Phoenix Metadata Completed  Stored in a Phoenix HBase table  Updated through DDL commands  Keeps older versions as schema evolves
  • 62. Phoenix Metadata Completed  Stored in a Phoenix HBase table  Updated through DDL commands  Keeps older versions as schema evolves  Correlates timestamps between schema and data
  • 63. Phoenix Metadata Completed  Stored in a Phoenix HBase table  Updated through DDL commands  Keeps older versions as schema evolves  Correlates timestamps between schema and data  Flashback queries use schema that was in-place then
  • 64. Phoenix Metadata Completed  Stored in a Phoenix HBase table  Updated through DDL commands  Keeps older versions as schema evolves  Correlates timestamps between schema and data  Accessible via JDBC metadata APIs
  • 65. Phoenix Metadata Completed  Stored in a Phoenix HBase table  Updated through DDL commands  Keeps older versions as schema evolves  Correlates timestamps between schema and data  Accessible via JDBC metadata APIs  java.sql.DatabaseMetaData  Through Phoenix queries!
  • 66. Example Row Key SERVER METRICS HOST VARCHAR DATE DATE RESPONSE_TIME INTEGER GC_TIME INTEGER CPU_TIME INTEGER IO_TIME INTEGER … Over metrics data for clusters of servers with a schema like this:
  • 67. Example Over metrics data for clusters of servers with a schema like this: Key Values SERVER METRICS HOST VARCHAR DATE DATE RESPONSE_TIME INTEGER GC_TIME INTEGER CPU_TIME INTEGER IO_TIME INTEGER …
  • 68. With 90 days of data that looks like this: SERVER METRICS HOST DATE RESPONSE_TIME GC_TIME sf1.s1 Jun5 10:10:10.234 1234 sf1.s1 Jun 5 11:18:28.456 8012 … sf3.s1 Jun5 10:10:10.234 2345 sf3.s1 Jun 6 12:46:19.123 2340 sf7.s9 Jun 4 08:23:23.456 5002 1234 … Example
  • 69. Example Walk through query processing for three scenarios
  • 70. Example Walk through query processing for three scenarios 1. Chart Response Time Per Cluster
  • 71. Example Walk through query processing for three scenarios 1. Chart Response Time Per Cluster 2. Identify 5 Longest GC Times
  • 72. Example Walk through query processing for three scenarios 1. Chart Response Time Per Cluster 2. Identify 5 Longest GC Times 3. Identify 5 Longest GC Times again and again
  • 73. Scenario 1 Chart Response Time Per Cluster Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
  • 74. Scenario 1 Chart Response Time Per Cluster Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
  • 75. Scenario 1 Chart Response Time Per Cluster Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
  • 76. Scenario 1 Chart Response Time Per Cluster Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
  • 77. Scenario 1 Chart Response Time Per Cluster Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)
  • 78. Step 1: Client Identify Row Key Ranges from Query Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’) Row Key Ranges HOST DATE
  • 79. Step 1: Client Identify Row Key Ranges from Query Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’) Row Key Ranges HOST DATE
  • 80. Step 1: Client Identify Row Key Ranges from Query Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’) Row Key Ranges HOST DATE
  • 81. Step 1: Client Identify Row Key Ranges from Query Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’) Row Key Ranges HOST DATE sf1
  • 82. Step 1: Client Identify Row Key Ranges from Query Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’) Row Key Ranges HOST DATE sf1 sf3
  • 83. Step 1: Client Identify Row Key Ranges from Query Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’) Row Key Ranges HOST DATE sf1 sf3 sf7
  • 84. Step 1: Client Identify Row Key Ranges from Query Completed SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date >CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’) Row Key Ranges HOST DATE sf1 t1 - * sf3 sf7
  • 85. Step 2: Client Overlay Row Key Ranges with Regions Completed R1 R2 R3 R4 sf1 sf4 sf6 sf1 sf3 sf7
  • 86. Step 3: Client Execute Parallel Scans Completed R1 R2 R3 R4 sf1 sf4 sf6 sf1 sf3 sf7 scan1 scan3 scan2
  • 87. Step 4: Server Filter using Skip Scan Completed sf1.s1 t0SKIP
  • 88. Step 4: Server Filter using Skip Scan Completed sf1.s1 t1INCLUDE
  • 89. Step 4: Server Filter using Skip Scan Completed sf1.s2 t0SKIP
  • 90. Step 4: Server Filter using Skip Scan Completed sf1.s2 t1 INCLUDE
  • 91. Step 4: Server Filter using Skip Scan sf1.s3 t0SKIP
  • 92. Step 4: Server Filter using Skip Scan sf1.s3 t1INCLUDE
  • 93. SERVER METRICS HOST DATE sf1.s1 Jun 2 10:10:10.234 sf1.s2 Jun 3 23:05:44.975 sf1.s2 Jun 9 08:10:32.147 sf1.s3 Jun 1 11:18:28.456 sf1.s3 Jun 3 22:03:22.142 sf1.s4 Jun 1 10:29:58.950 sf1.s4 Jun 2 14:55:34.104 sf1.s4 Jun 3 12:46:19.123 sf1.s5 Jun 8 08:23:23.456 sf1.s6 Jun 1 10:31:10.234 Step 5: Server Intercept Scan in Coprocessor SERVER METRICS HOST DATE AGG sf1 Jun 1 … sf1 Jun 2 … sf1 Jun 3 … sf1 Jun 8 … sf1 Jun 9 …
  • 94. Step 6: Client Perform Final Merge Sort Completed R1 R2 R3 R4 scan1 scan3 scan2 SERVER METRICS HOST DATE AGG sf1 Jun5 … sf1 Jun 9 … sf3 Jun 1 … sf3 Jun 2 … sf7 Jun 1 … sf7 Jun 8 …
  • 95. Scenario 2 Find 5 Longest GC Times Completed SELECT host, date, gc_time FROMserver_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) ORDER BY gc_time DESC LIMIT 5
  • 96. Scenario 2 Find 5 Longest GC Times • Same client parallelization and server skip scan filtering
  • 97. Scenario 2 Find 5 Longest GC Times Completed • Same client parallelization and server skip scan filtering • Server holds 5 longest GC_TIME value for each scan R1 SERVER METRICS HOST DATE GC_TIME sf1.s1 Jun 2 10:10:10.234 22123 sf1.s1 Jun 3 23:05:44.975 19876 sf1.s1 Jun 9 08:10:32.147 11345 sf1.s2 Jun 1 11:18:28.456 10234 sf1.s2 Jun 3 22:03:22.142 10111
  • 98. SERVER METRICS HOST DATE GC_TIME sf1.s1 Jun 2 10:10:10.234 22123 sf1.s1 Jun 3 23:05:44.975 19876 sf1.s1 Jun 9 08:10:32.147 11345 sf1.s2 Jun 1 11:18:28.456 10234 sf1.s2 Jun 3 22:03:22.142 10111 Scenario 2 Find 5 Longest GC Times • Same client parallelization and server skip scan filtering • Server holds 5 longest GC_TIME value for each scan • Client performs final merge sort among parallel scans Scan1 Scan2 Scan3
  • 99. Scenario 3 Find 5 Longest GC Times Completed CREATE INDEX gc_time_index ON server_metrics(gc_time DESC, date DESC) INCLUDE (host, response_time)
  • 100. Scenario 3 Find 5 Longest GC Times Completed CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)
  • 101. Scenario 3 Find 5 Longest GC Times Completed CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)
  • 102. Scenario 3 Find 5 Longest GC Times Completed CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time) Row Key GC_TIME_INDEX GC_TIME INTEGER DATE DATE HOST VARCHAR RESPONSE_TIME INTEGER
  • 103. Scenario 3 Find 5 Longest GC Times Completed CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time) Key Value GC_TIME_INDEX GC_TIME INTEGER DATE DATE HOST VARCHAR RESPONSE_TIME INTEGER
  • 104. Scenario 3 Find 5 Longest GC Times Completed SELECT host, date, gc_time FROMserver_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) ORDER BY gc_time DESC LIMIT 5
  • 105. Demo Completed  Phoenix Stock Analyzer  Fortune 500 companies  10 years of historical stock prices  Demonstrates Skip Scan in action  Running locally on my single node laptop cluster
  • 106. Phoenix Roadmap Completed  Secondary Indexing  Count distinct and percentile  Derived tables  Hash Joins  Apache Drill integration  Cost-based query optimizer  OLAP extensions  Transactions