Apache
Phoenix
Put the SQL back in NoSQL
1
Osama Hussein, March 2021
Agenda
● History
● Overview
● Architecture
2
● Capabilities
● Code
● Scenarios
1.
History
From open-source repo to top Apache project
Overview (Apache Phoenix)
4
● Began as an internal project at
salesforce.com.
JAN 2014: Originally open-sourced
on GitHub
MAY 2014: Became a top-level
Apache project
2.
Overview
UDF, Transactions and Schema
Overview (Apache Phoenix)
6
● Support for late-bound, schema-on-read
● SQL and JDBC API support
● Access to data stored and produced in other
components such as Apache Spark and
Apache Hive
● HBase was developed as part of Apache Hadoop.
● It runs on top of the Hadoop Distributed File System (HDFS).
● HBase scales linearly and shards automatically.
Overview (Apache Phoenix)
7
● Apache Phoenix is an add-on for Apache HBase that provides a
programmatic ANSI SQL interface.
● It implements best-practice optimizations to enable software
engineers to develop next-generation data-driven applications
based on HBase.
● Create and interact with tables through typical DDL/DML
statements using the standard JDBC API.
Overview (Apache Phoenix)
8
● Written in Java and SQL
● Atomicity, Consistency, Isolation and
Durability (ACID)
● Fully integrated with other Hadoop
products such as Spark, Hive, Pig, Flume,
and MapReduce.
Overview (Apache Phoenix)
9
● Included in:
○ Cloudera Data Platform 7.0 and above.
○ Hortonworks distribution for HDP 2.1
and above.
○ Available as part of Cloudera Labs.
○ Part of the Hadoop ecosystem.
Overview (SQL Support)
10
● Compiles SQL into native HBase scans and
orchestrates their execution.
● Produces a standard JDBC result set.
● All standard SQL query constructs are
supported.
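As a minimal sketch of this JDBC workflow (the connection URL and the server_metrics table are illustrative assumptions, borrowed from the scenario slides later in the deck):

import java.sql.*;

public class PhoenixQueryExample {
    public static void main(String[] args) throws SQLException {
        // Assumed Phoenix JDBC URL; point it at your ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             // Phoenix compiles this SQL into HBase scans and returns a JDBC result set.
             ResultSet rs = stmt.executeQuery("SELECT host, response_time FROM server_metrics LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("host") + ": " + rs.getLong("response_time"));
            }
        }
    }
}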
Overview (SQL Support)
11
● Uses the HBase API directly, along with
coprocessors and custom filters.
● Performance:
○ Milliseconds for small queries.
○ Seconds for tens of millions of rows.
Overview (Bulk Loading)
12
● MapReduce-based:
○ CSV and JSON
○ Via the Phoenix
MapReduce library
● Single-threaded:
○ CSV
○ Via the psql.py command-line utility
○ Loads into HBase on the local machine
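A hedged sketch of invoking the MapReduce CSV loader programmatically; the table name and input path are hypothetical, and the same tool is normally launched with hadoop jar:

import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class BulkLoadExample {
    public static void main(String[] args) throws Exception {
        // Roughly equivalent to:
        // hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        //     --table EXAMPLE --input /data/example.csv
        int exitCode = ToolRunner.run(new CsvBulkLoadTool(),
            new String[] { "--table", "EXAMPLE", "--input", "/data/example.csv" });
        System.exit(exitCode);
    }
}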
Overview (User Defined Functions)
13
● Temporary UDFs exist for the current session only.
● Permanent UDFs are stored in the SYSTEM.FUNCTION table.
● UDFs can be used in SQL statements and in functional indexes.
● Tenant-specific UDF usage is supported.
● Updating a UDF jar requires a cluster bounce.
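A sketch of registering a permanent UDF; the function name, implementing class, and jar location are hypothetical:

import java.sql.*;

public class UdfExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            // Permanent UDFs are recorded in the SYSTEM.FUNCTION table.
            conn.createStatement().execute(
                "CREATE FUNCTION my_reverse(varchar) RETURNS varchar "
                + "AS 'com.mypackage.MyReverseFunction' "
                + "USING JAR 'hdfs://namenode:8020/hbase/lib/myjar.jar'");
            // The UDF can now appear in queries and functional indexes, e.g.:
            // SELECT my_reverse(host) FROM server_metrics;
        }
    }
}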
Overview (Transactions)
14
● Uses Apache Tephra to provide cross-row and cross-table ACID support.
● Create tables with the flag TRANSACTIONAL=true.
● Enable transactions, configure the snapshot directory, and set the
timeout value in hbase-site.xml.
● A transaction starts implicitly with the first statement against a
transactional table.
● A transaction ends with a commit or rollback.
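A minimal sketch of that lifecycle, assuming phoenix.transactions.enabled=true in hbase-site.xml and a running Tephra transaction manager; the table and values are illustrative:

import java.sql.*;

public class TransactionExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS my_table (k BIGINT PRIMARY KEY, v VARCHAR) TRANSACTIONAL=true");
            conn.setAutoCommit(false);
            // The transaction starts implicitly with the first statement against the table...
            conn.createStatement().executeUpdate("UPSERT INTO my_table VALUES (1, 'a')");
            conn.createStatement().executeUpdate("UPSERT INTO my_table VALUES (2, 'b')");
            conn.commit(); // ...and ends with commit() or rollback().
        }
    }
}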
Overview (Transactions)
15
● By default, applications let HBase manage timestamps.
● If the application needs to control the timestamp, the
CurrentSCN property must be specified at
connection time.
● CurrentSCN controls the timestamp for any DDL,
DML, or query on that connection.
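A sketch of pinning a connection to a point in time via CurrentSCN; the URL and the one-day offset are illustrative:

import java.sql.*;
import java.util.Properties;

public class SnapshotQueryExample {
    public static void main(String[] args) throws SQLException {
        Properties props = new Properties();
        // All DDL, DML, and queries on this connection use this HBase timestamp.
        long oneDayAgo = System.currentTimeMillis() - 24L * 60 * 60 * 1000;
        props.setProperty("CurrentSCN", Long.toString(oneDayAgo));
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);
             ResultSet rs = conn.createStatement().executeQuery("SELECT count(*) FROM server_metrics")) {
            // Results reflect the table as of the CurrentSCN timestamp.
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}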
Overview (Schema)
16
● Table metadata is stored in a versioned HBase table
(up to 1,000 versions are kept).
● UPDATE_CACHE_FREQUENCY lets the user
declare how often the server is checked for
metadata updates. Values:
○ ALWAYS
○ NEVER
○ A millisecond value
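For example, a table can be declared so the client checks the server for metadata changes at most once per 15 minutes; a sketch with a hypothetical table:

import java.sql.*;

public class CacheFrequencyExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            // 900000 ms = 15 minutes between server-side metadata checks.
            conn.createStatement().execute(
                "CREATE TABLE metrics (k VARCHAR PRIMARY KEY, v VARCHAR) UPDATE_CACHE_FREQUENCY=900000");
        }
    }
}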
Overview (Schema)
17
● A Phoenix table can be:
○ Built from scratch.
○ Mapped to an existing HBase table, as either a:
■ Read-Write Table
■ Read-Only View
Overview (Schema)
18
● Read-Write Table:
○ Column families are created automatically if they
don't already exist.
○ An empty key value is added to the first column
family of each existing row to minimize the size of
the projection for queries.
Overview (Schema)
19
● Read-Only View:
○ All column families must already exist.
○ The only change made to the HBase table is the
addition of the Phoenix coprocessors used for query
processing.
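A sketch of both mapping styles against a pre-existing HBase table; the table, column family, and column names are hypothetical:

import java.sql.*;

public class MappingExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            // Read-write table: missing column families are created automatically.
            conn.createStatement().execute(
                "CREATE TABLE \"existing_table\" (pk VARCHAR PRIMARY KEY, \"cf\".\"col\" VARCHAR)");
            // Read-only view: every referenced column family must already exist.
            conn.createStatement().execute(
                "CREATE VIEW \"other_table\" (pk VARCHAR PRIMARY KEY, \"cf\".\"col\" VARCHAR)");
        }
    }
}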
3.
Architecture
Architecture, Phoenix Data Model, Query Execution
and Environment
Architecture
21
Architecture
22
Architecture (Phoenix Data Model)
23
Architecture (Server Metrics Example)
24
Architecture (Server Metrics Example)
25
Architecture (Query Execution)
26
● Example query execution flow:
1. Identify row key ranges from the query.
2. Overlay row key ranges with region boundaries.
3. Execute parallel scans.
4. Filter using skip scan.
5. Intercept the scan in a coprocessor.
6. Perform a final merge sort on the client.
Architecture (Environment)
27
● Data Warehouse
● Extract, Transform, Load (ETL)
● BI and Visualization
4.
Code
Commands and Sample Codes
Code (Commands)
29
● DML Commands:
○ UPSERT VALUES
○ UPSERT SELECT
○ DELETE
● DDL Commands:
○ CREATE TABLE
○ CREATE VIEW
○ DROP TABLE
○ DROP VIEW
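A sketch of the DML commands in use; the connection URL, tables, and columns are illustrative:

import java.sql.*;

public class DmlExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            // UPSERT VALUES: insert the row, or update it if the key already exists.
            conn.createStatement().executeUpdate(
                "UPSERT INTO server_metrics (host, gc_time) VALUES ('sf1.example.com', 42)");
            // UPSERT SELECT: copy rows between tables.
            conn.createStatement().executeUpdate(
                "UPSERT INTO metrics_archive SELECT * FROM server_metrics");
            // DELETE: remove matching rows.
            conn.createStatement().executeUpdate(
                "DELETE FROM server_metrics WHERE host = 'sf1.example.com'");
            conn.commit(); // mutations are batched until commit when auto-commit is off
        }
    }
}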
30
Connection:
● Long Running
● Short Running

// Long-running connection (longRunningProps holds its tuning properties)
Connection conn =
    DriverManager.getConnection("jdbc:phoenix:my_server:longRunning",
        longRunningProps);

// Short-running connection
Connection conn =
    DriverManager.getConnection("jdbc:phoenix:my_server:shortRunning",
        shortRunningProps);
31
@Test
public void createTable() throws Exception {
    String tableName = generateUniqueName();
    long numSaltBuckets = 6;
    // SALT_BUCKETS pre-splits the table to spread writes across regions.
    String ddl = "CREATE TABLE " + tableName
        + " (K VARCHAR NOT NULL PRIMARY KEY, V VARCHAR)"
        + " SALT_BUCKETS = " + numSaltBuckets;
    Connection conn = DriverManager.getConnection(getUrl());
    conn.createStatement().execute(ddl);
}
Code Sample:
● Create Table
32
@Test
public void readTable() throws Exception {
    String tableName = generateUniqueName();
    long numSaltBuckets = 6;
    long numRows = 1000;
    long numExpectedTasks = numSaltBuckets;
    insertRowsInTable(tableName, numRows);
    String query = "SELECT * FROM " + tableName;
    Connection conn = DriverManager.getConnection(getUrl());
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(query);
    PhoenixResultSet resultSetBeingTested = rs.unwrap(PhoenixResultSet.class);
    changeInternalStateForTesting(resultSetBeingTested);
    // Drain the result set so the read metrics are fully populated.
    while (resultSetBeingTested.next()) {}
    resultSetBeingTested.close();
    Set<String> expectedTableNames = Sets.newHashSet(tableName);
    assertReadMetricValuesForSelectSql(Lists.newArrayList(numRows),
        Lists.newArrayList(numExpectedTasks),
        resultSetBeingTested, expectedTableNames);
}
Code Sample:
● Read Table
33
@Override
public void getRowCount(ResultSet resultSet) throws SQLException {
    Tuple row = resultSet.unwrap(PhoenixResultSet.class).getCurrentRow();
    Cell kv = row.getValue(0);
    ImmutableBytesWritable tmpPtr = new ImmutableBytesWritable(
        kv.getValueArray(), kv.getValueOffset(), kv.getValueLength());
    // A single Cell is returned with the count(*) - we decode that here.
    rowCount = PLong.INSTANCE.getCodec().decodeLong(tmpPtr, SortOrder.getDefault());
}
Code Sample:
● Row Count
34
private void changeInternalStateForTesting(PhoenixResultSet rs) {
    // Get and set the internal state for testing purposes.
    ReadMetricQueue testMetricsQueue = new TestReadMetricsQueue(LogLevel.OFF, true);
    StatementContext ctx = (StatementContext) Whitebox.getInternalState(rs, "context");
    Whitebox.setInternalState(ctx, "readMetricsQueue", testMetricsQueue);
    Whitebox.setInternalState(rs, "readMetricsQueue", testMetricsQueue);
}
Code Sample:
● Internal State
5.
Capabilities
Features and Capabilities
Capabilities
● Overlays on top of the HBase data model.
● Keeps a versioned schema repository.
● Query processor.
36
Capabilities
● Cost-based query optimizer.
● Enhances existing statistics collection.
● Generates histograms to drive query
optimization decisions and join ordering.
37
Capabilities
● Secondary indexes:
○ Boost the speed of queries without relying
on specific row-key designs.
○ Enable users to use star schemas.
○ Leverage SQL tools and online analytics. 38
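A sketch of the three index flavors named in the editor's notes: global (read-optimized), local (write-optimized), and functional (on an arbitrary expression). Index and table names are illustrative:

import java.sql.*;

public class IndexExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            // Global index: optimized for read-heavy use cases.
            conn.createStatement().execute(
                "CREATE INDEX response_idx ON server_metrics (response_time)");
            // Local index: optimized for write-heavy, space-constrained use cases.
            conn.createStatement().execute(
                "CREATE LOCAL INDEX gc_idx ON server_metrics (gc_time)");
            // Functional index: built on an arbitrary expression.
            conn.createStatement().execute(
                "CREATE INDEX host_prefix_idx ON server_metrics (SUBSTR(host, 1, 3))");
        }
    }
}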
Capabilities
● Row timestamp column.
● Sets minimum and maximum time ranges
for scans.
● Improves performance, especially when
querying the tail end of the data.
39
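A sketch of declaring a ROW_TIMESTAMP column so scans can be pruned by time range; the schema mirrors the deck's server_metrics example but is an assumption:

import java.sql.*;

public class RowTimestampExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            // The ROW_TIMESTAMP column must be a date/time primary key column;
            // Phoenix maps it onto the native HBase cell timestamp.
            conn.createStatement().execute(
                "CREATE TABLE server_metrics ("
                + " created_date DATE NOT NULL,"
                + " host VARCHAR NOT NULL,"
                + " gc_time BIGINT"
                + " CONSTRAINT pk PRIMARY KEY (created_date ROW_TIMESTAMP, host))");
        }
    }
}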
6.
Scenarios
Use Cases
Scenarios (Server Metrics Example)
41
SELECT substr(host, 1, 3), trunc(date, 'DAY'),
avg(response_time) FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
GROUP BY substr(host, 1, 3), trunc(date, 'DAY')
42
Scenarios (Chart Response Time Per Cluster)
SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5 43
Scenarios (Find 5 Longest GC Times)
Thanks!
Any questions?
You can find me at:
GitHub: @sxaxmz
LinkedIn: linkedin.com/in/husseinosama
44
Editor's Notes
  1. Apache Phoenix -> A scale-out RDBMS with evolutionary schema built on Apache HBase
  2. Internal project out of a need to support a higher level, well understood, SQL language.
  3. Apache HBase -> an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java. Used for random, real-time read/write access to Big Data. A column-oriented NoSQL database built on top of Hadoop.
  4. Apache Phoenix -> an open-source, massively parallel relational database engine supporting Online Transactional Processing (OLTP) and operational analytics in Hadoop. Provides a JDBC driver enabling users to create, delete, and alter SQL tables, views, and indexes, and to query data through SQL. Apache Phoenix is a relational layer over HBase: an SQL skin that hides the intricacies of the NoSQL store.
  5. ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. All changes to data are performed as if they are a single operation. 1. Atomicity preserves the "completeness" of the business process (all-or-nothing behavior). 2. Consistency refers to the state of the data both before and after the transaction is executed (a transaction maintains the consistency of the state of the data). 3. Isolation means that transactions can run at the same time as if there were no concurrency (a locking mechanism is required). 4. Durability refers to the impact of an outage or a failure on a running transaction (data survives any failures). To summarize, a transaction will either complete, producing correct results, or terminate, with no effect.
  6. Bulk loading for tables created in phoenix is easier compared to tables created in HBase shell.
  7. (Server Bounce) An administrator/technician removes power to the device in a "non-controlled shutdown" (the "down" part of the bounce). Once the server is completely off, and all activity has ceased, the administrator restarts the server.
  8. Set the phoenix.transactions.enabled property to true, along with running the transaction manager (included in the distribution), to enable full ACID transactions (tables may optionally be declared as transactional). A concurrency model is used to detect row-level conflicts with first-commit-wins semantics; the later commit produces an exception indicating that a conflict was detected. A transaction is started implicitly when a transactional table is referenced in a statement, at which point no updates can be seen from other connections until either a commit or rollback occurs. Non-transactional tables will not see their updates until after a commit has occurred.
  9. Phoenix uses the value of this connection property as the max timestamp of scans. Timestamps may not be controlled for transactional tables. Instead, the transaction manager assigns timestamps which become the HBase cell timestamps after a commit. Timestamps are multiplied by 1,000,000 to ensure enough granularity for uniqueness across the cluster.
  10. Snapshot queries over older data will pick up and use the correct schema based on the time of connection (based on CurrentSCN). Data updates include the addition or removal of a table column or updates of table statistics. 1. The ALWAYS value causes the client to check with the server each time a statement referencing the table is executed (or once per commit for an UPSERT VALUES statement). 2. A millisecond value indicates how long the client will hold on to its cached version of the metadata before checking back with the server for updates.
  11. From scratch -> HBase table and column families will be created automatically. Mapped to existing -> The binary representation of the row key and key values must match that of the Phoenix data types
  12. 1. The primary use case for a VIEW is to transfer existing data into a Phoenix table. A table can also be declared as salted to prevent HBase region hot-spotting. The table catalog argument in the metadata APIs is used to filter based on the tenant ID for multi-tenant tables. 2. Data modification is not allowed on a VIEW, and query performance will likely be lower than with a TABLE. Phoenix supports updatable views on top of tables, leveraging the schemaless capabilities of HBase to add columns to them. All views share the same underlying physical HBase table and may even be indexed independently. A multi-tenant view may add columns which are defined solely for that user.
  13. See note 12; the same considerations apply to the read-only view.
  14. Phoenix chunks up the query using guideposts, which means more threads work on a single region. Phoenix runs queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data returned to the client.
  15. See note 14.
  16. See note 14.
  17. ETL is a type of data integration that refers to the three steps used to blend data from multiple sources. It's often used to build a data warehouse.
  18. Data Manipulation Language (DML). Data Definition Language (DDL). For CREATE TABLE: 1. Any HBase metadata (table, column families) that doesn't already exist will be created. 2. The KEEP_DELETED_CELLS option is enabled to allow flashback queries to work correctly. 3. An empty key value will also be added for each row so that queries behave as expected (without requiring all columns to be projected during scans). For CREATE VIEW: the existing HBase metadata must instead match the metadata specified in the DDL statement (or a table read-only error is raised). For UPSERT VALUES: use it multiple times before committing to batch mutations. For UPSERT SELECT: configure phoenix.mutate.batchSize based on row size, and write scans directly to HBase (writing on the server while running UPSERT SELECT on the same table) by setting auto-commit to true.
  19. Enhance existing statistics collection by enabling further query optimizations based on the size and cardinality of the data. Generate histograms to drive query optimization decisions such as secondary index usage and join ordering based on cardinalities to produce the most efficient query plan.
  20. Secondary index types: global index (optimized for read-heavy use cases), local index (optimized for write-heavy, space-constrained use cases) and functional index (an index on an arbitrary expression). HBase tables are sorted maps. The star schema is the simplest style of data mart schema (it separates business process data into facts); the approach is widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables referencing any number of dimension tables. A fact table contains measurements, metrics, and facts about a business process, while a dimension table is a companion to the fact table containing descriptive attributes to be used for query constraining. Types of dimension table: slowly changing dimension, conformed dimension, junk dimension, degenerate dimension, roleplay dimension.
  21. Maps the HBase native timestamp to a Phoenix column, to take advantage of various optimizations that HBase provides for time ranges. ROW_TIMESTAMP needs to be a primary key column in a date or time format (see the documentation for details). Only one primary key column can be designated as ROW_TIMESTAMP, declared upon table creation (no null or negative values allowed).
  22. Caches content on the server through two main parts (SQL Read, SQL Write), serving end users and collecting content from content providers.
  23. See note 22.