Apache Phoenix: Use Cases and New Features

HBaseCon
@ApachePhoenix
http://phoenix.apache.org
Agenda
• Phoenix Use Cases
– Argus: Time-series data with Phoenix (Tom Valine, Salesforce.com)
– Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster (Masayasu Suzuki, Sony)
– Phoenix & eHarmony, a perfect match (Vijay Vangapandu, eHarmony)
• What’s new in Phoenix
– ACID Transactions with Tephra (Poorna Chandra, Cask)
– Cost-based Query Optimization with Calcite (Maryann Xue, Intel)
• Q & A
– PhoenixCon tomorrow 9am-1pm @ Salesforce.com, 1 Market St, SF
Argus: Time-series data with Phoenix
Tom Valine
Salesforce.com
OpenTSDB Limitations
OpenTSDB is good, but we need more
• Tag cardinality
– Total number of tags per metric is limited to 8
– Performance decreases drastically as the number of tag values increases
• UID exhaustion
– Hard limit of 16M UIDs
• Ad hoc querying not possible
– Join to other data sources
– Joins of time series and events
– Simplification of Argus’ transform grammar
Phoenix-backed Argus TSDB Service
• 3-day hackathon
• Modeled each metric as a Phoenix VIEW
– Leverage ROW_TIMESTAMP optimization
• Tag values inlined in row key
– Uses SKIP_SCAN filter optimization
– Allows for secondary indexes on particular metric + tags
• Metric and tag names managed outside of data as metadata
• Eventually leverage Drillix (Phoenix + Drill)
– Cross cluster queries
– Joins to other data sources
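Inlining tag values in the row key means a query that fixes the metric and a set of tag values can seek straight to each matching key range instead of scanning the whole table, which is what the SKIP_SCAN filter exploits. The Java sketch below simulates that with a sorted `TreeMap` standing in for HBase's ordered row keys. The key layout `metric|tagValue|timestamp`, the zero-padded timestamp, and the class name `SkipScanSketch` are assumptions for illustration; they are not Argus's schema or Phoenix's actual row-key encoding or skip-scan implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration: rows keyed by "metric|tagValue|timestamp" in a sorted
// map, mimicking HBase's lexicographically ordered row keys.
class SkipScanSketch {
    // Zero-pad timestamps so lexicographic order matches numeric order.
    static String rowKey(String metric, String tagValue, long ts) {
        return metric + "|" + tagValue + "|" + String.format("%019d", ts);
    }

    // For each requested tag value (in ascending order), seek directly to the key
    // range [metric|tag|startTs, metric|tag|endTs) and read only that slice.
    static List<Double> skipScan(NavigableMap<String, Double> table, String metric,
                                 List<String> tagValues, long startTs, long endTs) {
        List<Double> out = new ArrayList<>();
        for (String tag : tagValues) {
            String from = rowKey(metric, tag, startTs);
            String to = rowKey(metric, tag, endTs);
            for (Map.Entry<String, Double> e : table.subMap(from, true, to, false).entrySet()) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> table = new TreeMap<>();
        for (String host : new String[] {"host1", "host2", "host3"}) {
            for (long ts = 0; ts < 5; ts++) {
                table.put(rowKey("cpu.load", host, ts), (double) ts);
            }
        }
        // Only host1 and host3, timestamps 1..3: two seeks, host2's rows are never read.
        System.out.println(skipScan(table, "cpu.load", List.of("host1", "host3"), 1, 4));
        // prints [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
    }
}
```

The same layout is what makes a secondary index on a particular metric plus tags possible: the index is just another table whose row key leads with the columns being filtered on.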
Write Performance
Setup: 2 clients writing in parallel; Phoenix uses 10 writer threads per client
Read Performance
• Metrics with one tag (60 distinct values)
– OpenTSDB and Phoenix performance comparable for small aggregations
– Phoenix outperforms OpenTSDB as aggregation size increases
Disk usage
• Phoenix and OpenTSDB use approximately the same amount of space with FAST_DIFF encoding and Snappy compression
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Masayasu “Mas” Suzuki
Shinji Nagasaka
Takanari Tamesue
Sony Corporation
Who we are, and why we chose HBase/Phoenix
• We are DevOps members from Sony’s News Suite team (http://socialife.sony.net/)
• HBase/Phoenix was chosen because of
a. scalability,
b. SQL compatibility, and
c. secondary indexing support
Our use case
Performance test apparatus & results
• Test apparatus (specs)
– Number of records: 1.2 billion records (1 KB each)
– Number of indexes: 8 orthogonal indexes
– Servers:
3 ZooKeepers (ZooKeeper 3.4.5, m3.xlarge x 3)
3 HMaster servers (Hadoop 2.5.0, HBase 0.98.6, Phoenix 4.3.0, m3.xlarge x 3)
200 RegionServers (Hadoop 2.5.0, HBase 0.98.6, Phoenix 4.3.0, r3.xlarge x 199, c4.8xlarge x 1)
– Clients: 100 x c4.xlarge
• Test results
– Number of queries: 51,053 queries/sec
– Response time (average): 46 ms
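As a rough consistency check on these two numbers, Little's law relates throughput and latency to the number of queries in flight. This is an illustrative calculation only, assuming a steady state and that the 46 ms average covers a query's entire time in the system:

```latex
L = \lambda W \approx 51{,}053\ \tfrac{\text{queries}}{\text{s}} \times 0.046\ \text{s} \approx 2{,}348\ \text{queries in flight}
\qquad\Rightarrow\qquad
\frac{2{,}348}{100\ \text{clients}} \approx 23.5\ \text{concurrent queries per client}
```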
Five major tips to maximize performance using HBase/Phoenix
Ordered by effectiveness (most effective at the top)
1. Use a SQL hint clause when using a secondary index
– An extra RPC is issued when the client runs a SQL statement that uses a secondary index
– Using a SQL hint clause can mitigate this
– From Ver. 4.7, changing “UPDATE_CACHE_FREQUENCY” may also work (we have yet to test this)
2. Use memory aggressively
– A memory-rich node should be selected for use in RegionServers so as to minimize disk access
3. Manually split regions if possible, but never over-split them
4. Scale out instead of scaling up
5. Avoid running power-intensive tasks simultaneously
– As an example, running major compaction and index creation simultaneously should be avoided
Details will be presented at PhoenixCon tomorrow (May 25)
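Tip 1 can be illustrated with the statement text itself. Phoenix's documented hint syntax names the data table and the index, `SELECT /*+ INDEX(<table> <index>) */ ...`, and `UPDATE_CACHE_FREQUENCY` is a table property in milliseconds that controls how often clients re-check table metadata with the server. The sketch below only builds SQL strings; the helper names and the `ARTICLES` / `IDX_ARTICLES_CATEGORY` identifiers are hypothetical, the `ALTER TABLE ... SET` form is assumed for changing the property on an existing table, and the performance effect is the speakers' observation rather than something this code demonstrates.

```java
import java.util.List;

// Hypothetical helpers that produce Phoenix SQL text. Table, index and column
// names passed to them are placeholders, not Sony's schema.
class PhoenixHintSketch {
    // INDEX hint: tells Phoenix which index to use for this query.
    static String hintedSelect(String table, String index, List<String> columns, String where) {
        return "SELECT /*+ INDEX(" + table + " " + index + ") */ "
                + String.join(", ", columns)
                + " FROM " + table
                + " WHERE " + where;
    }

    // UPDATE_CACHE_FREQUENCY (Phoenix 4.7+), in milliseconds: a larger value lets
    // clients reuse cached table metadata instead of re-checking it every statement.
    static String cacheFrequencyDdl(String table, long millis) {
        return "ALTER TABLE " + table + " SET UPDATE_CACHE_FREQUENCY=" + millis;
    }

    public static void main(String[] args) {
        // The resulting strings would be run through a Phoenix JDBC connection,
        // e.g. with a PreparedStatement binding the "?" parameter.
        System.out.println(hintedSelect("ARTICLES", "IDX_ARTICLES_CATEGORY",
                List.of("ID", "TITLE"), "CATEGORY = ?"));
        System.out.println(cacheFrequencyDdl("ARTICLES", 900_000L));
    }
}
```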
Vijay Vangapandu
Principal Platform Engineer
eHarmony and Phoenix a perfect match
NEED FOR
● Handling 30+ million events during a batch run
● Serving low-latency queries on 16+ billion records
(75th percentile: 800 ms, 95th percentile: 2 s, 99th percentile: 4 s)
LAMBDA FOR THE SAVE
• Layered architecture provides fault tolerance
• HBase as batch storage, for write throughput with reasonable read latency
• Apache Phoenix as the query layer, so complex queries can be written with confidence
• Redis as speed layer cache
PERFORMANCE
[Chart: Get Matches API response times, marked “Phoenix/HBase goes live”]
[Chart: Save Match API response times, marked “Phoenix/HBase goes live”]
WHY HBASE AND PHOENIX
HBASE
• Highly consistent and fault tolerant
• Need for store-level filtering and sorting
APACHE PHOENIX
• Apache Phoenix helped us build an abstract, high-performance query layer on top of HBase.
• Eased the development process.
• Reduced boilerplate code, which improves maintainability.
• Build complex queries with confidence.
• Secondary indexes.
• JDBC connection.
• Good community support.
JAVA ORM LIBRARY (PHO)
• Apache Phoenix helped us build PHO (Phoenix-HBase ORM)
• PHO lets you annotate your entity beans and provides interfaces for building DSL-like queries.
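// OR together one equality restriction per requested status
Disjunction disjunction = new Disjunction();
for (int statusFilter : statusFilters) {
    disjunction.add(Restrictions.eq("status", statusFilter));
}
// Query FeedItemDto entities for this user whose spotlight ends on or after the given date
QueryBuilder.builderFor(FeedItemDto.class).select()
    .add(Restrictions.eq("userId", userId))
    .add(Restrictions.gte("spotlightEnd", spotlightEndDate))
    .add(disjunction)                 // status = s1 OR status = s2 OR ...
    .setReturnFields(projection)      // only fetch the projected fields
    .addOrder(orderings)
    .setMaxResults(maxResults)
    .build();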
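To make the DSL above concrete, the sketch below builds roughly the kind of parameterized Phoenix SQL such a query expresses: AND-ed restrictions, the disjunction as a parenthesized OR, then ORDER BY and LIMIT. It is not PHO's code or its actual output; the table and column names (`FEED_ITEMS`, `USER_ID`, `SPOTLIGHT_END`, `STATUS`) and the class `FeedQuerySqlSketch` are hypothetical, since PHO derives the real mapping from the entity-bean annotations.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical: approximately the SQL shape of the PHO query above, with "?" bind
// parameters for userId, spotlightEndDate, each status filter, and maxResults.
class FeedQuerySqlSketch {
    static String build(List<String> projection, int statusCount, String orderBy) {
        if (statusCount < 1) {
            throw new IllegalArgumentException("need at least one status filter");
        }
        // One "STATUS = ?" term per status filter, OR-ed together (the Disjunction).
        String statusTerms = String.join(" OR ", Collections.nCopies(statusCount, "STATUS = ?"));
        return "SELECT " + String.join(", ", projection)   // setReturnFields(projection)
                + " FROM FEED_ITEMS"
                + " WHERE USER_ID = ?"                      // Restrictions.eq("userId", ...)
                + " AND SPOTLIGHT_END >= ?"                 // Restrictions.gte("spotlightEnd", ...)
                + " AND (" + statusTerms + ")"              // add(disjunction)
                + " ORDER BY " + orderBy                    // addOrder(orderings)
                + " LIMIT ?";                               // setMaxResults(maxResults)
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("ID", "STATUS"), 2, "SPOTLIGHT_END DESC"));
    }
}
```

The value of the ORM is that this string assembly, the parameter binding, and the mapping of result rows back to `FeedItemDto` are generated rather than hand-written.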
http://eharmony.github.io/
OPEN SOURCE REPOSITORY
https://github.com/eHarmony/pho
http://www.eharmony.com/about/careers/
*Please join us for more details at PhoenixCon tomorrow (May 25)
ACID Transactions
Poorna Chandra
Cask
Why Transactions?
• All-or-nothing semantics simplify the developer’s life
– Ensures every client has a consistent view of data
– Protects against concurrent updates
– No need to reason about what state data is left in if a write fails
– Guaranteed consistency between data and index
Apache Tephra
• Transactions on HBase
– Across regions, tables and RPC calls
• ACID semantics
• Tephra powers
– CDAP (Cask Data Application Platform)
– Apache Phoenix (4.7 onwards)
Apache Tephra Architecture
[Diagram: Clients 1…N; ZooKeeper; Tx Manager (active) and Tx Manager (standby); HBase cluster with Master 1, Master 2 and RegionServers RS 1 to RS 4]
Tephra Components
• TransactionAware client
– Coordinates the transaction lifecycle with the manager
– Communicates directly with HBase for reads and writes
• Transaction Manager
– Assigns transaction IDs
– Maintains state on in-progress, committed and invalid transactions
• Transaction Processor coprocessor
– Applies server-side filtering for reads
– Cleans up data from failed transactions and versions that are no longer visible
Snapshot Isolation
• Multi-version concurrency control
– Cell version (timestamp) = transaction ID
– Reads exclude other uncommitted transactions (for isolation)
• Optimistic concurrency control
– Avoids the cost of locking rows and tables
– Good if conflicts are rare: short transactions, disjoint partitioning of work
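The two ideas on this slide fit in a small model: every cell version is the writer's transaction ID, a reader hides versions from transactions that were in progress (or invalid) when it began, and conflicts are checked only at commit, first committer wins. The single-process Java sketch below is a toy under those assumptions; the class and method names are hypothetical and are not Tephra's API. In Tephra this state lives in the Transaction Manager and visibility filtering runs in the Transaction Processor coprocessor, and persistence, the standby manager and invalid-list pruning are all omitted here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of snapshot isolation with optimistic concurrency control (hypothetical names).
class SnapshotIsolationSketch {
    static final class Tx {
        final long id;                                  // also the version written to every cell
        final Set<Long> excluded;                       // in-progress or invalid when this tx began
        final Set<String> writeSet = new HashSet<>();   // row keys this tx changed

        Tx(long id, Set<Long> excluded) {
            this.id = id;
            this.excluded = excluded;
        }

        // Visible if this tx wrote it, or an older tx wrote it and had committed before we began.
        boolean isVisible(long cellVersion) {
            return cellVersion == id || (cellVersion < id && !excluded.contains(cellVersion));
        }
    }

    private long clock = 0;                                           // issues tx IDs and commit points
    private final Set<Long> inProgress = new HashSet<>();
    private final Set<Long> invalid = new HashSet<>();                // aborted txs stay hidden
    private final Map<Long, Set<String>> committed = new HashMap<>(); // commit point -> write set

    Tx begin() {
        long id = ++clock;
        Set<Long> excluded = new HashSet<>(inProgress);
        excluded.addAll(invalid);
        inProgress.add(id);
        return new Tx(id, excluded);
    }

    // No locks while the tx runs. At commit, fail on a write-write conflict with any tx
    // that committed after this one began (commit point > tx.id).
    boolean commit(Tx tx) {
        for (Map.Entry<Long, Set<String>> c : committed.entrySet()) {
            if (c.getKey() > tx.id && !Collections.disjoint(c.getValue(), tx.writeSet)) {
                abort(tx);
                return false;
            }
        }
        committed.put(++clock, new HashSet<>(tx.writeSet));
        inProgress.remove(tx.id);
        return true;
    }

    void abort(Tx tx) {
        inProgress.remove(tx.id);
        invalid.add(tx.id);   // its cell versions must never become visible
    }

    public static void main(String[] args) {
        SnapshotIsolationSketch manager = new SnapshotIsolationSketch();
        Tx t1 = manager.begin();
        Tx t2 = manager.begin();                  // t1 is in progress, so t2 excludes it
        t1.writeSet.add("row-a");
        t2.writeSet.add("row-a");
        System.out.println(manager.commit(t1));   // true
        System.out.println(manager.commit(t2));   // false: t1 committed row-a after t2 began
        Tx t3 = manager.begin();
        System.out.println(t3.isVisible(t1.id) + " " + t3.isVisible(t2.id));  // true false
    }
}
```

Because nothing is locked until commit, the cost of a conflict is a retried transaction, which is why the slide recommends this model when conflicts are rare.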
Performance
[Chart: single client using 10 threads in parallel with 5K batch size; no performance penalty for non-transactional tables]
Concurrent Write Performance
[Chart: 2 write threads per client, 1000-row batch size, 15-column table]
Future Work
• Partitioned Transaction Manager
• Automatic pruning of invalid transaction list
• Read-only transactions
• Performance optimizations
– Conflict detection
– Appends to transaction edit log
Cost-based Query Optimization
Maryann Xue
Intel
Integration model
JDBC Client
→ Calcite Parser & Validator (SQL + Phoenix-specific grammar)
→ Calcite Query Optimizer (built-in rules + Phoenix-specific rules)
→ Phoenix Query Plan Generator
→ Phoenix Runtime
→ Phoenix Tables over HBase
Cost-based query optimizer with Apache Calcite
• Base all query optimization decisions on cost
– Filter push down; range scan vs. skip scan
– Hash aggregate vs. stream aggregate vs. partial stream aggregate
– Sort optimized out; sort/limit push through; fwd/rev/unordered scan
– Hash join vs. merge join; join ordering
– Use of data table vs. index table
– All of the above (and many others) COMBINED
• Query optimizations are modeled as pluggable rules
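"COMBINED" is the key point: the best join algorithm can depend on an ordering requirement elsewhere in the query. The toy cost model below compares a hash join (plus a sort of its output when ORDER BY is on the join key) against a merge join (which needs sorted inputs but emits key-ordered output). The constants, the N:1-join assumption, and the formulas are invented for illustration and are not Calcite's or Phoenix's cost model; it only shows how comparing estimated costs can flip the decision, using sizes shaped like the emps/depts example in this deck.

```java
// Toy cost model, invented for illustration (not Phoenix's or Calcite's formulas).
class JoinCostSketch {
    static final double HASH_BUILD = 2.0;  // cost per build-side row (hypothetical)
    static final double HASH_PROBE = 1.0;  // cost per probe-side row (hypothetical)
    static final double MERGE_STEP = 1.0;  // cost per row merged (hypothetical)

    static double sortCost(long rows) {
        return rows <= 1 ? 0 : rows * (Math.log(rows) / Math.log(2));
    }

    // Assumes an N:1 join (each left row matches one right row), so output rows = leftRows.
    static double hashJoinCost(long leftRows, long rightRows, boolean orderByJoinKey) {
        double cost = rightRows * HASH_BUILD + leftRows * HASH_PROBE;
        // Hash-join output follows probe order, so a required key order means an extra sort.
        return orderByJoinKey ? cost + sortCost(leftRows) : cost;
    }

    // Merge join needs both inputs sorted on the key, but its output is already in key order.
    static double mergeJoinCost(long leftRows, boolean leftSorted, long rightRows, boolean rightSorted) {
        double cost = (leftRows + rightRows) * MERGE_STEP;
        if (!leftSorted) cost += sortCost(leftRows);
        if (!rightSorted) cost += sortCost(rightRows);
        return cost;
    }

    static String chooseJoin(long leftRows, boolean leftSorted, long rightRows, boolean rightSorted,
                             boolean orderByJoinKey) {
        double hash = hashJoinCost(leftRows, rightRows, orderByJoinKey);
        double merge = mergeJoinCost(leftRows, leftSorted, rightRows, rightSorted);
        return merge < hash ? "merge-join" : "hash-join";
    }

    public static void main(String[] args) {
        // 2M-row table not ordered on the key, joined with a 2K-row table ordered on the key.
        System.out.println(chooseJoin(2_000_000, false, 2_000, true, true));   // merge-join
        System.out.println(chooseJoin(2_000_000, false, 2_000, true, false));  // hash-join
    }
}
```

With ORDER BY on the join key, both plans pay for one sort of the large side, so the merge join wins by skipping the hash build; without the ordering requirement, the hash join avoids the sort entirely and wins.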
Beyond Phoenix 4.8
with Apache Calcite
• Get the missing SQL support
– WITH, UNNEST, Scalar subquery, etc.
• Materialized views
– To allow other forms of indices (maybe defined as external), e.g., a filter view, a join view, or an aggregate view.
• Interop with other Calcite adapters
– Already used by Drill, Hive, Kylin, Samza, etc.
– Supports any JDBC source
– Initial version of Drill-Phoenix integration already working
Query Example - no cost-based optimizer
select empid, e.name, d.deptno, d.name, location
from emps e join depts d on e.deptno = d.deptno
order by e.deptno
Plan produced by the Phoenix compiler:
1. scan ‘depts’
2. send ‘depts’ over to the RegionServers & build the hash-cache
3. scan ‘emps’ and hash-join it with ‘depts’
4. sort the joined table on ‘e.deptno’
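The compiled plan above is a hash join followed by a sort. The in-memory Java sketch below mirrors its steps: build a hash table from ‘depts’, do one look-up per ‘emps’ row, then sort the joined rows because they come out in ‘emps’ scan order. Plain lists stand in for Phoenix scans and the RegionServer hash-cache, and the `HashJoinSketch` class and its row types are hypothetical; note that the entire ‘depts’ side must fit in memory, which is the risk the comparison table points out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of: hash-join 'emps' with 'depts', then sort on deptno.
class HashJoinSketch {
    static final class Emp {
        final int empid; final String name; final int deptno;
        Emp(int empid, String name, int deptno) { this.empid = empid; this.name = name; this.deptno = deptno; }
    }

    static final class Dept {
        final int deptno; final String name; final String location;
        Dept(int deptno, String name, String location) { this.deptno = deptno; this.name = name; this.location = location; }
    }

    static List<String> hashJoinThenSort(List<Emp> emps, List<Dept> depts) {
        // Build: load all of 'depts' into a hash table keyed by deptno (the "hash-cache").
        Map<Integer, Dept> hashCache = new HashMap<>();
        for (Dept d : depts) hashCache.put(d.deptno, d);

        // Probe: one look-up per 'emps' row; rows without a match are dropped (inner join).
        List<Map.Entry<Emp, Dept>> joined = new ArrayList<>();
        for (Emp e : emps) {
            Dept d = hashCache.get(e.deptno);
            if (d != null) joined.add(Map.entry(e, d));
        }

        // The joined rows follow 'emps' scan order, so a separate (stable) sort is required.
        joined.sort(Comparator.comparingInt(j -> j.getValue().deptno));

        // Project: empid, e.name, d.deptno, d.name, location
        List<String> out = new ArrayList<>();
        for (Map.Entry<Emp, Dept> j : joined) {
            Emp e = j.getKey();
            Dept d = j.getValue();
            out.add(e.empid + "," + e.name + "," + d.deptno + "," + d.name + "," + d.location);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Emp> emps = List.of(new Emp(1, "Ann", 20), new Emp(2, "Bob", 10), new Emp(3, "Cid", 20));
        List<Dept> depts = List.of(new Dept(10, "Sales", "SF"), new Dept(20, "Eng", "NY"));
        hashJoinThenSort(emps, depts).forEach(System.out::println);
        // 2,Bob,10,Sales,SF
        // 1,Ann,20,Eng,NY
        // 3,Cid,20,Eng,NY
    }
}
```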
Query Example - with cost-based optimizer
(sort optimization combined with join algorithm decision)
select empid, e.name, d.deptno, d.name, location
from emps e join depts d on e.deptno = d.deptno
order by e.deptno
Logical plan:
  LogicalSort (key: deptno)
    LogicalProject (empid, e.name, d.deptno, d.name, location)
      LogicalJoin (inner, e.deptno = d.deptno)
        LogicalTableScan (emps)
        LogicalTableScan (depts)
The optimizer (optimization rules + Phoenix operator conversion rules) turns it into the physical plan:
  PhoenixClientProject (empid, e.name, d.deptno, d.name, location)
    PhoenixMergeJoin (inner, e.deptno = d.deptno)
      PhoenixServerSort (key: deptno)
        PhoenixServerProject (empid, name, deptno)
          PhoenixTableScan (emps)
      PhoenixServerProject (deptno, name, location)
        PhoenixTableScan (depts)
Query Example - with cost-based optimizer
(sort optimization combined with join algorithm decision)
The Phoenix Implementor turns the physical plan (PhoenixClientProject over a PhoenixMergeJoin of the sorted ‘emps’ branch and the ‘depts’ branch) into runtime steps:
1. scan ‘emps’ and sort by ‘deptno’; in parallel, scan ‘depts’
2. merge-join ‘emps’ and ‘depts’
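The runtime steps above amount to a sort-merge join whose output is already ordered by deptno, so no sort of the joined rows is needed. The Java sketch below models it in memory, assuming ‘depts’ arrives already ordered by a unique deptno (as a scan over its primary key would) and that only ‘emps’ needs sorting; the `MergeJoinSketch` class and row types are hypothetical, not Phoenix's PhoenixMergeJoin implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of: sort 'emps' on deptno, then merge-join with 'depts'.
// Assumes 'depts' is already sorted by deptno and deptno is unique in 'depts'.
class MergeJoinSketch {
    static final class Emp {
        final int empid; final String name; final int deptno;
        Emp(int empid, String name, int deptno) { this.empid = empid; this.name = name; this.deptno = deptno; }
    }

    static final class Dept {
        final int deptno; final String name; final String location;
        Dept(int deptno, String name, String location) { this.deptno = deptno; this.name = name; this.location = location; }
    }

    static List<String> sortThenMergeJoin(List<Emp> emps, List<Dept> deptsSortedByDeptno) {
        List<Emp> sortedEmps = new ArrayList<>(emps);
        sortedEmps.sort(Comparator.comparingInt(e -> e.deptno));  // "sort 'emps' only"

        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < sortedEmps.size() && j < deptsSortedByDeptno.size()) {
            Emp e = sortedEmps.get(i);
            Dept d = deptsSortedByDeptno.get(j);
            if (e.deptno < d.deptno) {
                i++;   // no matching dept for this emp: drop it (inner join)
            } else if (e.deptno > d.deptno) {
                j++;   // this dept has no (more) emps: advance the 'depts' side
            } else {
                out.add(e.empid + "," + e.name + "," + d.deptno + "," + d.name + "," + d.location);
                i++;   // keep j: the next emp may share this deptno
            }
        }
        return out;    // already in deptno order, so ORDER BY needs no extra sort
    }

    public static void main(String[] args) {
        List<Emp> emps = List.of(new Emp(1, "Ann", 20), new Emp(2, "Bob", 10), new Emp(3, "Cid", 20));
        List<Dept> depts = List.of(new Dept(10, "Sales", "SF"), new Dept(20, "Eng", "NY"));
        sortThenMergeJoin(emps, depts).forEach(System.out::println);
        // 2,Bob,10,Sales,SF
        // 1,Ann,20,Eng,NY
        // 3,Cid,20,Eng,NY
    }
}
```

Compared with the hash-join plan, nothing has to be shipped to the RegionServers or held in a hash-cache, and the sort touches only the ‘emps’ rows.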
Query Example - Comparison
Aspect | Query plan w/o cost-based optimizer | Query plan w/ cost-based optimizer
Scan ‘emps’, ‘depts’ | first ‘depts’, then ‘emps’ | 2 tables in parallel
Hash-cache send & build | proportional to size of ‘depts’; might cause an exception if too large | none
Hash-cache look-up | 1 look-up per ‘emps’ row | none
Sorting | sort the joined rows of ‘emps’ and ‘depts’ | sort ‘emps’ only
Optimization approach | local, serial optimization processes | cost-based, rule-driven, integrated
Performance (single node, 2M * 2K rows) | 19.46 s | 13.92 s
Drillix: Interoperability with Drill
select deptno, sum(salary) from emps group by deptno
Stage 1: Local partial aggregation
– Phoenix Table Scan (emps), over Phoenix tables over HBase
– Phoenix Partial Aggregation (deptno, sum(salary))
Stage 2: Shuffle partial results
– Drill Shuffle
Stage 3: Final aggregation
– Drill Final Aggregation (deptno, sum(salary))
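The three stages above are the classic partial/final aggregation split: each scan pre-aggregates locally so only one partial sum per deptno leaves each region, the shuffle routes all partials for a given deptno to the same worker, and the final stage adds them up. The Java sketch below is a toy model of that data flow; each inner list stands in for the rows one region scan returns, and the class `TwoStageAggSketch`, its methods, and the hash partitioning by `deptno mod workers` are assumptions, not Drillix internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of: select deptno, sum(salary) from emps group by deptno
// A row is {deptno, salary}; names are hypothetical.
class TwoStageAggSketch {
    // Stage 1: partial aggregation next to the data (one map per region).
    static Map<Integer, Long> partialAgg(List<int[]> regionRows) {
        Map<Integer, Long> partial = new HashMap<>();
        for (int[] row : regionRows) {
            partial.merge(row[0], (long) row[1], Long::sum);
        }
        return partial;
    }

    // Stage 2: hash-partition partial results by deptno so all partials for a key meet.
    static List<List<Map.Entry<Integer, Long>>> shuffle(List<Map<Integer, Long>> partials, int workers) {
        List<List<Map.Entry<Integer, Long>>> inboxes = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            inboxes.add(new ArrayList<>());
        }
        for (Map<Integer, Long> partial : partials) {
            for (Map.Entry<Integer, Long> e : partial.entrySet()) {
                inboxes.get(Math.floorMod(e.getKey(), workers)).add(e);
            }
        }
        return inboxes;
    }

    // Stage 3: each worker sums what it received; keys are disjoint across workers,
    // so the per-worker results can simply be unioned.
    static Map<Integer, Long> finalAgg(List<List<Map.Entry<Integer, Long>>> inboxes) {
        Map<Integer, Long> result = new TreeMap<>();
        for (List<Map.Entry<Integer, Long>> inbox : inboxes) {
            Map<Integer, Long> workerResult = new HashMap<>();
            for (Map.Entry<Integer, Long> e : inbox) {
                workerResult.merge(e.getKey(), e.getValue(), Long::sum);
            }
            result.putAll(workerResult);
        }
        return result;
    }

    static Map<Integer, Long> aggregate(List<List<int[]>> regions, int workers) {
        List<Map<Integer, Long>> partials = new ArrayList<>();
        for (List<int[]> region : regions) {
            partials.add(partialAgg(region));
        }
        return finalAgg(shuffle(partials, workers));
    }

    public static void main(String[] args) {
        List<List<int[]>> regions = List.of(
                List.of(new int[] {10, 100}, new int[] {20, 200}, new int[] {10, 50}),
                List.of(new int[] {20, 300}, new int[] {30, 400}));
        System.out.println(aggregate(regions, 2));  // {10=150, 20=500, 30=400}
    }
}
```

The benefit is in what crosses the network: after Stage 1, the shuffle moves at most one row per deptno per region instead of every ‘emps’ row.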
Thank you! Questions?
Join us tomorrow for PhoenixCon
Salesforce.com, 1 Market St, SF 9am-1pm
(some companies using Phoenix)
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon1.2K views

Recently uploaded

MS PowerPoint.pptx by
MS PowerPoint.pptxMS PowerPoint.pptx
MS PowerPoint.pptxLitty Sylus
5 views14 slides
Keep by
KeepKeep
KeepGeniusee
77 views10 slides
FIMA 2023 Neo4j & FS - Entity Resolution.pptx by
FIMA 2023 Neo4j & FS - Entity Resolution.pptxFIMA 2023 Neo4j & FS - Entity Resolution.pptx
FIMA 2023 Neo4j & FS - Entity Resolution.pptxNeo4j
8 views26 slides
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action by
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionMárton Kodok
6 views55 slides
Ports-and-Adapters Architecture for Embedded HMI by
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMIBurkhard Stubert
21 views19 slides
SAP FOR CONTRACT MANUFACTURING.pdf by
SAP FOR CONTRACT MANUFACTURING.pdfSAP FOR CONTRACT MANUFACTURING.pdf
SAP FOR CONTRACT MANUFACTURING.pdfVirendra Rai, PMP
13 views2 slides

Recently uploaded(20)

FIMA 2023 Neo4j & FS - Entity Resolution.pptx by Neo4j
FIMA 2023 Neo4j & FS - Entity Resolution.pptxFIMA 2023 Neo4j & FS - Entity Resolution.pptx
FIMA 2023 Neo4j & FS - Entity Resolution.pptx
Neo4j8 views
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action by Márton Kodok
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
Márton Kodok6 views
Ports-and-Adapters Architecture for Embedded HMI by Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Burkhard Stubert21 views
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra... by Marc Müller
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra....NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
Marc Müller40 views
DSD-INT 2023 Exploring flash flood hazard reduction in arid regions using a h... by Deltares
DSD-INT 2023 Exploring flash flood hazard reduction in arid regions using a h...DSD-INT 2023 Exploring flash flood hazard reduction in arid regions using a h...
DSD-INT 2023 Exploring flash flood hazard reduction in arid regions using a h...
Deltares9 views
Myths and Facts About Hospice Care: Busting Common Misconceptions by Care Coordinations
Myths and Facts About Hospice Care: Busting Common MisconceptionsMyths and Facts About Hospice Care: Busting Common Misconceptions
Myths and Facts About Hospice Care: Busting Common Misconceptions
Headless JS UG Presentation.pptx by Jack Spektor
Headless JS UG Presentation.pptxHeadless JS UG Presentation.pptx
Headless JS UG Presentation.pptx
Jack Spektor8 views
DSD-INT 2023 The Danube Hazardous Substances Model - Kovacs by Deltares
DSD-INT 2023 The Danube Hazardous Substances Model - KovacsDSD-INT 2023 The Danube Hazardous Substances Model - Kovacs
DSD-INT 2023 The Danube Hazardous Substances Model - Kovacs
Deltares10 views
Sprint 226 by ManageIQ
Sprint 226Sprint 226
Sprint 226
ManageIQ5 views
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... by NimaTorabi2
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi212 views
Fleet Management Software in India by Fleetable
Fleet Management Software in India Fleet Management Software in India
Fleet Management Software in India
Fleetable11 views
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx by animuscrm
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx
animuscrm15 views
Airline Booking Software by SharmiMehta
Airline Booking SoftwareAirline Booking Software
Airline Booking Software
SharmiMehta6 views
Software evolution understanding: Automatic extraction of software identifier... by Ra'Fat Al-Msie'deen
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...

Apache Phoenix: Use Cases and New Features

  • 1. Use Cases and New Features @ApachePhoenix http://phoenix.apache.org V5
  • 2. Agenda
    • Phoenix Use Cases
      – Argus: Time-series data with Phoenix (Tom Valine, Salesforce.com)
      – Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster (Masayasu Suzuki, Sony)
      – Phoenix & eHarmony, a perfect match (Vijay Vangapandu, eHarmony)
    • What’s new in Phoenix
      – ACID Transactions with Tephra (Poorna Chandra, Cask)
      – Cost-based Query Optimization with Calcite (Maryann Xue, Intel)
    • Q & A
      – PhoenixCon tomorrow 9am-1pm @ Salesforce.com, 1 Market St, SF
  • 3. Argus: Time-series data with Phoenix Tom Valine Salesforce.com
  • 4. OpenTSDB Limitations
    OpenTSDB is good, but we need more:
    • Tag cardinality
      – The total number of tags per metric is limited to 8
      – Performance decreases drastically as the number of tag values increases
    • UID exhaustion
      – Hard limit of 16M UIDs
    • Ad hoc querying is not possible
      – Joins to other data sources
      – Joins of time series and events
      – Simplification of Argus’ transform grammar
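The 16M ceiling is simple arithmetic on the UID width. Assuming OpenTSDB's default of 3-byte UIDs (for metric names, tag keys and tag values alike), each UID space holds 2^24 = 16,777,216 identifiers. The sketch below just makes that calculation explicit; the class name is illustrative.

```java
// Worked arithmetic: how many distinct IDs fit in a fixed-width UID.
// Assumption: 3-byte UIDs (OpenTSDB's default width for metrics, tag keys and tag values).
final class UidSpace {
    /** Number of distinct unsigned values representable in the given number of bytes. */
    static long capacity(int widthBytes) {
        if (widthBytes < 1 || widthBytes > 7) {
            throw new IllegalArgumentException("width must be 1..7 bytes");
        }
        return 1L << (8 * widthBytes);
    }

    public static void main(String[] args) {
        System.out.println(capacity(3)); // 16777216, i.e. roughly 16M
    }
}
```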
  • 5. Phoenix-backed Argus TSDB Service
    • Built in a 3-day hackathon
    • Modeled each metric as a Phoenix VIEW
      – Leverages the ROW_TIMESTAMP optimization
    • Tag values inlined in the row key
      – Uses the SKIP_SCAN filter optimization
      – Allows secondary indexes on a particular metric + tags
    • Metric and tag names managed outside of the data, as metadata
    • Eventually leverage Drillix (Phoenix + Drill)
      – Cross-cluster queries
      – Joins to other data sources
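To see why inlining tag values in the row key helps, picture rows sorted by (metric, tag value, timestamp). A skip scan seeks straight to each requested tag value's key range instead of reading every row of the metric. The sketch below models HBase's sorted key space with a `TreeMap` as an analogy; the key layout, the `\u0000` separator, and all names are illustrative assumptions, not Argus' actual schema or Phoenix's SkipScanFilter.

```java
import java.util.*;

// Illustrative model: a sorted key space (like an HBase table) keyed by
// metric + tag value + timestamp. Layout and names are hypothetical.
final class SkipScanSketch {
    static final char SEP = '\u0000'; // separator that sorts before any printable char

    static String rowKey(String metric, String tagValue, long ts) {
        // Zero-padded timestamp so lexicographic order matches numeric order.
        return metric + SEP + tagValue + SEP + String.format("%019d", ts);
    }

    /** Full scan: number of rows visited when reading every row of the metric. */
    static int fullScanVisits(NavigableMap<String, Double> table, String metric) {
        return table.subMap(metric + SEP, true, metric + (char) (SEP + 1), false).size();
    }

    /** Skip scan: seek to each requested tag value's time range and read only that slice. */
    static List<Double> skipScan(NavigableMap<String, Double> table, String metric,
                                 Collection<String> tagValues, long fromTs, long toTs) {
        List<Double> out = new ArrayList<>();
        for (String tag : new TreeSet<>(tagValues)) { // visit ranges in key order
            String start = rowKey(metric, tag, fromTs);
            String stop = rowKey(metric, tag, toTs);  // exclusive upper bound
            out.addAll(table.subMap(start, true, stop, false).values());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> table = new TreeMap<>();
        for (int host = 0; host < 60; host++) {       // 60 distinct tag values
            for (long ts = 0; ts < 10; ts++) {
                table.put(rowKey("cpu.load", "host" + host, ts), host + ts / 10.0);
            }
        }
        List<Double> hits = skipScan(table, "cpu.load", List.of("host3", "host42"), 0, 10);
        System.out.println(hits.size() + " rows read vs " + fullScanVisits(table, "cpu.load"));
    }
}
```

Because the separator sorts below every printable character, the range for `host3` cannot spill into `host30`.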
  • 6. Write Performance
    Two clients writing in parallel; Phoenix uses 10 writer threads per client.
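Several writer threads per client, each sending rows in batches, is a common way to push write throughput. A minimal sketch with `java.util.concurrent`, assuming a hypothetical `BatchSink` that stands in for a real Phoenix connection (which would execute the UPSERTs and commit once per batch). The 10-thread figure comes from the slide; the batch size in `main` is an arbitrary example.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

final class ParallelWriter {
    /** Hypothetical sink; a real one would UPSERT the rows and commit once per batch. */
    interface BatchSink {
        void write(List<String> batch) throws Exception;
    }

    /** Splits rows into batches and writes them with a fixed pool of writer threads. */
    static long writeAll(List<String> rows, int threads, int batchSize, BatchSink sink)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < rows.size(); i += batchSize) {
                List<String> batch = rows.subList(i, Math.min(i + batchSize, rows.size()));
                pending.add(pool.submit(() -> { sink.write(batch); return batch.size(); }));
            }
            long written = 0;
            for (Future<Integer> f : pending) {
                written += f.get(); // waits for the batch and rethrows any write failure
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) rows.add("row-" + i);
        AtomicLong batches = new AtomicLong();
        long n = writeAll(rows, 10, 1_000, batch -> batches.incrementAndGet());
        System.out.println(n + " rows in " + batches.get() + " batches");
    }
}
```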
  • 7. Read Performance
    • Metrics with one tag (60 distinct values)
      – OpenTSDB and Phoenix performance is comparable for small aggregations
      – Phoenix outperforms OpenTSDB as aggregation size increases
  • 8. Disk Usage
    • Phoenix and OpenTSDB use approximately the same amount of space with FAST_DIFF encoding and Snappy compression
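FAST_DIFF is one of HBase's data block encodings: instead of storing every key in full, it stores each key relative to the previous key in the block, which suits time-series row keys that share long prefixes. The sketch below shows only that core idea (shared-prefix length plus the differing suffix); HBase's actual FAST_DIFF format also encodes other cell fields, such as timestamps, specially and is not reproduced here. Names and sample keys are illustrative.

```java
import java.util.*;

// Simplified prefix (diff) encoding of sorted keys: each entry stores the length of the
// prefix shared with the previous key plus the differing suffix. Illustrative only.
final class PrefixDiffEncoding {
    record Entry(int sharedPrefix, String suffix) {}

    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) shared++;
            out.add(new Entry(shared, key.substring(shared)));
            prev = key;
        }
        return out;
    }

    static List<String> decode(List<Entry> entries) {
        List<String> keys = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String key = prev.substring(0, e.sharedPrefix()) + e.suffix(); // rebuild from previous key
            keys.add(key);
            prev = key;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
            "cpu.load|host1|1464000000", "cpu.load|host1|1464000060",
            "cpu.load|host1|1464000120", "cpu.load|host2|1464000000");
        List<Entry> enc = encode(keys);
        int raw = keys.stream().mapToInt(String::length).sum();
        int stored = enc.stream().mapToInt(e -> e.suffix().length()).sum();
        System.out.println(raw + " key chars stored as " + stored + " suffix chars");
        System.out.println(decode(enc).equals(keys)); // true
    }
}
```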
  • 9. Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster Masayasu “Mas” Suzuki Shinji Nagasaka Takanari Tamesue Sony Corporation
  • 10. Who we are, and why we chose HBase/Phoenix
    • We are DevOps members of Sony’s News Suite team (http://socialife.sony.net/)
    • HBase/Phoenix was chosen because of its
      a. scalability,
      b. SQL compatibility, and
      c. secondary indexing support
  • 12. Performance test apparatus & results
    • Test apparatus
      – Number of records: 1.2 billion records (1 KB each)
      – Number of indexes: 8 orthogonal indexes
      – Servers:
        3 ZooKeepers (ZooKeeper 3.4.5, m3.xlarge x 3)
        3 HMaster servers (Hadoop 2.5.0, HBase 0.98.6, Phoenix 4.3.0, m3.xlarge x 3)
        200 RegionServers (Hadoop 2.5.0, HBase 0.98.6, Phoenix 4.3.0, r3.xlarge x 199, c4.8xlarge x 1)
      – Clients: 100 x c4.xlarge
    • Test results
      – Number of queries: 51,053 queries/sec
      – Response time (average): 46 ms
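The two result figures also imply how much work was in flight. By Little's law (mean requests in flight = throughput × mean latency), 51,053 queries/sec at 46 ms each corresponds to roughly 2,350 outstanding queries, or about 23 per client machine. This is a derived estimate, not a number reported by the test.

```java
// Little's law: L = lambda * W (mean requests in flight = arrival rate * mean time in system).
final class LittlesLaw {
    static double inFlight(double requestsPerSecond, double meanLatencySeconds) {
        return requestsPerSecond * meanLatencySeconds;
    }

    public static void main(String[] args) {
        double total = inFlight(51_053, 0.046); // throughput and mean latency from the test results
        double perClient = total / 100;         // 100 client instances
        System.out.printf("~%.0f queries in flight, ~%.1f per client%n", total, perClient);
    }
}
```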
  • 13. Five major tips to maximize performance using HBase/Phoenix
    Ordered by effectiveness (most effective first):
    1. Use the SQL hint clause when using a secondary index
       – An extra RPC is issued when the client runs a SQL statement that uses a secondary index
       – Using the SQL hint clause can mitigate this
       – From version 4.7, changing “UPDATE_CACHE_FREQUENCY” may also work (we have yet to test this)
    2. Use memory aggressively
       – Memory-rich nodes should be selected for RegionServers to minimize disk access
    3. Manually split Region files if possible, but never over-split them
    4. Scale out instead of scaling up
    5. Avoid running resource-intensive tasks simultaneously
       – For example, avoid running major compaction and index creation at the same time
    Details will be presented at PhoenixCon tomorrow (May 25).
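Tip 1 in SQL terms: Phoenix's INDEX hint names the table and the secondary index a query should use, and UPDATE_CACHE_FREQUENCY (Phoenix 4.7+) is a table property, in milliseconds, controlling how often the client re-checks table metadata with the server. The sketch below only builds the statement text; the table, index and column names and the JDBC URL are hypothetical, and, as the slide says, the UPDATE_CACHE_FREQUENCY route was not yet tested by the Sony team.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

// Sketch: build Phoenix statements for tip 1. Table, index and column names are hypothetical.
final class PhoenixTuningSql {
    /** SELECT with an INDEX hint naming the table and the secondary index to use. */
    static String hintedSelect(String table, String index, List<String> columns, String where) {
        return "SELECT /*+ INDEX(" + table + " " + index + ") */ "
            + String.join(", ", columns) + " FROM " + table + " WHERE " + where;
    }

    /** Table property (Phoenix 4.7+): how often, in ms, the client re-checks metadata. */
    static String cacheFrequency(String table, long millis) {
        return "ALTER TABLE " + table + " SET UPDATE_CACHE_FREQUENCY=" + millis;
    }

    /** Example usage against a live cluster; jdbcUrl (e.g. a jdbc:phoenix: URL) is an assumption. */
    static void run(String jdbcUrl, String userId) throws SQLException {
        String sql = hintedSelect("ARTICLES", "ARTICLES_BY_USER",
                                  List.of("ID", "TITLE"), "USER_ID = ?");
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) System.out.println(rs.getString("TITLE"));
            }
        }
    }
}
```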
  • 15. eHarmony and Phoenix, a perfect match
    NEED FOR
    ● Handling 30+ million events during a batch run
    ● Serving low-latency queries on 16+ billion records
      – 75th percentile: 800 ms
      – 95th percentile: 2 s
      – 99th percentile: 4 s
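Latency targets like these are usually checked by computing percentiles over recorded response times. A minimal nearest-rank sketch; the sample values in `main` are made up for illustration, and other percentile definitions (for example, interpolated ones) give slightly different results.

```java
import java.util.Arrays;

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
final class LatencyPercentiles {
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    static boolean meetsTargets(long[] samplesMs) {
        // Targets from the slide: p75 <= 800 ms, p95 <= 2 s, p99 <= 4 s.
        return percentile(samplesMs, 75) <= 800
            && percentile(samplesMs, 95) <= 2_000
            && percentile(samplesMs, 99) <= 4_000;
    }

    public static void main(String[] args) {
        long[] samples = new long[100];
        for (int i = 0; i < samples.length; i++) samples[i] = 10L * (i + 1); // 10..1000 ms
        System.out.println(percentile(samples, 75) + " " + percentile(samples, 99));
        System.out.println(meetsTargets(samples));
    }
}
```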
  • 16. eHarmony and Phoenix, a perfect match
    LAMBDA TO THE RESCUE
    • Layered architecture provides fault tolerance
    • HBase as batch storage for write throughput with reasonable read latency
    • Apache Phoenix as the query layer, to handle complex queries with confidence
    • Redis as the speed-layer cache
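In a lambda layout, recent writes are typically served from the speed layer until the next batch run absorbs them, and reads fall back to the batch view otherwise. A minimal sketch with in-memory maps standing in for Redis and for HBase/Phoenix; the class and method names are hypothetical, and the merge policy (speed layer wins) is a common lambda convention assumed here, not a description of eHarmony's code.

```java
import java.util.*;

// Hypothetical lambda-style store: a speed layer (e.g. Redis) holds recent updates,
// the batch layer (e.g. HBase queried through Phoenix) holds the last batch run's output.
final class LambdaStore<K, V> {
    private final Map<K, V> batchView = new HashMap<>();  // stands in for HBase/Phoenix
    private final Map<K, V> speedLayer = new HashMap<>(); // stands in for Redis

    /** Real-time write: visible immediately through the speed layer. */
    void writeRecent(K key, V value) { speedLayer.put(key, value); }

    /** Batch run: replace the batch view and drop speed-layer entries for keys it now covers
     *  (assumes the batch run has absorbed those updates). */
    void completeBatchRun(Map<K, V> recomputed) {
        batchView.clear();
        batchView.putAll(recomputed);
        speedLayer.keySet().removeAll(recomputed.keySet());
    }

    /** Read: the speed layer wins for keys it holds; otherwise fall back to the batch view. */
    Optional<V> read(K key) {
        V recent = speedLayer.get(key);
        return recent != null ? Optional.of(recent) : Optional.ofNullable(batchView.get(key));
    }

    public static void main(String[] args) {
        LambdaStore<String, Integer> store = new LambdaStore<>();
        store.completeBatchRun(Map.of("user1", 10, "user2", 20));
        store.writeRecent("user2", 25); // newer than the batch output
        System.out.println(store.read("user1").orElse(-1) + " " + store.read("user2").orElse(-1));
    }
}
```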
  • 17. eHarmony and Phoenix, a perfect match
    PERFORMANCE
    [Charts: Get Matches API and Save Match API response times, each marked where Phoenix/HBase goes live]
  • 18. eHarmony and Phoenix, a perfect match
    WHY HBASE AND PHOENIX
    HBASE
    • Highly consistent and fault tolerant
    • Need for store-level filtering and sorting
    APACHE PHOENIX
    • Helped us build an abstract, high-performance query layer on top of HBase
    • Eased the development process
    • Reduced boilerplate code, which improves maintainability
    • Build complex queries with confidence
    • Secondary indexes
    • JDBC connection
    • Good community support
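  • 19. eHarmony and Phoenix, a perfect match
    JAVA ORM LIBRARY (PHO)
    • Apache Phoenix helped us build PHO (Phoenix-HBase ORM)
    • PHO lets you annotate your entity beans and provides interfaces for building DSL-like queries:

        // statusFilters, userId, spotlightEndDate, projection, orderings and maxResults
        // are assumed to be defined by the surrounding application code.

        // OR together one equality restriction per accepted status value.
        Disjunction disjunction = new Disjunction();
        for (int statusFilter : statusFilters) {
            disjunction.add(Restrictions.eq("status", statusFilter));
        }

        // Query FeedItemDto rows for this user whose spotlightEnd is at or after the given
        // date and whose status matches one of the filters; project, order and limit results.
        QueryBuilder.builderFor(FeedItemDto.class).select()
            .add(Restrictions.eq("userId", userId))
            .add(Restrictions.gte("spotlightEnd", spotlightEndDate))
            .add(disjunction)
            .setReturnFields(projection)
            .addOrder(orderings)
            .setMaxResults(maxResults)
            .build();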
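A builder like PHO's ultimately has to emit a Phoenix SQL statement. The sketch below shows one plausible translation of the criteria on this slide (equality, a range bound, and an OR-group for the disjunction) into parameterized SQL. It is a hypothetical illustration, not PHO's implementation; the `Criterion` type, table name and column names are assumptions.

```java
import java.util.*;

// Hypothetical rendering of builder-style criteria into parameterized SQL.
final class CriteriaToSql {
    record Criterion(String column, String op) {
        String sql() { return column + " " + op + " ?"; } // value bound later as a parameter
    }

    static String render(String table, List<String> returnFields, List<Criterion> conjuncts,
                         List<Criterion> disjunction, String orderBy, boolean limit) {
        List<String> where = new ArrayList<>();
        for (Criterion c : conjuncts) where.add(c.sql());
        if (!disjunction.isEmpty()) {
            StringJoiner or = new StringJoiner(" OR ", "(", ")"); // disjunction becomes one OR-group
            for (Criterion c : disjunction) or.add(c.sql());
            where.add(or.toString());
        }
        StringBuilder sql = new StringBuilder("SELECT ")
            .append(String.join(", ", returnFields)).append(" FROM ").append(table);
        if (!where.isEmpty()) sql.append(" WHERE ").append(String.join(" AND ", where));
        if (orderBy != null) sql.append(" ORDER BY ").append(orderBy);
        if (limit) sql.append(" LIMIT ?");
        return sql.toString();
    }

    public static void main(String[] args) {
        String sql = render("FEED_ITEMS", List.of("ID", "STATUS"),
            List.of(new Criterion("USER_ID", "="), new Criterion("SPOTLIGHT_END", ">=")),
            List.of(new Criterion("STATUS", "="), new Criterion("STATUS", "=")),
            "SPOTLIGHT_END DESC", true);
        System.out.println(sql);
    }
}
```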
  • 20. eHarmony and Phoenix, a perfect match
    • Open source repository: https://github.com/eHarmony/pho
    • http://eharmony.github.io/
    • Careers: http://www.eharmony.com/about/careers/
    • Please join us at PhoenixCon tomorrow (May 25) for more details
  • 22. Why Transactions?
    • All-or-nothing semantics simplify the developer’s life
      – Ensures every client has a consistent view of the data
      – Protects against concurrent updates
      – No need to reason about what state data is left in if a write fails
      – Guaranteed consistency between data and index
  • 23. Apache Tephra
    • Transactions on HBase
      – Across regions, tables and RPC calls
    • ACID semantics
    • Tephra powers
      – CDAP (Cask Data Application Platform)
      – Apache Phoenix (4.7 onwards)
  • 24. Apache Tephra Architecture
    [Diagram: ZooKeeper; active and standby Tx Managers; HBase Masters 1 and 2; RegionServers RS 1 to RS 4; Clients 1 to N]
  • 25. Tephra Components
    • TransactionAware client
      – Coordinates the transaction lifecycle with the manager
      – Communicates directly with HBase for reads and writes
    • Transaction Manager
      – Assigns transaction IDs
      – Maintains state on in-progress, committed and invalid transactions
    • Transaction Processor coprocessor
      – Applies server-side filtering for reads
      – Cleans up data from failed transactions and versions that are no longer visible
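The Transaction Manager's bookkeeping can be pictured as follows: it hands out monotonically increasing transaction IDs, records which transactions are in progress, and moves failed ones to an invalid list so that readers keep excluding their writes until cleanup. This is a simplified single-process sketch with hypothetical names; it is not Tephra's API, and real Tephra persists this state and can fail over to a standby manager.

```java
import java.util.*;

// Simplified, in-memory model of a transaction manager's state (hypothetical API).
final class TxManagerSketch {
    /** What a new transaction needs in order to filter its reads. */
    record Snapshot(long txId, long readPointer, Set<Long> excluded) {}

    private long nextId = 1;
    private final SortedSet<Long> inProgress = new TreeSet<>();
    private final SortedSet<Long> invalid = new TreeSet<>();

    synchronized Snapshot start() {
        long id = nextId++;
        Set<Long> excluded = new TreeSet<>(inProgress); // concurrent, not yet committed
        excluded.addAll(invalid);                       // failed: never visible
        inProgress.add(id);
        // Everything with an ID below ours is potentially visible, minus the exclusions.
        return new Snapshot(id, id - 1, Collections.unmodifiableSet(excluded));
    }

    synchronized void commit(long txId) {
        if (!inProgress.remove(txId)) throw new IllegalStateException("not in progress: " + txId);
    }

    /** A failed transaction's writes stay excluded until cleaned up from storage. */
    synchronized void invalidate(long txId) {
        if (inProgress.remove(txId)) invalid.add(txId);
    }

    synchronized Set<Long> invalidList() { return new TreeSet<>(invalid); }
}
```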
  • 26. Snapshot Isolation
    • Multi-version concurrency control
      – Cell version (timestamp) = transaction ID
      – Reads exclude other uncommitted transactions (for isolation)
    • Optimistic concurrency control
      – Avoids the cost of locking rows and tables
      – Good if conflicts are rare: short transactions, disjoint partitioning of work
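Both mechanisms fit in a few lines. Under MVCC, a reader keeps a cell version only if it wrote that version itself, or if the version is at or below its read pointer and not in its exclusion set. Under optimistic concurrency control, a transaction is refused at commit if any transaction that committed after it started changed an overlapping row. A simplified sketch with hypothetical names and row-level change sets; Tephra's actual conflict detection and data layout differ in detail.

```java
import java.util.*;

// Simplified snapshot-isolation model: cell version == writing transaction's ID.
final class SnapshotIsolationSketch {
    /** Visibility rule for one cell version under a reader's snapshot. */
    static boolean visible(long version, long readerTxId, long readPointer, Set<Long> excluded) {
        if (version == readerTxId) return true; // read your own writes
        return version <= readPointer && !excluded.contains(version);
    }

    /** Latest visible version among a cell's versions, or empty if none is visible. */
    static OptionalLong latestVisible(Collection<Long> versions, long readerTxId,
                                      long readPointer, Set<Long> excluded) {
        return versions.stream()
            .filter(v -> visible(v, readerTxId, readPointer, excluded))
            .mapToLong(Long::longValue).max();
    }

    // Optimistic concurrency control: detect write-write conflicts at commit time.
    private final NavigableMap<Long, Set<String>> committedChanges = new TreeMap<>(); // seq -> rows
    private long commitSeq = 0;

    /** Current commit sequence number; record it when a transaction starts. */
    synchronized long startSeq() { return commitSeq; }

    /** Commits if no transaction committed after startSeq touched the same rows. */
    synchronized boolean tryCommit(long startSeq, Set<String> changedRows) {
        for (Set<String> other : committedChanges.tailMap(startSeq, false).values()) {
            if (!Collections.disjoint(other, changedRows)) return false; // conflict: caller aborts
        }
        committedChanges.put(++commitSeq, Set.copyOf(changedRows));
        return true;
    }
}
```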
  • 27. Performance
    • Single client using 10 threads in parallel with 5K batch size
    • No performance penalty for non-transactional tables
  • 28. Concurrent Write Performance
    2 write threads per client, 1,000-row batch size, 15-column table
  • 29. Future Work
    • Partitioned Transaction Manager
    • Automatic pruning of the invalid transaction list
    • Read-only transactions
    • Performance optimizations
      – Conflict detection
      – Appends to transaction edit log
  • 31. Integration model
    JDBC Client
      → Calcite Parser & Validator (SQL + Phoenix-specific grammar)
      → Calcite Query Optimizer (built-in rules + Phoenix-specific rules)
      → Phoenix Query Plan Generator
      → Phoenix Runtime
      → Phoenix Tables over HBase
  • 32. Cost-based query optimizer with Apache Calcite
    • Base all query optimization decisions on cost
      – Filter push down; range scan vs. skip scan
      – Hash aggregate vs. stream aggregate vs. partial stream aggregate
      – Sort optimized out; sort/limit push through; forward/reverse/unordered scan
      – Hash join vs. merge join; join ordering
      – Use of data table vs. index table
      – All of the above (and many others) COMBINED
    • Query optimizations are modeled as pluggable rules
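The core of cost-based optimization is to enumerate equivalent physical plans (in Calcite, by applying rules) and keep the one with the lowest estimated cost. A toy sketch of that final choice for the join example later in this deck; the cost formulas and constants are invented for illustration and are not Calcite's or Phoenix's cost model.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

// Toy cost-based choice between equivalent physical plans. Cost formulas are invented.
final class CostBasedChoice {
    record Stats(long leftRows, long rightRows, boolean leftSortedOnKey, boolean rightSortedOnKey) {}
    record Plan(String name, ToDoubleFunction<Stats> cost) {}

    static double sortCost(long rows) {
        return rows <= 1 ? 0 : rows * (Math.log(rows) / Math.log(2)); // ~ n log2 n
    }

    static final List<Plan> ALTERNATIVES = List.of(
        // Hash join: build and ship a hash table of the right input, probe with each left row,
        // then sort the joined output because the query has ORDER BY on the join key.
        new Plan("hash-join + sort", s -> 2.0 * s.rightRows() + s.leftRows() + sortCost(s.leftRows())),
        // Merge join: inputs must be sorted on the key; its output is already in key order.
        new Plan("sort-merge-join", s ->
            (s.leftSortedOnKey() ? 0 : sortCost(s.leftRows()))
            + (s.rightSortedOnKey() ? 0 : sortCost(s.rightRows()))
            + s.leftRows() + s.rightRows()));

    static Plan cheapest(Stats stats) {
        return ALTERNATIVES.stream()
            .min(Comparator.comparingDouble(p -> p.cost().applyAsDouble(stats)))
            .orElseThrow();
    }

    public static void main(String[] args) {
        // 2M 'emps' rows not ordered by deptno; 2K 'depts' rows already ordered by deptno.
        System.out.println(cheapest(new Stats(2_000_000, 2_000, false, true)).name());
    }
}
```

Changing the statistics (for example, making `depts` unsorted) flips the decision, which is the point of deciding on cost rather than with a fixed rule.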
  • 33. Beyond Phoenix 4.8 with Apache Calcite
    • Get the missing SQL support
      – WITH, UNNEST, scalar subqueries, etc.
    • Materialized views
      – To allow other forms of indices (possibly defined externally), e.g., a filter view, a join view, or an aggregate view
    • Interop with other Calcite adapters
      – Already used by Drill, Hive, Kylin, Samza, etc.
      – Supports any JDBC source
      – Initial version of Drill-Phoenix integration already working
  • 34. Query Example - no cost-based optimizer
    select empid, e.name, d.deptno, d.name, location
    from emps e join depts d on e.deptno = d.deptno
    order by e.deptno
    Phoenix Compiler:
      1. scan ‘depts’
      2. send ‘depts’ over to the RegionServers & build hash-cache
      3. scan ‘emps’
      4. hash-join ‘depts’
      5. sort joined table on ‘e.deptno’
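This plan is a broadcast hash join: the smaller table is turned into a hash table (Phoenix ships it to the RegionServers as a hash cache), every row of the larger table probes it, and the joined result is then sorted for ORDER BY. An in-memory sketch of that sequence; the `Emp`/`Dept`/`Row` records and the sample data are illustrative.

```java
import java.util.*;

final class HashJoinThenSort {
    record Emp(int empid, String name, int deptno) {}
    record Dept(int deptno, String name, String location) {}
    record Row(int empid, String empName, int deptno, String deptName, String location) {}

    static List<Row> join(List<Emp> emps, List<Dept> depts) {
        // 1. Build: hash the smaller input ('depts') on the join key.
        Map<Integer, List<Dept>> hashCache = new HashMap<>();
        for (Dept d : depts) hashCache.computeIfAbsent(d.deptno(), k -> new ArrayList<>()).add(d);
        // 2. Probe: one hash look-up per 'emps' row; inner join drops rows without a match.
        List<Row> out = new ArrayList<>();
        for (Emp e : emps) {
            for (Dept d : hashCache.getOrDefault(e.deptno(), List.of())) {
                out.add(new Row(e.empid(), e.name(), d.deptno(), d.name(), d.location()));
            }
        }
        // 3. Sort for ORDER BY e.deptno: hash-join output has no useful order.
        out.sort(Comparator.comparingInt(Row::deptno));
        return out;
    }

    public static void main(String[] args) {
        List<Emp> emps = List.of(new Emp(1, "Ann", 20), new Emp(2, "Bob", 10), new Emp(3, "Cy", 30));
        List<Dept> depts = List.of(new Dept(10, "Sales", "SF"), new Dept(20, "Eng", "NYC"));
        join(emps, depts).forEach(System.out::println);
    }
}
```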
  • 35. Query Example - with cost-based optimizer (sort optimization combined with join algorithm decision)
    select empid, e.name, d.deptno, d.name, location
    from emps e join depts d on e.deptno = d.deptno
    order by e.deptno
    Logical plan (input to the optimizer):
      LogicalSort key: deptno
        LogicalProject empid, e.name, d.deptno, d.name, location
          LogicalJoin inner, e.deptno = d.deptno
            LogicalTableScan emps
            LogicalTableScan depts
    Optimizer: optimization rules + Phoenix operator conversion rules
    Phoenix physical plan (output sort order of each operator in brackets):
      PhoenixClientProject empid, e.name, d.deptno, d.name, location   [e.deptno; d.deptno]
        PhoenixMergeJoin inner, e.deptno = d.deptno   [e.deptno; d.deptno]
          PhoenixServerSort key: deptno   [deptno]
            PhoenixServerProject empid, name, deptno   [empid]
              PhoenixTableScan emps   [empid]
          PhoenixServerProject deptno, name, location   [deptno]
            PhoenixTableScan depts   [deptno]
  • 36. Query Example - with cost-based optimizer (sort optimization combined with join algorithm decision)
    Phoenix physical plan:
      PhoenixClientProject empid, e.name, d.deptno, d.name, location
        PhoenixMergeJoin inner, e.deptno = d.deptno
          PhoenixServerSort key: deptno
            PhoenixServerProject empid, name, deptno
              PhoenixTableScan emps
          PhoenixServerProject deptno, name, location
            PhoenixTableScan depts
    Phoenix Implementor output:
      – scan ‘emps’ and sort by ‘deptno’
      – scan ‘depts’
      – merge-join ‘emps’ and ‘depts’
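The optimized plan relies on sort order instead of a hash cache: judging by the plan, 'depts' already comes out ordered by deptno, 'emps' is sorted by deptno on the server, and a merge join walks both inputs once, emitting rows already in deptno order so no final sort is needed. A minimal in-memory sketch of an inner merge join over inputs sorted on an int key, handling duplicate keys on both sides; names and data are illustrative.

```java
import java.util.*;
import java.util.function.ToIntFunction;

// Inner sort-merge join over two inputs already sorted by an int key.
final class MergeJoin {
    record Pair<L, R>(L left, R right) {}

    static <L, R> List<Pair<L, R>> join(List<L> left, ToIntFunction<L> leftKey,
                                        List<R> right, ToIntFunction<R> rightKey) {
        List<Pair<L, R>> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int lk = leftKey.applyAsInt(left.get(i));
            int rk = rightKey.applyAsInt(right.get(j));
            if (lk < rk) { i++; continue; } // advance whichever side has the smaller key
            if (lk > rk) { j++; continue; }
            // Equal keys: find this key's run on the right, pair it with every left row of the key.
            int jEnd = j;
            while (jEnd < right.size() && rightKey.applyAsInt(right.get(jEnd)) == lk) jEnd++;
            while (i < left.size() && leftKey.applyAsInt(left.get(i)) == lk) {
                for (int k = j; k < jEnd; k++) out.add(new Pair<>(left.get(i), right.get(k)));
                i++;
            }
            j = jEnd;
        }
        return out; // emitted in key order, so ORDER BY on the key needs no extra sort
    }

    public static void main(String[] args) {
        // emps as {empid, deptno}, sorted by deptno; depts as deptno values, sorted.
        List<int[]> emps = List.of(new int[]{2, 10}, new int[]{4, 10}, new int[]{1, 20}, new int[]{3, 30});
        List<Integer> depts = List.of(10, 20, 40);
        for (Pair<int[], Integer> p : join(emps, e -> e[1], depts, d -> d)) {
            System.out.println("empid " + p.left()[0] + " -> dept " + p.right());
        }
    }
}
```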
  • 37. Query Example - Comparison
    |                                          | Query plan w/o cost-based optimizer                                   | Query plan w/ cost-based optimizer    |
    |------------------------------------------|-----------------------------------------------------------------------|---------------------------------------|
    | scan ‘emps’, ‘depts’                     | first ‘depts’, then ‘emps’                                            | 2 tables in parallel                  |
    | hash-cache send & build                  | proportional to size of ‘depts’; might cause exception if too large  | none                                  |
    | hash-cache look-up                       | 1 look-up per ‘emps’ row                                              | none                                  |
    | sorting                                  | sort the joined result of ‘emps’ and ‘depts’                         | sort ‘emps’ only                      |
    | optimization approach                    | local, serial optimization processes                                 | cost-based, rule-driven, integrated   |
    | performance (single node, 2M * 2K rows)  | 19.46 s                                                               | 13.92 s                               |
  • 38. Drillix: Interoperability with Drill
    select deptno, sum(salary) from emps group by deptno
    Plan (bottom to top):
      Phoenix Tables over HBase
      → Phoenix Table Scan emps
      → Phoenix Partial Aggregation deptno, sum(salary)   (Stage 1: local partial aggregation)
      → Drill Shuffle   (Stage 2: shuffle partial results)
      → Drill Final Aggregation deptno, sum(salary)   (Stage 3: final aggregation)
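The three stages form a standard two-phase aggregation: each partition computes partial sums locally, the partials are shuffled so that all partials for a given deptno meet on one worker, and that worker adds them up. An in-memory sketch of the same data flow for SUM(salary) GROUP BY deptno; the partitions stand in for Phoenix region scans and the shuffle targets for Drill workers, both heavily simplified.

```java
import java.util.*;

// Two-phase SUM(salary) GROUP BY deptno: local partial aggregation, shuffle, final aggregation.
final class TwoPhaseAggregation {
    record Emp(int deptno, long salary) {}

    /** Stage 1: partial aggregation within one partition (e.g. one region's scan). */
    static Map<Integer, Long> partial(List<Emp> partition) {
        Map<Integer, Long> sums = new HashMap<>();
        for (Emp e : partition) sums.merge(e.deptno(), e.salary(), Long::sum);
        return sums;
    }

    /** Stages 2 and 3: route each partial by hash(deptno) to a worker, which merges them. */
    static Map<Integer, Long> aggregate(List<List<Emp>> partitions, int workers) {
        List<Map<Integer, Long>> workerState = new ArrayList<>();
        for (int w = 0; w < workers; w++) workerState.add(new HashMap<>());
        for (List<Emp> partition : partitions) {
            for (Map.Entry<Integer, Long> p : partial(partition).entrySet()) {
                int target = Math.floorMod(p.getKey().hashCode(), workers);        // shuffle
                workerState.get(target).merge(p.getKey(), p.getValue(), Long::sum); // final agg
            }
        }
        Map<Integer, Long> result = new TreeMap<>();
        workerState.forEach(result::putAll); // each deptno lives on exactly one worker
        return result;
    }

    public static void main(String[] args) {
        List<List<Emp>> regions = List.of(
            List.of(new Emp(10, 100), new Emp(20, 50), new Emp(10, 25)),
            List.of(new Emp(20, 70), new Emp(30, 10)));
        System.out.println(aggregate(regions, 2)); // {10=125, 20=120, 30=10}
    }
}
```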
  • 39. Thank you! Questions?
    Join us tomorrow for PhoenixCon: Salesforce.com, 1 Market St, SF, 9am-1pm
    [Logos: some companies using Phoenix]