Apache Phoenix + Apache HBase

  1. Apache Phoenix + Apache HBase: An Enterprise Grade Data Warehouse
      Ankit Singhal, Rajeshbabu, Josh Elser
      June 30, 2016
      © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. About us
      Ankit Singhal
      – Committer and member of Apache Phoenix PMC
      – MTS at Hortonworks
      RajeshBabu
      – Committer and member of Apache Phoenix PMC
      – Committer in Apache HBase
      – MTS at Hortonworks
      Josh Elser
      – Committer in Apache Phoenix
      – Committer and member of Apache Calcite PMC
      – MTS at Hortonworks
  3. Agenda
      • Phoenix & HBase as an Enterprise Data Warehouse
      • Use Cases
      • Optimizations
      • Phoenix Query Server
      • Q&A
  4. Data Warehouse
      An EDW helps organize and aggregate analytical data from various functional domains and serves as a critical repository for organizations' operations.
      [Diagram labels: Files, IoT data, OLTP, Staging, ETL, Data Warehouse, Mart, Visualization or BI]
  5. Phoenix Offerings and Interoperability
      [Diagram sections: ETL, Data Warehouse, Visualization & BI]
  6. HBase & Phoenix
      • HBase, a distributed NoSQL store
      • Phoenix provides OLTP and analytics over HBase
      [Diagram labels: Application, Phoenix client, HBase client, ZooKeeper, three RegionServers each hosting table regions (Table,,123; Table,a,123; Table,b,123; Table,c,123) with a Phoenix coprocessor (Phx coproc), HDFS]
  7. Open Source Data Warehouse
      [Chart axes: hardware cost (specialized H/W, commodity H/W) and software cost (licensing cost, no cost). Labels: SMP, MPP, Open Source MPP, HBase + Phoenix]
  8. Phoenix & HBase as a Data Warehouse: Architecture
      • Runs on commodity H/W
      • True MPP
      • O/S and H/W flexibility
      • Supports OLTP and ROLAP
  9. Phoenix & HBase as a Data Warehouse: Scalability
      • Linear scalability for storage
      • Linear scalability for memory
      • Open to third-party storage
  10. Phoenix & HBase as a Data Warehouse: Reliability
      • Highly available
      • Replication for disaster recovery
      • Fully ACID for data integrity
  11. Phoenix & HBase as a Data Warehouse: Manageability
      • Performance tuning
      • Data modeling & schema evolution
      • Data pruning
      • Online expansion or upgrade
      • Data backup and recovery
  12. Agenda
      • Phoenix & HBase as an Enterprise Data Warehouse
      • Use Cases
  13. Who uses Phoenix
  14. Analytics Use Case (Web Advertising Company)
      • Functional requirements
      – Create a single source of truth
      – Cross-dimensional queries on 50+ dimensions and 80+ metrics
      – Support fast Top-N queries
      • Non-functional requirements
      – Less than 3 seconds response time for slice and dice
      – 250+ concurrent users
      – 100k+ analytics queries/day
      – Highly available
      – Linear scalability
  15. Data Warehouse Capacity
      • Data size (ETL input)
      – 24 TB/day of raw data system wide
      – 25 billion impressions
      • HBase input (cube)
      – 6 billion rows of aggregated data (100 GB/day)
      • HBase cluster size
      – 65 HBase nodes
      – 520 TB of disk
      – 4.1 TB of memory
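
The capacity figures above imply a few per-node and per-row numbers. The short calculation below is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1000 GB) and assumes the 100 GB/day figure is the size of the 6 billion aggregated rows; neither assumption is stated on the slide.

```python
# Figures taken from the slide.
nodes = 65
disk_tb = 520
memory_tb = 4.1
raw_tb_per_day = 24
cube_gb_per_day = 100
cube_rows_per_day = 6_000_000_000

# Derived values (assumption: decimal units, 1 TB = 1000 GB, 1 GB = 1e9 bytes).
disk_per_node_tb = disk_tb / nodes
memory_per_node_gb = memory_tb * 1000 / nodes
bytes_per_cube_row = cube_gb_per_day * 1e9 / cube_rows_per_day
raw_to_cube_ratio = raw_tb_per_day * 1000 / cube_gb_per_day

print(f"disk per node:      {disk_per_node_tb:.1f} TB")
print(f"memory per node:    {memory_per_node_gb:.1f} GB")
print(f"bytes per cube row: {bytes_per_cube_row:.1f}")
print(f"raw-to-cube ratio:  {raw_to_cube_ratio:.0f}x")
```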
  16. Use Case Architecture
      [Architecture diagram. Section labels: Data Ingestion, Real-time, Batch Processing, Analytics. Component labels: AdServer, Click Tracking, Apache Kafka (two Kafka inputs), ETL / Filter / Aggregate (two instances), In-Memory Store, Kafka, CAMUS, HDFS, ETL, HDFS, Data Uploader, Data API, HBase, Views, Analytics UI]
  17. Data Warehouse Architecture
      [Architecture diagram. Labels: HDFS, ETL, Bulk Load, Cube Generation, "Cubes are stored in HBase", Data API ("Convert slice and dice query to SQL query"), Analytics UI, Analytics, Backup and recovery]
  18. Time Series Use Case (Apache Ambari): Ambari Metrics System (AMS)
      • Functional requirements
      – Store all cluster metrics collected every second (10k to 100k metrics/second)
      – Optimize storage/access for time series data
      • Non-functional requirements
      – Near real-time response time
      – Scalable
      – Real-time ingestion
  19. AMS architecture
      [Diagram labels: Metric Monitors, Hosts, Hadoop Sinks, Metric Collector, HBase, Phoenix, Ambari Server]
  20. Agenda
      • Phoenix & HBase as an Enterprise Data Warehouse
      • Use Cases
      • Optimizations
  21. Schema Design: Primary key design
      • The most important criterion for the overall performance of queries on the table
      • The primary key should be composed of the columns used most often in query predicates
      • In most cases, the leading part of the primary key should let queries be converted into point lookups or range scans in HBase
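
To illustrate why the leading part of the primary key matters, the sketch below models a table as a sorted list of composite row keys and turns a predicate on the leading column into a bounded range scan. It is a simplified model, not HBase or Phoenix code; the column names `customer_id` and `order_date` and the helper `range_scan` are hypothetical.

```python
from bisect import bisect_left


def range_scan(sorted_keys, start, stop):
    """Return the keys in [start, stop) by seeking to both bounds."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Hypothetical composite row keys: (customer_id, order_date), kept sorted
# the way HBase keeps row keys sorted.
keys = sorted([
    ("c1", "2016-01-03"), ("c1", "2016-02-10"),
    ("c2", "2016-01-15"), ("c2", "2016-03-01"),
    ("c3", "2016-01-20"),
])

# A predicate on the leading column (customer_id = 'c2') becomes a range
# bounded by ("c2",) and the smallest key that sorts after every "c2" key.
c2_rows = range_scan(keys, ("c2",), ("c2\x00",))
print(c2_rows)
```

A predicate on `order_date` alone could not be bounded this way, because matching keys are scattered across every `customer_id` prefix.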
  22. Schema Design: Salting vs pre-split
      • Use salting to alleviate write hot-spotting: CREATE TABLE …( … ) SALT_BUCKETS = N
      – Number of buckets should be equal to the number of RegionServers
      • Otherwise, try to pre-split the table if you know the row key data set: CREATE TABLE …( … ) SPLITS(…)
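
The idea behind salting can be sketched as follows: a bucket number derived from a hash of the row key is prepended to the key, so that monotonically increasing keys (timestamps, for example) land in different key ranges instead of one. This is an illustration only; the SHA-256 based hash is a stand-in chosen for the example and is not the hash function Phoenix uses, and `salt_bucket` and `salted_key` are hypothetical helper names.

```python
import hashlib
from collections import Counter


def salt_bucket(row_key: bytes, buckets: int) -> int:
    """Map a row key to a bucket in [0, buckets) using a stand-in hash."""
    digest = hashlib.sha256(row_key).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix the row key with a single salt byte."""
    return bytes([salt_bucket(row_key, buckets)]) + row_key


# Sequential timestamp keys would all hit the same region without salting.
keys = [f"2016-06-30T00:00:{s:02d}".encode() for s in range(60)]
spread = Counter(salt_bucket(k, 4) for k in keys)
print(dict(spread))
```

The trade-off is that a range scan over the original key order now has to fan out across all buckets.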
  23. Schema Design: Table properties
      • Use block encoding and/or compression for better performance: CREATE TABLE …( … ) DATA_BLOCK_ENCODING='FAST_DIFF', COMPRESSION='SNAPPY'
      • Use region replication for read high availability: CREATE TABLE …( … ) "REGION_REPLICATION" = "2"
  24. Schema Design: Table properties
      • Set UPDATE_CACHE_FREQUENCY to a larger value to avoid contacting the server frequently for metadata updates: CREATE TABLE …( … ) UPDATE_CACHE_FREQUENCY = 300000
  25. Schema Design
      • Divide columns into multiple column families if some columns are rarely accessed
      – HBase reads only the files of the column families specified in the query, which reduces I/O
      [Table sketch: primary key columns pk1, pk2; column families CF1 (frequently accessed columns) and CF2 (rarely accessed columns); columns Col1 to Col7]
  26. Secondary Indexes
      • Global indexes
      – Optimized for read-heavy use cases: CREATE INDEX idx ON table(…)
      • Local indexes
      – Optimized for write-heavy and space-constrained use cases: CREATE LOCAL INDEX idx ON table(…)
      • Functional indexes
      – Allow you to create indexes on arbitrary expressions: CREATE INDEX UPPER_NAME_INDEX ON EMP(UPPER(FIRSTNAME||' '||LASTNAME))
  27. Secondary Indexes
      • Use covered indexes to scan the index table efficiently instead of the primary table: CREATE INDEX idx ON table(…) INCLUDE(…)
      • Pass an index hint to guide the query optimizer to select the right index for the query: SELECT /*+ INDEX(<table> <index>) */ …
  28. Row Timestamp Column
      • Maps the HBase native row timestamp to a Phoenix column
      • Leverages optimizations provided by HBase, such as setting the minimum and maximum time range for scans to entirely skip the store files that don't fall in that time range
      • Perfect for time series use cases
      • Syntax: CREATE TABLE …(CREATED_DATE DATE NOT NULL … CONSTRAINT PK PRIMARY KEY(CREATED_DATE ROW_TIMESTAMP … ))
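
The store file pruning mentioned above can be pictured with a small model: each store file records the minimum and maximum timestamp of its cells, and a scan with a time range only opens the files whose range overlaps it. This is a simplified sketch rather than HBase code; `StoreFile` and `files_to_read` are hypothetical names, and the scan range is treated as half-open, [min, max).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    name: str
    min_ts: int  # smallest cell timestamp in the file
    max_ts: int  # largest cell timestamp in the file


def files_to_read(store_files, scan_min_ts, scan_max_ts):
    """Keep only files overlapping the scan range [scan_min_ts, scan_max_ts)."""
    return [
        f for f in store_files
        if f.max_ts >= scan_min_ts and f.min_ts < scan_max_ts
    ]


files = [
    StoreFile("hfile-1", 0, 99),
    StoreFile("hfile-2", 100, 199),
    StoreFile("hfile-3", 200, 299),
]

# A query on the row timestamp column for [150, 250) never opens hfile-1.
selected = files_to_read(files, 150, 250)
print([f.name for f in selected])
```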
  29. Use of Statistics
      [Diagram: a client scanning four regions (Region A, F, L, R) by region boundaries A, F, L, R, compared with a client scanning eight chunks (Chunk A, C, F, I, L, O, R, U) by boundaries A, C, F, I, L, O, R, U]
  30. Skip Scan
      • Phoenix supports skip scan to jump directly to matching keys when the query has key sets in the predicate
      Query: SELECT * FROM METRIC_RECORD WHERE METRIC_NAME LIKE 'abc%' AND HOSTNAME IN ('host1', 'host2');
      Plan: CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 RANGES OVER METRIC_RECORD ['abc','host1'] - ['abd','host2']
      [Diagram: a client skip-scanning Region1 to Region4 hosted on RS1, RS2, and RS3]
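
The behaviour of a skip scan can be sketched over a sorted list of composite keys: rather than reading every row in the `'abc'` prefix range and filtering, the scan seeks straight to each candidate `(metric_name, hostname)` key and then jumps to the next metric name. This is a simplified single-process model and not Phoenix's implementation; the function name `skip_scan` and the sample rows are made up for the example, and row keys are assumed to be unique.

```python
from bisect import bisect_left


def skip_scan(rows, prefix, hosts):
    """rows: sorted, unique (metric_name, hostname) tuples.

    Returns (matching rows, number of seeks performed).
    """
    hosts = sorted(hosts)
    matches, seeks = [], 0
    i = bisect_left(rows, (prefix,))  # seek to the start of the prefix range
    seeks += 1
    while i < len(rows) and rows[i][0].startswith(prefix):
        metric = rows[i][0]
        for host in hosts:
            # Jump directly to where (metric, host) would be stored.
            i = bisect_left(rows, (metric, host), i)
            seeks += 1
            if i < len(rows) and rows[i] == (metric, host):
                matches.append(rows[i])
                i += 1
        # Skip the remaining hosts of this metric: seek to the next metric name.
        i = bisect_left(rows, (metric + "\x00",), i)
        seeks += 1
    return matches, seeks


metrics = ["abb", "abc1", "abc2", "abd"]
all_hosts = ["host0", "host1", "host2", "host3"]
rows = sorted((m, h) for m in metrics for h in all_hosts)

matches, seeks = skip_scan(rows, "abc", {"host1", "host2"})
print(matches, seeks)
```

In this toy data set the scan returns 4 rows using 7 seeks, without reading the `host0` and `host3` rows in between.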
  31. Join optimizations
      • Hash join
      – Outperforms other join algorithms when one of the relations is small, or when its records matching the predicate fit into memory
      • Sort-merge join
      – Use the sort-merge join algorithm when the relations are very large
      • NO_STAR_JOIN hint
      – For multiple inner-join queries, Phoenix applies a star-join optimization by default. Use this hint in the query if the overall size of all right-hand-side tables would exceed the memory size limit.
      • NO_CHILD_PARENT_OPTIMIZATION hint
      – Prevents the use of the child-parent-join optimization
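
The reason a hash join needs one side to fit in memory is visible in a minimal build-and-probe sketch: the smaller relation is loaded into an in-memory hash table, and the larger relation is streamed past it once. This is a generic textbook hash join written for illustration, not Phoenix's join code; the `hosts` and `metrics` relations are made-up sample data.

```python
from collections import defaultdict


def hash_join(small, large, small_key, large_key):
    """Inner join: build a hash table on `small`, then probe it with `large`."""
    table = defaultdict(list)
    for row in small:  # build phase: the whole small side is held in memory
        table[row[small_key]].append(row)
    for row in large:  # probe phase: the large side is streamed once
        for match in table.get(row[large_key], ()):
            yield {**match, **row}


hosts = [
    {"host": "h1", "rack": "r1"},
    {"host": "h2", "rack": "r2"},
]
metrics = [
    {"host": "h1", "cpu": 0.5},
    {"host": "h3", "cpu": 0.9},
    {"host": "h2", "cpu": 0.1},
]

joined = list(hash_join(hosts, metrics, "host", "host"))
print(joined)
```

When neither side fits in memory, the build phase is no longer possible, which is where the sort-merge join applies.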
  32. Optimize Writes
      • UPSERT VALUES
      – Call it multiple times before committing, to batch mutations
      – Use a prepared statement when you run the same statement multiple times
      • UPSERT SELECT
      – Configure phoenix.mutate.batchSize based on row size
      – Set auto-commit to true to write scan results directly to HBase
      – Set auto-commit to true while running UPSERT SELECT on the same table so that writes happen on the server
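
The batching pattern for UPSERT VALUES (many parameterized writes, one commit per batch) can be sketched with Python's DB-API. To keep the example runnable without a cluster, `sqlite3` is used as a stand-in database and `INSERT OR REPLACE` as a stand-in for Phoenix's `UPSERT`; the `metric` table, the `write_in_batches` helper, and the batch size are all hypothetical. Against Phoenix, the same shape would use a Phoenix driver and an `UPSERT INTO … VALUES (?, ?, ?)` statement.

```python
import sqlite3


def write_in_batches(conn, sql, rows, batch_size):
    """Execute a parameterized statement for all rows, committing once per batch.

    Returns the number of commits issued.
    """
    cur = conn.cursor()
    commits = 0
    for start in range(0, len(rows), batch_size):
        # One parameterized statement reused for every row in the batch.
        cur.executemany(sql, rows[start:start + batch_size])
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")  # stand-in for a Phoenix connection
conn.execute(
    "CREATE TABLE metric (name TEXT, host TEXT, value REAL, "
    "PRIMARY KEY (name, host))"
)

rows = [(f"m{i}", "host1", float(i)) for i in range(10)]
commits = write_in_batches(
    conn, "INSERT OR REPLACE INTO metric VALUES (?, ?, ?)", rows, batch_size=4
)
row_count = conn.execute("SELECT COUNT(*) FROM metric").fetchone()[0]
print(commits, row_count)
```

Ten rows with a batch size of 4 result in three commits instead of ten.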
  33. Hints: Some important hints
      • SERIAL SCAN, RANGE SCAN
      • SERIAL
      • SMALL SCAN
  34. Additional References
      • For more optimizations, refer to these documents:
      – http://phoenix.apache.org/tuning.html
      – https://hbase.apache.org/book.html#performance
  35. Agenda
      • Phoenix & HBase as an Enterprise Data Warehouse
      • Use Cases
      • Optimizations
      • Phoenix Query Server
  36. Apache Phoenix Query Server
      • A standalone service that proxies user requests to HBase/Phoenix
      – Optional
      • Reference client implementation via JDBC
      – "Thick" versus "Thin"
      • First introduced in Apache Phoenix 4.4.0
      • Built on Apache Calcite's Avatica
      – "A framework for building database drivers"
  37. Traditional Apache Phoenix RPC Model
      [Diagram labels: Application embedding the Phoenix client and HBase client, ZooKeeper, three RegionServers each hosting table regions with a Phoenix coprocessor (Phx coproc), HDFS]
  38. Query Server Model
      [Diagram labels: Application, Query Server containing the Phoenix client and HBase client, ZooKeeper, three RegionServers each hosting table regions with a Phoenix coprocessor (Phx coproc), HDFS]
  39. Query Server Technology
      • HTTP server and wire API definition
      • Pluggable serialization
      – Google Protocol Buffers
      • "Thin" JDBC driver (over HTTP)
      • Other goodies
      – Pluggable metrics system
      – TCK (technology compatibility kit)
      – SPNEGO for Kerberos authentication
      – Horizontally scalable with load balancing
  40. Query Server Clients: Client enablement
      • Go language database/sql/driver
      – https://github.com/Boostport/avatica
      • .NET driver
      – https://github.com/Azure/hdinsight-phoenix-sharp
      – https://www.nuget.org/packages/Microsoft.Phoenix.Client/1.0.0-preview
      • ODBC
      – Built by http://www.simba.com/, also available from Hortonworks
      • Python DB API v2.0 (not "battle tested")
      – https://bitbucket.org/lalinsky/python-phoenixdb
  41. Agenda
      • Phoenix & HBase as an Enterprise Data Warehouse
      • Use Cases
      • Optimizations
      • Phoenix Query Server
      • Q&A
  42. Phoenix & HBase
      We hope to see you all migrating to Phoenix & HBase, and we expect more questions on the user mailing lists.
      Get involved in the mailing lists:
      – user@phoenix.apache.org
      – user@hbase.apache.org
      You can reach us at:
      – ankit@apache.org
      – rajeshbabu@apache.org
      – elserj@apache.org
  43. Thank You
