Tuning Up Apache Phoenix.
Secondary Indexes
By Vlad Krava

2019
Setup
For our basic performance and explain plan overview we are going to use 3 tables:
• USER_DATA_150K (150 000 records)
• USER_DATA_15M (15 000 000 records)
• USER_DATA_30M (30 000 000 records)
Performance
Overview
Search by Primary Key (REC_KEY)
Filtering and Sorting by Table Fields
(REC_LAST_NAME & REC_CREATE_DATE)
Tuning Up
4 Ways to Improve Performance
• Include all fields in the index. Pros: guarantees that the search goes
through the index table. Cons: MAJOR performance degradation for
UPSERT and DELETE operations, the index table grows, and by default it
is the wrong architectural decision unless you search by all table
fields.
• Index hinting. Pros: lightweight approach; no problems with tables
getting out of sync were observed (critical bug in version 5.0.0 and below:
PHOENIX-4045). Cons: doesn't guarantee traversal through the index table.
• Covered index. Pros: guarantees that the search goes through the index
table. Cons: the index table grows.
• Local index. Pros: index and data table are consistent. Cons: slower
reads than with a global index.
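The four strategies above might look as follows in Phoenix SQL. This is a minimal sketch, not the setup used for the measurements: the table USER_DATA_15M and the columns REC_KEY, REC_LAST_NAME and REC_CREATE_DATE come from the slides, while the index names and the REC_FIRST_NAME column are hypothetical. Reading "include all fields" as "put every column into the indexed column list" is also an assumption. Each numbered statement is an independent alternative, not a script to run top to bottom.

```sql
-- Sketch only: index names (IDX_*) and the REC_FIRST_NAME column are hypothetical.

-- 1. Include all fields in the index: every column becomes part of the index row key.
CREATE INDEX IDX_ALL_FIELDS
    ON USER_DATA_15M (REC_LAST_NAME, REC_CREATE_DATE, REC_FIRST_NAME);

-- 2. Index hinting: a plain index plus a hint asking the optimizer to use it,
--    even though REC_FIRST_NAME is not stored in the index.
CREATE INDEX IDX_LAST_NAME
    ON USER_DATA_15M (REC_LAST_NAME);

SELECT /*+ INDEX(USER_DATA_15M IDX_LAST_NAME) */ REC_KEY, REC_FIRST_NAME
FROM USER_DATA_15M
WHERE REC_LAST_NAME = 'Smith';

-- 3. Covered index: REC_LAST_NAME is indexed, the INCLUDE columns are only
--    stored alongside it so the query never has to visit the data table.
CREATE INDEX IDX_LAST_NAME_COVERED
    ON USER_DATA_15M (REC_LAST_NAME)
    INCLUDE (REC_CREATE_DATE, REC_FIRST_NAME);

-- 4. Local index: index data is kept on the same region servers as the table data.
CREATE LOCAL INDEX IDX_LAST_NAME_LOCAL
    ON USER_DATA_15M (REC_LAST_NAME);
```

Prefixing any of the queries with EXPLAIN shows whether Phoenix scans the index table or the data table, which is the simplest way to check that a given strategy is actually taking effect.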
Important Notes
• A Global Index will not be used by Phoenix unless all of the columns referenced in
the query are contained in the index. A Local Index can be an alternative to a
Global Index.
• After including all fields in the Secondary Index and introducing Covered Indexes,
the index table size doubled compared to the Index Hinting strategy. The increase
depends mostly on the number and types of fields in your tables. To check the index
table size, run the following command on Hadoop's NameNode:
$ hadoop fs -du -h hdfs://{PATH_TO_HBASE}/data/data/{SCHEMA}/{INDEX}
• The ordinal position of the index fields should align with the common query
patterns: put first the most frequently queried column, or the column that
always appears in the filter criteria, then the next most common one, and
so on.
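The note on column order can be illustrated with a small sketch. It assumes that most queries filter on REC_LAST_NAME and only sometimes on REC_CREATE_DATE, and that REC_CREATE_DATE is a DATE column; the index name is hypothetical.

```sql
-- Hypothetical index: REC_LAST_NAME leads because it is assumed to appear in every filter.
CREATE INDEX IDX_NAME_DATE
    ON USER_DATA_15M (REC_LAST_NAME, REC_CREATE_DATE);

-- Filters on the leading index column, so the index row key can be range-scanned.
EXPLAIN
SELECT REC_KEY
FROM USER_DATA_15M
WHERE REC_LAST_NAME = 'Smith'
ORDER BY REC_CREATE_DATE DESC;

-- Skips the leading index column, so the scan cannot seek on the row-key prefix.
EXPLAIN
SELECT REC_KEY
FROM USER_DATA_15M
WHERE REC_CREATE_DATE > TO_DATE('2019-01-01 00:00:00');
```

Both queries reference only REC_KEY and the indexed columns, so the global index is eligible in each case. Comparing the two plans shows how much the leading column matters for a given workload.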
Q&A
Email: vkrava4@gmail.com

Twitter: vkrava4
