Introducing Secondary Indexes on Apache Phoenix tables improves the performance of filtering and sorting queries. There are a few ways to make Phoenix's Global Indexes efficient.
2. Setup
For our basic performance and EXPLAIN plan overview we are going to use 3 tables:
• USER_DATA_150K (150 000 records)
• USER_DATA_15M (15 000 000 records)
• USER_DATA_30M (30 000 000 records)
7. 4 Ways to Improve Performance
• Include all fields in the index. Pros: guarantees that the search will use the
index table. Cons: MAJOR performance degradation for UPSERT and DELETE
operations, the index table grows in size, and by default it is a poor
architectural decision unless you search by all table fields.
• Index hinting. Pros: lightweight approach; no table out-of-sync problems
have been observed (a critical bug in version 5.0.0 and below:
PHOENIX-4045). Cons: doesn't guarantee that the index table will be used.
• Covered index. Pros: guarantees that the search will use the index table.
Cons: the index table grows in size.
• Local index. Pros: the index and data table are always consistent. Cons:
reads are less efficient.
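The four strategies above can be sketched in Phoenix SQL. The table and column names below are hypothetical, chosen only to illustrate the syntax:

```sql
-- Hypothetical base table
CREATE TABLE USER_DATA (
    ID BIGINT NOT NULL PRIMARY KEY,
    FIRST_NAME VARCHAR,
    LAST_NAME VARCHAR,
    CITY VARCHAR
);

-- 1. Global index including all fields
CREATE INDEX IDX_ALL ON USER_DATA (LAST_NAME, FIRST_NAME, CITY);

-- 2. Index hinting: ask the optimizer to use a specific index
SELECT /*+ INDEX(USER_DATA IDX_ALL) */ LAST_NAME
FROM USER_DATA WHERE LAST_NAME = 'Smith';

-- 3. Covered index: LAST_NAME is indexed, CITY is stored alongside it
CREATE INDEX IDX_COVERED ON USER_DATA (LAST_NAME) INCLUDE (CITY);

-- 4. Local index: index data is co-located with the table's regions
CREATE LOCAL INDEX IDX_LOCAL ON USER_DATA (LAST_NAME);
```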
8. Important Notes
• Global Index will not be used by Phoenix unless all of the columns referenced in
the query are contained in the index. A Local Index can be an alternative to a
Global Index.
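To illustrate the first note, suppose a hypothetical global index covers only LAST_NAME. A query that also references CITY cannot be answered from the index alone, so Phoenix falls back to a full scan of the data table unless the index is covered or hinted. EXPLAIN makes the chosen plan visible:

```sql
-- Hypothetical global index on a single column
CREATE INDEX IDX_LAST_NAME ON USER_DATA (LAST_NAME);

-- CITY is not contained in the index, so the index is not used here;
-- EXPLAIN should report a FULL SCAN over USER_DATA:
EXPLAIN SELECT LAST_NAME, CITY
FROM USER_DATA WHERE LAST_NAME = 'Smith';

-- This query references only indexed columns, so the index can be used:
EXPLAIN SELECT LAST_NAME
FROM USER_DATA WHERE LAST_NAME = 'Smith';
```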
• After including all fields in the Secondary Index and introducing Covered Indexes,
the index table size roughly doubled compared to the Index Hinting strategy. The
growth depends mostly on the number and types of fields in your tables. To check
the index table size, run the following command on Hadoop's NameNode:
$ hadoop fs -du -h hdfs://{PATH_TO_HBASE}/data/data/{SCHEMA}/{INDEX}
• The ordinal position of index fields should be chosen to align with common
query patterns: place first the most frequently queried column, or the column
that always appears at the front of every filter criteria, and so on.
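As a sketch of the ordering rule (with hypothetical columns): if COUNTRY appears in every filter and CITY only in some, COUNTRY should lead the index, because Phoenix can only seek efficiently when the leading index columns are constrained.

```sql
-- COUNTRY is in every filter, CITY only sometimes, so COUNTRY leads:
CREATE INDEX IDX_GEO ON USER_DATA (COUNTRY, CITY);

-- Both queries constrain the leading column and can seek via the index:
SELECT ID FROM USER_DATA WHERE COUNTRY = 'US';
SELECT ID FROM USER_DATA WHERE COUNTRY = 'US' AND CITY = 'Austin';

-- The leading column is unconstrained here, so the index cannot be
-- used for an efficient range seek:
SELECT ID FROM USER_DATA WHERE CITY = 'Austin';
```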