Hive Metastore Cache
- 1. 1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hive Metastore Cache
- 2. 2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Motivation
⬢ Metastore becomes a bottleneck
⬢ Throttle happens in Azure DB
⬢ Interactive queries
⬢ Lots of repeated calls
– 108 getTable calls in on query
- 3. 3 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
How CachedStore Work
Remote Metastore
HS2 Embedded
Metastore
CachedStore
CachedStore
⬢ Serverside cache
⬢ Cache all metadata in memory
⬢ Database
⬢ Table
⬢ StorageDescriptor
⬢ Partition
⬢ Statistics
⬢ Time based refresh
⬢ 10m partitions in 6G memory
⬢ set hive.metastore.rawstore.impl=CachedStore
- 4. 4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Deal Memory Limit
⬢ Reality
– Extreme use case
– NDV Bitvector: 16k per column per partition
• 2000 partition * 200 columns * 16k=6.4G
⬢ Solution
– Use fewer bits for NDV
• 2000 partition * 200 columns * 1k=400M
– Black list/White list
– Stop loading when memory full
- 5. 5 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Deal with HA
⬢ Strong consistency requirement
– Metastore writes should propagate to all caches before use
– Metastore writes needs propagate to all CachedStore before ACK
CachedStore
Write
MySQL
CachedStore
Notify