HiveServer2 provides a multi-tenant service endpoint for executing Hive queries concurrently. It supports authentication and authorization, serves as a JDBC endpoint through which users connect and run queries from various tools, maintains sessions and warm containers for faster query processing, provides caching at multiple levels, and much more. In other words, it is an integral component of any Hive deployment. However, HiveServer2 deployments often face performance and reliability issues, which at times lead to catastrophic failures. At Qubole, we have augmented HiveServer2 to use the capabilities of the cloud to offer an enterprise-ready, scalable, and stable HiveServer2 (or HS2) service.
At Qubole, where the cloud is our primary deployment platform, HS2 has been enhanced to scale automatically with the customer's workload: our solution adds and gracefully removes HS2 instances as demand requires, making the HS2 service not only self-sufficient at scale but also fault-tolerant. We have implemented load balancing for queries based on resource utilization on the HS2 instances to provide a reliable, efficient, and cost-effective solution. A health monitoring service, built on top of this scalable HS2 service and informed by past learnings and insights from running HS2 in customer deployments, acts as the foundation for a battle-tested, enterprise-ready HS2 solution. In this talk, we will share the details of this implementation and the challenges we faced in providing an auto-scalable, highly performant, and reliable HS2 experience in the cloud.
Topics include:
* Workload-aware autoscaling for HS2 clusters.
* Agent-based adaptive load balancing of Hive queries on multi-tenant HS2 clusters.
* Durability monitoring using failure semantics and automated measures to provide reliability.
* Enterprise-level security for HS2 on the cloud.
* Metrics, monitoring and alerting around the HS2 service.
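
As a rough illustration of the utilization-based load balancing mentioned above, the following minimal sketch routes a query to the least-loaded healthy instance. It is not Qubole's implementation: the `HS2Instance` type, the `load_score` and `pick_instance` functions, the choice of CPU and memory as the utilization signals, and the `max_load` threshold of 0.85 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HS2Instance:
    """Hypothetical snapshot of one HS2 instance as reported by an agent."""
    name: str
    cpu_utilization: float      # fraction in [0, 1]
    memory_utilization: float   # fraction in [0, 1]
    healthy: bool = True


def load_score(instance: HS2Instance) -> float:
    # Assumption: the most constrained resource determines the load.
    return max(instance.cpu_utilization, instance.memory_utilization)


def pick_instance(instances: List[HS2Instance],
                  max_load: float = 0.85) -> Optional[HS2Instance]:
    """Return the least-loaded healthy instance, or None if none has headroom.

    A None result could be treated by the caller as a signal to add
    another HS2 instance before routing the query.
    """
    candidates = [i for i in instances if i.healthy]
    if not candidates:
        return None
    best = min(candidates, key=load_score)
    return best if load_score(best) < max_load else None
```

In a sketch like this, an unhealthy instance is never chosen regardless of its load, and a saturated cluster yields no target at all, which is one simple way to tie load balancing to both health monitoring and autoscaling.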