Over the last few years, the Apache Hive community has been working on advancements that enable a whole new range of use cases for the project, moving it from its batch-processing roots towards an interactive SQL query-answering platform. Traditionally, one of the most powerful techniques for accelerating query processing in data warehouses is the precomputation of relevant summaries, or materialized views.
This talk presents our work on introducing materialized views, and automatic query rewriting based on those materializations, in Apache Hive. In particular, materialized views can be stored natively in Hive or in other systems such as Druid using custom storage handlers, and they can seamlessly exploit exciting new Hive features such as LLAP acceleration. The optimizer then relies on Apache Calcite to automatically produce full and partial rewritings for a large set of query expressions comprising projections, filters, joins, and aggregation operations. We describe the current coverage of the rewriting algorithm and how Hive controls important aspects of the life cycle of materialized views, such as the freshness of their data, and we outline interesting directions for future improvements. We include an experimental evaluation highlighting the benefits that materialized views can bring to the execution of Hive workloads.
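To make the idea of a full rewriting concrete, here is a minimal Python sketch of one narrow case: answering an aggregation query from a materialized view that was grouped on a superset of the query's columns. It is a toy illustration only, not the algorithm used by Hive or Calcite. The `AggQuery` class, the `rewrite` function, the `mv_sales` view name, and the convention that the view stores each aggregate in a column named `<func>_<column>` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggQuery:
    """Toy description of 'SELECT group_by, aggregates FROM table GROUP BY group_by'."""
    table: str
    group_by: frozenset
    aggregates: frozenset  # pairs such as ("SUM", "amount")


# How an aggregate over base rows can be recomputed from pre-aggregated rows.
# COUNT rolls up as a SUM of partial counts; AVG is deliberately left out
# because it cannot be rolled up from partial averages alone.
ROLLUP = {"SUM": "SUM", "MIN": "MIN", "MAX": "MAX", "COUNT": "SUM"}


def rewrite(query, view, view_name):
    """Return SQL text answering `query` from `view`, or None if this sketch cannot."""
    if query.table != view.table:
        return None
    # The view must be at least as fine-grained as the query.
    if not query.group_by <= view.group_by:
        return None
    # Every requested aggregate must already be stored in the view.
    if not query.aggregates <= view.aggregates:
        return None

    select = sorted(query.group_by)
    for func, col in sorted(query.aggregates):
        if func not in ROLLUP:
            return None
        # Assumed naming convention for view columns: <func>_<column>.
        select.append(f"{ROLLUP[func]}({func.lower()}_{col})")

    sql = f"SELECT {', '.join(select)} FROM {view_name}"
    if query.group_by:
        sql += f" GROUP BY {', '.join(sorted(query.group_by))}"
    return sql


# Hypothetical view: sales pre-aggregated by region and day.
view = AggQuery(
    "sales",
    frozenset({"region", "day"}),
    frozenset({("SUM", "amount"), ("COUNT", "amount")}),
)
by_region = AggQuery("sales", frozenset({"region"}), frozenset({("SUM", "amount")}))
by_customer = AggQuery("sales", frozenset({"customer"}), frozenset({("SUM", "amount")}))

print(rewrite(by_region, view, "mv_sales"))
print(rewrite(by_customer, view, "mv_sales"))
```

The first query can be answered by rolling the view up from (region, day) to region, so the sketch emits SQL over `mv_sales`. The second groups on a column the view does not keep, so the sketch returns `None` and the query would have to run against the base table. A real optimizer also has to handle filters, joins, and partial rewritings, none of which are modeled here.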
Speaker
Jesus Camacho Rodriguez, Member of Technical Staff, Hortonworks