April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster
- 2,078 views
Apache Hive and its HiveQL interface has become the de facto SQL interface for Hadoop. Apache Hive was originally built for large-scale operational batch processing and it is very effective with ...
Apache Hive and its HiveQL interface has become the de facto SQL interface for Hadoop. Apache Hive was originally built for large-scale operational batch processing and it is very effective with reporting, data mining, and data preparation use cases. These usage patterns remain very important but with widespread adoption of Hadoop, the enterprise requirement for Hadoop to become more real time or interactive has increased in importance as well. Enabling Hive to answer human-time use cases (i.e. queries in the 5-30 second range) such as big data exploration, visualization, and parameterized reports without needing to resort to yet another tool to install, maintain and learn can deliver a lot of value to the large community of users with existing Hive skills and investments. To this end, we have launched the Stinger Initiative, with input and participation from the broader community, to enhance Hive with more SQL and better performance for these human-time use cases. We believe the performance changes we are making today, along with the work being done in Tez will transform Hive into a single tool that Hadoop users can use to do report generation, ad hoc queries, and large batch jobs spanning 10s or 100s of terabytes.
Alan Gates, Co-founder, Hortonworks and Apache Pig PMC and Apache HCatalog PPMC Member, Author of "Programming Pig" from O'Reilly Media
Owen O’ Malley, Co-founder, Hortonworks, First committer added to Apache Hadoop and Founding Chair of the Apache Hadoop PMC
- Total Views
- Views on SlideShare
- Embed Views