Project Panthera is an open source effort that showcases better data analytics capabilities on Hadoop/HBase (e.g., better integration with existing infrastructure using SQL, better query processing on HBase, and efficient use of new hardware platform technologies). In this talk, we will discuss two new capabilities that we are currently working on under Project Panthera: (1) a SQL engine for MapReduce (built on top of Hive) that supports common SQL constructs used in analytic queries, including some important features (e.g., subqueries in WHERE clauses and multiple-table SELECT statements) that are not supported in Hive today; (2) a document-oriented store on HBase for better Hive/SQL query processing, which brings up to a 3x reduction in table storage and up to a 1.8x speedup in query processing.
Presenter: Jason Dai, Principal Engineer, Intel Software and Services Group