June 2012

HiveServer2 Project (WIP)
Carl Steinbach | Platform Engineering

Hive Background: What is it?

    An ETL/Data Warehouse system for Hadoop:

    •  SQL->MR Compiler and Execution Engine

    •  SerDes: Pluggable Data Format Handlers

    •  MetaStore: Persistent Metadata Storage


                    ©2012 Cloudera, Inc. All Rights Reserved.
Hive Evolution

    •  Original Vision:
      –  Let users express their queries in a high-level
         language without having to write MR
         programs


    •  Now more and more:
      –  A parallel SQL DBMS that happens to use
         Hadoop for its storage and execution layer.



What do users expect from a DBMS?

    •  Sessions/Concurrency
      –  Persistent client state on the server-side
      –  Ability to run multiple clients concurrently
    •  ODBC/JDBC
      –  SQL IDEs, BI, ETL, …
    •  Authentication/Authorization
    •  Auditing/Logging


What’s Missing?

    •  Sessions/Concurrency
      –  Current Thrift API can’t support concurrency
    •  ODBC/JDBC
      –  Thrift API doesn’t support common ODBC/JDBC calls
    •  Authentication/Authorization
      –  Incomplete implementations
    •  Auditing/Logging
      –  Multiple plugin interfaces in need of consolidation



What’s Missing

    Concurrency/Sessions

    •  Current Thrift API can’t support multiple
       connections or client sessions.
    •  User/Global Configuration and Session
       Info
    •  Query compiler memory leaks
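
One way to picture the session requirement above: each client gets its own handle, and its settings overlay a shared global configuration. The following is a minimal sketch under that assumption, not HiveServer2's implementation. `SessionManager` and its methods are hypothetical names; `hive.exec.parallel` is used only as a sample property.

```python
import threading
import uuid


class SessionManager:
    """Hypothetical in-memory registry of client sessions."""

    def __init__(self, global_conf):
        self._global_conf = dict(global_conf)  # shared defaults
        self._sessions = {}                    # handle -> per-session state
        self._lock = threading.Lock()          # guards concurrent clients

    def open_session(self, user):
        """Create an isolated session and return its opaque handle."""
        handle = uuid.uuid4().hex
        with self._lock:
            self._sessions[handle] = {"user": user, "overrides": {}}
        return handle

    def set_conf(self, handle, key, value):
        """Record a setting that is visible to this session only."""
        with self._lock:
            self._sessions[handle]["overrides"][key] = value

    def get_conf(self, handle, key):
        """Session overrides win; otherwise fall back to the global value."""
        with self._lock:
            overrides = self._sessions[handle]["overrides"]
            return overrides.get(key, self._global_conf.get(key))

    def close_session(self, handle):
        """Discard all state held for the session."""
        with self._lock:
            self._sessions.pop(handle, None)


manager = SessionManager({"hive.exec.parallel": "false"})
alice = manager.open_session("alice")
bob = manager.open_session("bob")
manager.set_conf(alice, "hive.exec.parallel", "true")
```

In this sketch, a setting changed by one client does not leak into another client's session, which is the isolation the slide asks for.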


What’s Missing

    ODBC/JDBC
    •  Thrift API can’t support common ODBC/
       JDBC calls:
      –  SQLGetInfo
      –  SQLGetTypeInfo
      –  SQLCancel
      –  SQLGetFunctions
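
To make the four calls concrete, here is a toy sketch of the server-side capabilities a driver would need behind each one. Everything in it is an assumption for illustration: `ToyServer`, its method names, and the placeholder values are hypothetical and are not taken from the HiveServer2 API proposal. The point it tries to show is that SQLCancel only works if the server hands back a handle for a running operation.

```python
import uuid


class ToyServer:
    """Hypothetical server exposing a counterpart for each listed ODBC call."""

    # ODBC functions this toy driver/server pair claims to support.
    SUPPORTED = {"SQLGetInfo", "SQLGetTypeInfo", "SQLCancel", "SQLGetFunctions"}

    def __init__(self):
        self._operations = {}  # operation handle -> state string

    def get_info(self, key):
        """SQLGetInfo: general facts about the server (placeholder values)."""
        return {"server_name": "toy-server", "max_column_name_len": 128}[key]

    def get_type_info(self):
        """SQLGetTypeInfo: data types the source supports (sample list)."""
        return ["INT", "BIGINT", "DOUBLE", "STRING"]

    def supports(self, odbc_function):
        """SQLGetFunctions: report whether a given ODBC function is supported."""
        return odbc_function in self.SUPPORTED

    def execute_async(self, statement):
        """Start a statement and return a handle instead of blocking."""
        handle = uuid.uuid4().hex
        self._operations[handle] = "RUNNING"
        return handle

    def cancel(self, handle):
        """SQLCancel: only possible because the client holds a handle."""
        if self._operations[handle] == "RUNNING":
            self._operations[handle] = "CANCELED"

    def state(self, handle):
        return self._operations[handle]


server = ToyServer()
handle = server.execute_async("SELECT 1")
server.cancel(handle)
```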



What’s Missing

    Authentication/Authorization
    •  SASL Authentication for HiveServer
    •  Hive supports GRANT/ROLE-based
       authorization, but the implementation is
       incomplete.
    •  Code injection vectors: ADD JAR,
       TRANSFORM, SET x, …
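
As a rough illustration of the code injection concern, the sketch below flags statements that use the features named above and checks them against what a user has been granted. It is a simplification under stated assumptions: the patterns, `privileged_features`, and `is_allowed` are hypothetical, and a regex scan is not a substitute for checking the parsed query inside the compiler.

```python
import re

# Hypothetical patterns for the statement kinds named on the slide.
PRIVILEGED_PATTERNS = {
    "ADD JAR": re.compile(r"^\s*ADD\s+JARS?\b", re.IGNORECASE),
    "TRANSFORM": re.compile(r"\bTRANSFORM\s*\(", re.IGNORECASE),
    "SET": re.compile(r"^\s*SET\s+\S", re.IGNORECASE),
}


def privileged_features(statement):
    """Return the names of privileged features the statement appears to use."""
    return [
        name
        for name, pattern in PRIVILEGED_PATTERNS.items()
        if pattern.search(statement)
    ]


def is_allowed(statement, granted):
    """Allow the statement only if every feature it uses has been granted."""
    return set(privileged_features(statement)) <= set(granted)
```

A plain `SELECT a FROM t` passes with no grants, while `ADD JAR /tmp/udf.jar` is rejected unless the caller has been granted "ADD JAR".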



Project Milestones

    •  HiveServer2 Thrift API Spec
    •  JDBC/ODBC HiveServer2 Drivers
    •  Concurrent Thrift clients
      –  Fix query compiler memory leaks
      –  User/Global session/configuration information
    •  Authentication (Kerberos)
    •  Authorization
      –  Extend to configuration, ADD x,
         TRANSFORM, …

Who’s working on it?

 •  Carl Steinbach
     –  carl@cloudera
 •  Prasad Mujumdar
     –  prasadm@cloudera




Resources

 •  HIVE-2935: Implement HiveServer2

 •  HiveServer API Proposal:
     –  https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API




Questions?

 •    Questions?
 •    Questions?
 •    Questions?
 •    Questions?




