An introduction to Apache Hadoop Hive


A short introduction to Apache Hadoop Hive: what it is and what it can do. It shows how Hive can connect a Hadoop cluster to business intelligence tools, so that management reports can be created from the Hadoop cluster data.

  1. 1. Apache Hadoop Hive ● What is it ? ● Architecture ● Related Projects ● Hive DDL ● Hive DML ● HiveQL Examples ● Business Intelligence
  2. 2. Hive – What is it ? ● A data warehouse for Hadoop ● Open source, written in Java ● Holds metadata in a relational database ● Allows SQL-like queries ● Supports “big data” data sets ● Offers built-in and user-defined functions ● Has indexing
  3. 3. Hive – Architecture Where does Hive sit in the Hadoop architecture ?
  4. 4. Hive – Architecture ● Given an existing HDFS and Hadoop cluster ● Then add Hive and its metadata structure ● Use Flume and Sqoop to move data into Hadoop ● Use the Hive LOAD DATA command to load from flat files ● Use ODBC for connectivity to your BI layer
  5. 5. Hive – Related Projects ● Apache Flume – move large data sets to Hadoop ● Apache Sqoop – command line tool, move RDBMS data to Hadoop ● Apache HBase – non-relational database ● Apache Pig – analyse large data sets ● Apache Oozie – workflow scheduler ● Apache Mahout – machine learning and data mining ● Hue – Hadoop user interface ● Apache ZooKeeper – distributed configuration and coordination service
  6. 6. Hive - DDL ● Create table hive> CREATE TABLE customer (age INT, address STRING); ● Partitions hive> CREATE TABLE customer (age INT, address STRING) PARTITIONED BY ( sdate STRING) ; ● Show table hive> SHOW TABLES ; ● Describe table hive> DESCRIBE customer;
  7. 7. Hive - DDL ● Alter table (the new column must not already exist; name is an illustrative column) hive> ALTER TABLE customer ADD COLUMNS ( name STRING) ; ● Drop table hive> DROP TABLE customer;
  8. 8. Hive - DML ● Loading flat files into Hive hive> LOAD DATA LOCAL INPATH './data/home/x1a.txt' OVERWRITE INTO TABLE customer; ● No verification of incoming data against the table schema at load time
  9. 9. HiveQL Examples ● HiveQL, an SQL-like language hive> SELECT a.age FROM customer a WHERE a.sdate ='2008-08-15'; selects the age column for all rows in one partition, but doesn't store the result hive> INSERT OVERWRITE DIRECTORY '/data/hdfs_file' SELECT a.* FROM customer a WHERE a.sdate='2008-08-15'; writes all rows of that partition of the customer table to an HDFS directory
  10. 10. Hive – Business Intelligence ● Use ODBC to connect Hive to your BI layer ● Now you can use BI tools like Business Objects – Create a universe over the Hive instance – Create reports against the universe – Create ad hoc queries against the universe
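The DDL, DML and HiveQL slides can be strung together into one short session. The following is a minimal sketch, not a definitive script: the table name customer_demo and the comma field delimiter are assumptions made for illustration, while the file path, partition value and output directory are taken from the slides. It assumes the flat file holds two comma-separated fields per line (age, address).

```sql
-- customer_demo is a hypothetical table name, used so the sketch
-- does not clash with an existing customer table.
-- The comma delimiter is an assumption about the flat file layout.
CREATE TABLE IF NOT EXISTS customer_demo (age INT, address STRING)
PARTITIONED BY (sdate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Loading into a partitioned table requires naming the target partition.
-- Hive copies the file into place without checking it against the schema.
LOAD DATA LOCAL INPATH './data/home/x1a.txt'
OVERWRITE INTO TABLE customer_demo
PARTITION (sdate = '2008-08-15');

-- Confirm that the partition now exists.
SHOW PARTITIONS customer_demo;

-- Read the rows of one partition; the result is returned, not stored.
SELECT a.age, a.address
FROM customer_demo a
WHERE a.sdate = '2008-08-15';

-- Write the rows of the same partition to an HDFS directory.
INSERT OVERWRITE DIRECTORY '/data/hdfs_file'
SELECT a.*
FROM customer_demo a
WHERE a.sdate = '2008-08-15';
```

Because Hive does not validate data at load time, fields that do not match the declared column types typically show up as NULL when queried, so a quick SELECT after loading is a reasonable sanity check.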
