An introduction to Apache Hadoop Hive

  • 1,307 views
Uploaded on

A short introduction to Apache Hadoop Hive, what is it and what can it do. How could we use it to connect a Hadoop cluster to business intelligence tools. Then create management reports from our …

A short introduction to Apache Hadoop Hive, what is it and what can it do. How could we use it to connect a Hadoop cluster to business intelligence tools. Then create management reports from our Hadoop cluster data.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
1,307
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
113
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Apache Hadoop Hive ● What is it ? ● Architecture ● Related Projects ● Hive DDL ● Hive DML ● HiveQL Examples ● Business Intelligence
  • 2. Hadoop – What is it ? ● A data warehouse for Hadoop ● Open source writen in Java ● Holds meta data in a relational database ● Allows SQL like queries ● Supports “big data” data sets ● Offers built in and user defined functions ● Has indexing
  • 3. Hive – Architecture Where does Hive sit in the Hadoop architecture ?
  • 4. Hive – Architecture ● Given an existing HDFS and Hadoop cluster ● Then add Hive and the meta data structure ● Use Flume and Sqoop to move data ● Use Hive LOAD DATA command to load from flat files ● Use ODBC for connectivity to your BI layer
  • 5. Hive – Related Projects ● Apache Flume – move large data sets to Hadoop ● Apache Sqoop – cmd line, move rdbms data to Hadoop ● Apache Hbase – Non relational database ● Apache Pig – analyse large data sets ● Apache Oozie – work flow scheduler ● Apache Mahout – machine learning and data mining ● Apache Hue – Hadoop user interface ● Apache Zoo Keeper – configuration / build
  • 6. Hive - DDL ● Create table hive> CREATE TABLE customer (age INT, address STRING); ● Partitions hive> CREATE TABLE customer (age INT, address STRING) PARTITIONED BY ( sdate STRING) ; ● Show table hive> SHOW TABLES ; ● Describe table hive> DESCRIBE customer;
  • 7. Hive - DDL ● Alter table hive> ALTER TABLE customer ADD COLUMNS ( age INT) ; ● Drop table hive> DROP TABLE customer;
  • 8. Hive - DML ● Loading flat files into Hive hive> LOAD DATA LOCAL INPATH './data/home/x1a.txt' OVERWRITE INTO TABLE customer; ● No verification of incoming data
  • 9. HiveQL Examples ● HiveQL, an SQL like language hive> SELECT a.age FROM customer a WHERE a.sdate ='2008- 08-15'; selects all data from table for a partition but doesnt store it hive> INSERT OVERWRITE DIRECTORY '/data/hdfs_file' SELECT a.* FROM customer a WHERE a.sdate='2008-08-15'; writes all of customer table to an hdfs directory
  • 10. Hive – Business Intelligence ● Use ODBC to connect Hive to your BI layer ● Now you can use BI tools like Business Objects – Create a universe over the Hive instance – Create reports against the universe – Create add hoc queries against the universe
  • 11. Contact Us ● Feel free to contact us at – www.semtech-solutions.co.nz – info@semtech-solutions.co.nz ● We offer IT project consultancy ● We are happy to hear about your problems ● You can just pay for those hours that you need ● To solve your problems