An introduction to Apache Hadoop Hive

A short introduction to Apache Hadoop Hive: what it is and what it can do, how it can connect a Hadoop cluster to business intelligence tools, and how management reports can then be created from the Hadoop cluster data.

Published in: Technology

  1. Apache Hadoop Hive
     ● What is it?
     ● Architecture
     ● Related Projects
     ● Hive DDL
     ● Hive DML
     ● HiveQL Examples
     ● Business Intelligence
  2. Hive – What is it?
     ● A data warehouse for Hadoop
     ● Open source, written in Java
     ● Holds metadata in a relational database
     ● Allows SQL-like queries
     ● Supports “big data” data sets
     ● Offers built-in and user-defined functions
     ● Has indexing
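     The built-in functions mentioned above can be explored from the Hive prompt. The following is a minimal sketch, assuming the `customer` table (columns `age` and `address`) that these slides define in the DDL section; the two string functions are picked only as examples.

     ```sql
     -- List the functions Hive knows about, then show the help text for one.
     SHOW FUNCTIONS;
     DESCRIBE FUNCTION upper;

     -- Apply two built-in string functions to the address column.
     SELECT upper(a.address), length(a.address)
     FROM customer a
     LIMIT 10;
     ```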
  3. Hive – Architecture
     Where does Hive sit in the Hadoop architecture?
  4. Hive – Architecture
     ● Given an existing HDFS and Hadoop cluster
     ● Then add Hive and the metadata structure
     ● Use Flume and Sqoop to move data
     ● Use the Hive LOAD DATA command to load from flat files
     ● Use ODBC for connectivity to your BI layer
  5. Hive – Related Projects
     ● Apache Flume – move large data sets to Hadoop
     ● Apache Sqoop – command-line tool to move RDBMS data to Hadoop
     ● Apache HBase – non-relational database
     ● Apache Pig – analyse large data sets
     ● Apache Oozie – workflow scheduler
     ● Apache Mahout – machine learning and data mining
     ● Hue – Hadoop user interface
     ● Apache ZooKeeper – configuration and coordination service
  6. Hive - DDL
     ● Create table
       hive> CREATE TABLE customer (age INT, address STRING);
     ● Partitions
       hive> CREATE TABLE customer (age INT, address STRING) PARTITIONED BY (sdate STRING);
     ● Show tables
       hive> SHOW TABLES;
     ● Describe table
       hive> DESCRIBE customer;
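     When a table is going to be filled from delimited flat files, the field separator usually has to be declared, because Hive's default text format separates fields with the Ctrl-A character. The sketch below assumes comma-separated input; the table name `customer_csv` is hypothetical and not part of the original slides.

     ```sql
     -- Hypothetical variant of the customer table for comma-separated text files.
     CREATE TABLE customer_csv (age INT, address STRING)
     PARTITIONED BY (sdate STRING)
     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
     STORED AS TEXTFILE;

     -- The partition column is listed alongside the regular columns.
     DESCRIBE customer_csv;

     -- Lists the partitions that exist so far (none until data is loaded).
     SHOW PARTITIONS customer_csv;
     ```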
  7. Hive - DDL
     ● Alter table (the added column name must not already exist in the table)
       hive> ALTER TABLE customer ADD COLUMNS (age INT);
     ● Drop table
       hive> DROP TABLE customer;
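     A slightly fuller sketch of the same operations follows. The column name `email` and the table name `customer_old` are hypothetical, chosen only so that the statements do not clash with the columns defined earlier.

     ```sql
     -- Add a column the table does not have yet (email is a hypothetical name).
     ALTER TABLE customer ADD COLUMNS (email STRING COMMENT 'contact address');

     -- Rename the table (customer_old is a hypothetical name).
     ALTER TABLE customer RENAME TO customer_old;

     -- IF EXISTS avoids an error when the table is already gone.
     DROP TABLE IF EXISTS customer_old;
     ```

     Note that dropping a managed (non-external) table removes its data files as well as its metadata.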
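  8. Hive - DML
     ● Loading flat files into Hive
       hive> LOAD DATA LOCAL INPATH './data/home/x1a.txt' OVERWRITE INTO TABLE customer;
     ● No verification of incoming data: the file is copied into the table's storage as-is, and the schema is applied only when the data is read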
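     For the partitioned version of `customer`, the load can name the target partition, and a quick query afterwards helps make up for the missing verification. This is a sketch under assumptions: the partition value '2008-08-15' is borrowed from the query examples in these slides, and it is assumed that the file holds data for that date.

     ```sql
     -- Load a local file into one partition of the partitioned customer table.
     LOAD DATA LOCAL INPATH './data/home/x1a.txt'
     OVERWRITE INTO TABLE customer PARTITION (sdate = '2008-08-15');

     -- Fields that cannot be read as the declared type come back as NULL,
     -- so counting NULLs is a cheap sanity check after a load.
     SELECT COUNT(*) AS total_rows,
            SUM(CASE WHEN a.age IS NULL THEN 1 ELSE 0 END) AS null_ages
     FROM customer a
     WHERE a.sdate = '2008-08-15';
     ```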
  9. HiveQL Examples
     ● HiveQL, an SQL-like language
       hive> SELECT a.age FROM customer a WHERE a.sdate = '2008-08-15';
       selects the age column for one partition but doesn't store the result
       hive> INSERT OVERWRITE DIRECTORY '/data/hdfs_file' SELECT a.* FROM customer a WHERE a.sdate = '2008-08-15';
       writes all rows of that partition to an HDFS directory
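     Beyond simple selects, HiveQL supports the usual aggregation clauses. The sketch below assumes the partitioned `customer` table from the DDL slides; the result table name `customer_summary` is hypothetical.

     ```sql
     -- Row count and average age per partition.
     SELECT a.sdate, COUNT(*) AS customers, AVG(a.age) AS avg_age
     FROM customer a
     GROUP BY a.sdate;

     -- Store the same result in a new table instead of an HDFS directory
     -- (customer_summary is a hypothetical name).
     CREATE TABLE customer_summary AS
     SELECT a.sdate, COUNT(*) AS customers, AVG(a.age) AS avg_age
     FROM customer a
     GROUP BY a.sdate;
     ```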
  10. Hive – Business Intelligence
     ● Use ODBC to connect Hive to your BI layer
     ● Now you can use BI tools like Business Objects
       – Create a universe over the Hive instance
       – Create reports against the universe
       – Create ad hoc queries against the universe
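     One possible way to give a BI layer a stable, report-friendly object is a Hive view. This is only a sketch: the view name `customer_daily` is hypothetical, and whether a particular BI tool and ODBC driver work well with Hive views should be checked for your setup.

     ```sql
     -- Hypothetical view that pre-shapes the data for reporting.
     CREATE VIEW customer_daily AS
     SELECT a.sdate, COUNT(*) AS customers, AVG(a.age) AS avg_age
     FROM customer a
     GROUP BY a.sdate;

     -- The kind of ad hoc query a report might issue against the view.
     SELECT * FROM customer_daily WHERE sdate = '2008-08-15';
     ```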
  11. Contact Us
     ● Feel free to contact us at
       – www.semtech-solutions.co.nz
       – info@semtech-solutions.co.nz
     ● We offer IT project consultancy
     ● We are happy to hear about your problems
     ● You pay only for the hours you need to solve your problems