Hadoop Developer Training
An Introduction to Hive
Madhur Nawandar
madhur.nawandar@clogeny.com
Cloud Computing | Enterprise Applications | Big Data | Storage | DevOps
Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
What is Hive
A data warehousing infrastructure built on top of Hadoop
Provides easy data summarization
Provides ad-hoc querying and analysis of large volumes of data
Comes with HiveQL, a query language based on SQL
Allows plugging in custom mappers and reducers
What Hive is NOT
Not suitable for small datasets due to high latency
Cannot be compared to systems like Oracle
Does not offer real-time queries and row level
updates
Hive Architecture
Data Models
Tables
• Made up of actual data and the associated metadata
• Actual data is stored in any Hadoop Filesystem
• Metadata is always stored in a relational database
• Managed Tables
 Hive moves the data into its warehouse directory
 CREATE TABLE managed_table (dummy STRING);
 LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE managed_table;
• External Tables
 Hive refers to data at an existing location and does not move it
 CREATE EXTERNAL TABLE external_table (dummy STRING)
 LOCATION '/user/tom/external_table';
 LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE external_table;
Data Models
Partitions
• A way of dividing tables into coarse-grained parts
• Based on the value of a partition column
• Supports multiple dimensions
• Defined at table creation time using the PARTITIONED BY clause
• At the filesystem level, partitions are simply nested subdirectories of the table directory
Data Models
• CREATE TABLE logs (ts BIGINT, line STRING)
PARTITIONED BY (dt STRING, country STRING);
• LOAD DATA LOCAL INPATH 'input/hive/partitions/file1'
INTO TABLE logs PARTITION (dt='2001-01-01', country='GB');
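A query that filters on a partition column reads only the matching subdirectories; for example (a sketch against the logs table above):
• hive> SELECT ts, line FROM logs WHERE country='GB';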
Data Models
Buckets
• Divide a table (or each partition) into a fixed number of buckets based on a hash of the bucketing column
• Enable more efficient queries such as map-side joins
• Make sampling more efficient
 CREATE TABLE bucketed_users (id INT, name STRING)
 CLUSTERED BY (id) INTO 4 BUCKETS;
Column Data Types
Primitive
TYPE       DESCRIPTION                                      EXAMPLE
TINYINT    8-bit signed integer                             1
SMALLINT   16-bit signed integer                            1
INT        32-bit signed integer                            1
BIGINT     64-bit signed integer                            1
FLOAT      32-bit single-precision floating point number    1.0
DOUBLE     64-bit double-precision floating point number    1.0
BOOLEAN    true/false value                                 TRUE
STRING     Character string                                 'a', "a"
TIMESTAMP  Timestamp with nanosecond precision              '2012-01-02 03:04:05.123456789'
Column Data Types
Complex
TYPE     DESCRIPTION                                                                EXAMPLE
ARRAY    An ordered collection of fields. The fields must all be of the same type   array(1, 2)
MAP      An unordered collection of key-value pairs. Keys must be primitives,       map('a', 1, 'b', 2)
         values may be any type. For a particular map, the keys must be the
         same type, and the values must be the same type
STRUCT   A collection of named fields. The fields may be of different types         struct('a', 1, 1.0)
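For example, a table using these complex types could be declared as follows (a minimal sketch; the table and column names are illustrative):
• hive> CREATE TABLE complex_demo (
         tags ARRAY<STRING>,
         scores MAP<STRING, INT>,
         address STRUCT<city:STRING, zip:INT>
       );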
Metastore
A central repository of Hive metadata
Comprises two parts:
• Metastore service
• Backing store for the data
Metastore deployment modes
1: Embedded Mode
This is the default metastore deployment mode for CDH. In this mode the metastore uses a Derby database.
Both the database and the metastore service run embedded in the main HiveServer process, and both are started for you when you start the HiveServer process.
This mode requires the least amount of effort to configure, but it can support only one active user at a time and is not certified for production use.
Metastore deployment modes
2: Local Mode
In this mode the Hive metastore service runs in the same process as the
main HiveServer process, but the metastore database runs in a separate
process, and can be on a separate host.
The embedded metastore service communicates with the metastore
database over JDBC.
Metastore deployment modes
3: Remote Mode
In this mode the Hive metastore service runs in its own JVM process; other processes
communicate with it via the Thrift network API (configured via the hive.metastore.uris
property).
The metastore service communicates with the metastore database over JDBC (configured
via the javax.jdo.option.ConnectionURL property).
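Clients of a remote metastore typically need only the Thrift URI in their hive-site.xml; a minimal sketch (host and port are placeholders):
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastorehost:9083</value>
</property>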
Metastore Properties
Property Name                           Type                  Description
hive.metastore.warehouse.dir            URI                   The directory in HDFS where managed tables are stored
hive.metastore.local                    Boolean               Flag for embedded metastore or local metastore
hive.metastore.uris                     Comma-separated URIs  List of remote metastore URIs
javax.jdo.option.ConnectionURL          URI                   The JDBC URL of the metastore database
javax.jdo.option.ConnectionDriverName   String                The JDBC driver classname
javax.jdo.option.ConnectionUserName     String                The JDBC username
javax.jdo.option.ConnectionPassword     String                The JDBC password
Hive Packages
The following packages are needed by Hive:
hive – base package that provides the complete
language and runtime (required)
hive-metastore – provides scripts for running the
metastore as a standalone service (optional)
hive-server – provides scripts for running the
original HiveServer as a standalone service
(optional)
hive-server2 – provides scripts for running the new
HiveServer2 as a standalone service (optional)
Comparison with Traditional Databases
Schema on Read versus Schema on Write
• In a traditional database, a table’s schema is enforced at
data load time
• If the data being loaded doesn’t conform to the schema,
then it is rejected
• Hive, on the other hand, doesn’t verify the data when it is
loaded, but rather when a query is issued
Updates, Transactions, and Indexes
• Updates, transactions, and indexes are mainstays of
traditional databases.
• Until recently, these features have not been considered a
part of Hive’s feature set
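As a small illustration of schema on read (a sketch; the file name messy.txt is hypothetical), a load succeeds even if the file does not match the declared column types, and mismatched values simply come back as NULL at query time:
• hive> CREATE TABLE readcheck (x INT);
• hive> LOAD DATA LOCAL INPATH 'messy.txt' INTO TABLE readcheck;
• hive> SELECT * FROM readcheck;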
Installing Hive
We will install Hive with the metastore running as a standalone service
To do this, install the hive and hive-metastore packages:
$ yum -y install hive hive-metastore
Hive Configuration
Default configuration in
• /etc/hive/conf/hive-default.xml
(Re)define properties in
• /etc/hive/conf/hive-site.xml
Use $HIVE_CONF_DIR to specify alternate conf dir
location
You can override Hadoop configuration properties in Hive's configuration
• e.g., mapred.reduce.tasks=1
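For example, such a property can be overridden for the current session from the Hive CLI (a sketch):
• hive> SET mapred.reduce.tasks=1;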
Configure Metastore database
Step 1: Install and start MySQL if you have not
already done so
• $ yum install mysql-server
Step 2: Configure the MySQL Service and
Connector
• $ yum install mysql-connector-java
• $ ln -s /usr/share/java/mysql-connector-java-5.1.17.jar /usr/lib/hive/lib/mysql-connector-java-5.1.17.jar
Step 3: To set the MySQL root password:
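One common way to start the service and set the root password (a sketch; exact commands vary with the MySQL version and OS):
• $ sudo service mysqld start
• $ sudo /usr/bin/mysql_secure_installation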
Configure Metastore database cont…
Step 4: To make sure the MySQL server starts at boot
• $ /sbin/chkconfig mysqld on
Step 5: Create the Database and User
• Create the initial database schema using the hive-schema-0.10.0.mysql.sql file located in the /usr/lib/hive/scripts/metastore/upgrade/mysql directory.
• Create a user for hive with the hostname of the metastore.
• Grant proper privileges to the user.
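A sketch of the corresponding MySQL statements (the host name metastorehost and the password mypassword are placeholders):
• mysql> CREATE DATABASE metastore;
• mysql> USE metastore;
• mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;
• mysql> CREATE USER 'hive'@'metastorehost' IDENTIFIED BY 'mypassword';
• mysql> GRANT SELECT, INSERT, UPDATE, DELETE, LOCK TABLES, EXECUTE ON metastore.* TO 'hive'@'metastorehost';
• mysql> FLUSH PRIVILEGES;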
Configure Metastore database cont…
Step 6: Configure the Metastore Service to
Communicate with the MySQL Database
• These are the configuration properties you need to set in hive-site.xml so that the metastore service can communicate with the MySQL database; sample settings are shown below.
• Though you can use the same hive-site.xml on all hosts (client, metastore, HiveServer), hive.metastore.uris is the only property that must be configured on all of them; the others are used only on the metastore host.
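A sketch of such settings for a MySQL-backed metastore (host, port, and password are placeholders):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastorehost/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastorehost:9083</value>
</property>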
Configure Metastore database cont…
Step 7: Create the hive user directory in HDFS
• $ sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
• $ sudo -u hdfs hadoop fs -chmod og+rw /user/hive/warehouse
• $ sudo -u hdfs hadoop fs -chown -R hive /user/hive
Step 8: Set Environment Variables
• Add the following to the .bashrc file
• $ vim ~/.bashrc
• export HADOOP_HOME="/usr/lib/hadoop"
• PATH=$PATH:"/usr/lib/hadoop/bin"
• Run the command "bash" at the command prompt
• $ bash
Starting the Metastore
You can run the metastore from the command line:
• $ hive --service metastore
Ensure that the above does not give any error
Use Ctrl-c to stop the metastore process running
from the command line.
To run the metastore as a daemon, the command
is:
• $ service hive-metastore start
Starting the Hive Console
To start the Hive console:
• $ hive
To confirm that Hive is working, issue the show
tables; command to list the Hive tables; be sure to
use a semi-colon after the command:
• hive> SHOW tables;
Hive CLI Commands
Set a Hive or Hadoop conf property:
• hive> set propkey=value;
List all properties and values:
• hive> set -v;
Hive CLI Commands
Creating managed table
• $ cat input/hive/tables/data.txt
• $ hive
• hive> CREATE TABLE managed_table (dummy STRING);
• hive> LOAD DATA LOCAL INPATH 'input/hive/tables/data.txt' INTO TABLE managed_table;
• hive> select * from managed_table;
• $ hadoop fs -cat /user/hive/warehouse/managed_table/data.txt
Hive CLI Commands
Creating external table
• Select a location in hdfs to create table
• Ensure that other users have write access to it
 $ sudo -u hdfs hadoop fs -mkdir /user/joe/table
 $ sudo -u hdfs hadoop fs -chmod a+w /user/joe/table
• Create external table and load data into it:
 hive> CREATE EXTERNAL TABLE external_table (dummy STRING)
LOCATION '/user/joe/table';
 hive> LOAD DATA LOCAL INPATH 'input/hive/tables/data.txt' INTO
TABLE external_table;
 hive> select * from external_table;
• Check if the table was created in the external directory
 $ sudo -u hdfs hadoop fs -cat /user/joe/table/data.txt
Hive CLI Commands
Create Partitioned table
• hive> CREATE TABLE logs (ts BIGINT, line STRING) PARTITIONED BY (dt
STRING, country STRING);
Load data in table specifying the partitions
• hive> LOAD DATA LOCAL INPATH 'input/hive/partitions/file1' INTO
TABLE logs PARTITION (dt='2001-01-01', country='GB');
• hive> LOAD DATA LOCAL INPATH 'input/hive/partitions/file2' INTO
TABLE logs PARTITION (dt='2001-01-01', country='US');
• hive> LOAD DATA LOCAL INPATH 'input/hive/partitions/file3' INTO
TABLE logs PARTITION (dt='2001-01-02', country='US');
See the table contents
• hive> select * from logs;
List all the partitions
• hive> SHOW PARTITIONS logs;
Hive CLI Commands
Create Bucket:
• Create a normal table users and create a bucket named
bucketed_users from it
 hive> set hive.enforce.bucketing=true;
 hive> CREATE TABLE users (id INT, name STRING);
 hive> LOAD DATA LOCAL INPATH 'input/hive/tables/users.txt' INTO table
users;
 hive> CREATE TABLE bucketed_users (id INT, name STRING) CLUSTERED
BY (id) SORTED BY (id ASC) INTO 4 BUCKETS;
 hive> INSERT OVERWRITE TABLE bucketed_users SELECT * FROM users;
• Check the contents of table per bucket
 hive> select * from bucketed_users;
 hive> select * from bucketed_users TABLESAMPLE(BUCKET 1 OUT OF 4
ON id);
Joins
Prerequisites
• Create two tables, sales and things, and load data from files
 hive> CREATE TABLE sales (user STRING, id INT) row format delimited fields terminated by '\t' stored as textfile;
 hive> LOAD DATA LOCAL INPATH 'input/hive/joins/sales.txt' INTO TABLE sales;
 hive> select * from sales;
 hive> CREATE TABLE things (id INT, name STRING) row format delimited fields terminated by '\t' stored as textfile;
 hive> LOAD DATA LOCAL INPATH 'input/hive/joins/things.txt' INTO TABLE things;
 hive> select * from things;
Joins
Inner Join
• hive> SELECT sales.*, things.* FROM sales JOIN things
ON (sales.id = things.id);
Joins
Left Outer Join
• hive> SELECT sales.*, things.* FROM sales LEFT OUTER
JOIN things ON (sales.id = things.id);
Joins
Right Outer Join
• hive> SELECT sales.*, things.* FROM sales RIGHT
OUTER JOIN things ON (sales.id = things.id);
Joins
Full Outer Join
• hive> SELECT sales.*, things.* FROM sales FULL OUTER
JOIN things ON (sales.id = things.id);
Joins
Semi Joins
• Hive does not support IN subqueries such as:
 SELECT * FROM things WHERE things.id IN (SELECT id FROM sales);
• The solution is a left semi join:
 hive> SELECT * FROM things LEFT SEMI JOIN sales ON (sales.id = things.id);
Joins
Map Joins
• Used when one of the tables is small enough to fit in memory; no reducers are used
 hive> SELECT /*+ MAPJOIN(things) */ sales.*, things.* FROM
sales JOIN things ON (sales.id = things.id);
Other Commands
CREATE TABLE…AS SELECT
• hive> CREATE TABLE target AS SELECT id from things;
Altering Tables
• hive> ALTER TABLE target RENAME TO source;
• hive> ALTER TABLE source ADD COLUMNS (col2
STRING);
Other Commands
Dropping Tables
• For managed tables, both the data and the metadata are deleted
• For external tables, only the metadata is deleted; the data is left in place
 hive> drop table <table_name>;
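For example (a sketch using the external table created earlier), dropping an external table removes only its metadata and the underlying file remains in HDFS:
• hive> DROP TABLE external_table;
• $ sudo -u hdfs hadoop fs -cat /user/joe/table/data.txt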
References
Hadoop: The Definitive Guide, 3rd Edition
• http://shop.oreilly.com/product/0636920021773.do
Hive Community page
• http://hive.apache.org/