©2012, Cognizant
Data Warehouse and Query Language for Hadoop
August 2013
By Someshwar Kale
HIVE
• Data warehousing solution built on top of Hadoop
• Provides an SQL-like query language named HiveQL
– Minimal learning curve for people with SQL expertise
– Data analysts are the target audience
• Early Hive development work started at Facebook in 2007
At the time of the Facebook note below, 29% of Facebook's employees (and growing!) were Hive users.
https://www.facebook.com/note.php?note_id=114588058858
• Hive is now a top-level Apache project (it started as a Hadoop subproject)
– http://hive.apache.org
Hive Provides
• Ability to bring structure to various data formats
• Simple interface for ad hoc querying, analyzing and summarizing large amounts of data
• Access to files on various data stores such as HDFS and HBase
Hive
• Hive does NOT provide low-latency or real-time queries.
• Even querying small amounts of data may take minutes.
• Designed for scalability and ease of use rather than low-latency responses
Hive
• Translates HiveQL statements into a set of MapReduce jobs, which are then executed on a Hadoop cluster.
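To make the translation concrete, here is a toy Python model of how a simple aggregation such as `SELECT user, COUNT(*) FROM posts GROUP BY user` maps onto the map, shuffle and reduce phases of a single MapReduce job. The table `posts`, its rows and the helper functions are hypothetical, and the code only mimics the data flow; it is not the plan Hive actually generates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (user, title): a hypothetical two-column table


def map_phase(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    # Mapper: emit (group-by key, 1) for every input row.
    return [(user, 1) for user, _title in rows]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: the framework brings all values of one key together.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    # Reducer: aggregate the values of each key (COUNT(*) is a sum of ones).
    return {key: sum(values) for key, values in grouped.items()}


def group_by_count(rows: Iterable[Row]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    posts = [("user1", "Funny Story"), ("user2", "Cool Deal"), ("user1", "Another")]
    print(group_by_count(posts))  # {'user1': 2, 'user2': 1}
```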
Hive Metastore
• To support features like schemas and data partitioning, Hive keeps its metadata in a relational database
• Packaged with Derby, a lightweight embedded SQL database
• The default Derby-based metastore is good for evaluation and testing
• The schema is not shared between users, because each user has their own instance of embedded Derby, stored in a metastore_db directory that resides in the directory Hive was started from
• Can easily be switched to another SQL installation, such as MySQL
Metastore Deployment Modes : Embedded Mode
• Default metastore deployment mode for CDH.
• Both the database and the metastore service run embedded in the main HiveServer process.
• Both are started for you when you start the HiveServer process.
• Supports only one active user at a time and is not certified for production use.
Metastore Deployment Modes : Local Mode
• The Hive metastore service runs in the same process as the main HiveServer process.
• The metastore database runs in a separate process, and can be on a separate host.
• The embedded metastore service communicates with the metastore database over JDBC.
Metastore Deployment Modes : Remote Mode
Hive Architecture
Hive Interface Options
• Command Line Interface (CLI)
– Used exclusively in these slides
• Hive Web Interface
– https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface
• Java Database Connectivity (JDBC)
– https://cwiki.apache.org/confluence/display/Hive/HiveClient
• Beeline for HiveServer2 (new in CDH4)
– http://sqlline.sourceforge.net/#manual
Data Types
[cts318692@aster4 ~]$ hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.2.1.jar!/hive-log4j.properties
Hive history file=/tmp/cts318692/hive_job_log_cts318692_201308071622_2005272769.txt
hive>
Launch the Hive Command Line Interface (CLI)
Location of the session's log file
hive> !cat data/user-posts.txt;
user1,Funny Story,1343182026191
user2,Cool Deal,1343182133839
user4,Interesting Post,1343182154633
user5,Yet Another Blog,13431839394
hive>
You can execute local shell commands within the CLI: place the command between ! and ;
Data Types
Numeric Types
TINYINT
SMALLINT
INT
BIGINT
FLOAT
DOUBLE
DECIMAL (Note: Only available starting with Hive 0.11.0)
Date/Time Types
TIMESTAMP (Note: Only available starting with Hive 0.8.0)
DATE (Note: Only available starting with Hive 0.12.0)
Misc Types
BOOLEAN
STRING
BINARY (Note: Only available starting with Hive 0.8.0)
Complex Data Types
Check the physical storage location of Hive
[cts318692@aster4 ~]$ hive -S -e "set" | grep warehouse
hive.metastore.warehouse.dir=/user/hive/warehouse
hive.warehouse.subdir.inherit.perms=true
This is the location where Hive stores its data.
Creating a Database
hive> CREATE DATABASE IF NOT EXISTS som COMMENT 'my database'
> LOCATION '/user/cts318692/someshwar/hivestore/'
> WITH DBPROPERTIES ('creator'='someshwar kale','date'='2013-06-08');
OK
Time taken: 0.046 seconds
IF NOT EXISTS: suppresses the error raised if the database already exists
som: the database name (Hive opens the default database when you open a new session)
LOCATION: overrides the default location (/user/hive/warehouse) for the new database directory
DBPROPERTIES: database properties
Physical storage for the som database
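As a rough model of where a new database's files end up, the sketch below assumes the usual Hive convention: without a LOCATION clause, a database is placed in a `<name>.db` subdirectory of `hive.metastore.warehouse.dir`, the `default` database uses the warehouse directory itself, and managed tables are subdirectories of their database directory. The helper names are hypothetical; `DESCRIBE DATABASE` on your own cluster shows the actual path.

```python
import posixpath
from typing import Optional

WAREHOUSE_DIR = "/user/hive/warehouse"  # value of hive.metastore.warehouse.dir shown above


def database_dir(name: str, location: Optional[str] = None,
                 warehouse: str = WAREHOUSE_DIR) -> str:
    """Hypothetical helper: directory a database is expected to use."""
    if location is not None:
        # An explicit LOCATION clause overrides the default.
        return location.rstrip("/")
    if name.lower() == "default":
        # Tables of the default database live directly under the warehouse directory.
        return warehouse
    # Other databases get a <name>.db subdirectory; Hive stores identifiers in lower case.
    return posixpath.join(warehouse, name.lower() + ".db")


def table_dir(db: str, table: str, db_location: Optional[str] = None) -> str:
    # A managed table is a subdirectory of its database directory.
    return posixpath.join(database_dir(db, db_location), table.lower())


if __name__ == "__main__":
    print(database_dir("som"))  # /user/hive/warehouse/som.db
    print(database_dir("som", "/user/cts318692/someshwar/hivestore/"))
    print(table_dir("som", "employees"))  # /user/hive/warehouse/som.db/employees
```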
Exploring Data
STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
Accessing fields of complex data types (maps, arrays, structs)
Creating Table
Collection item separator, for complex data types (maps, arrays, structs)
Map key/value separator, e.g. 'key'^C'value' (\003 = Ctrl-C = ^C)
Column separator definition
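The separators above correspond to Hive's default text delimiters: ^A (\001) between columns, ^B (\002) between collection items and ^C (\003) between a map key and its value. The Python sketch below decodes one such line, assuming a hypothetical employees layout of name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING,FLOAT> and address STRUCT<street,city,state,zip>. It only illustrates how the delimiters nest; Hive's own SerDe also handles escaping, NULL markers and deeper nesting, which this ignores.

```python
from typing import Any, Dict, List

FIELD_SEP = "\x01"       # ^A: between columns
COLLECTION_SEP = "\x02"  # ^B: between array items, map entries and struct fields
MAP_KEY_SEP = "\x03"     # ^C: between a map key and its value


def parse_employee(line: str) -> Dict[str, Any]:
    """Decode one line of a hypothetical employees table stored with default delimiters."""
    name, salary, subordinates, deductions, address = line.split(FIELD_SEP)
    subordinate_list: List[str] = subordinates.split(COLLECTION_SEP) if subordinates else []
    deduction_map: Dict[str, float] = {}
    if deductions:
        for entry in deductions.split(COLLECTION_SEP):
            key, value = entry.split(MAP_KEY_SEP)
            deduction_map[key] = float(value)
    street, city, state, zip_code = address.split(COLLECTION_SEP)
    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subordinate_list,
        "deductions": deduction_map,
        "address": {"street": street, "city": city, "state": state, "zip": int(zip_code)},
    }


if __name__ == "__main__":
    row = FIELD_SEP.join([
        "John Doe",
        "100000.0",
        COLLECTION_SEP.join(["Mary Smith", "Todd Jones"]),
        COLLECTION_SEP.join(["Federal Taxes" + MAP_KEY_SEP + "0.2",
                             "State Taxes" + MAP_KEY_SEP + "0.05"]),
        COLLECTION_SEP.join(["1 Michigan Ave.", "Chicago", "IL", "60600"]),
    ])
    print(parse_employee(row))
```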
hive> DESCRIBE FORMATTED som.employees;
Creating External Table
CREATE TABLE ... LIKE
• If you omit the EXTERNAL keyword and the original table is external, the new table will also be external.
• If you omit EXTERNAL and the original table is managed, the new table will also be managed. However, if you include the EXTERNAL keyword and the original table is managed, the new table will be external. Even in this scenario, the LOCATION clause is still optional.
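The rules above reduce to a single condition: the copy is external if the EXTERNAL keyword is given or the source table is external. A tiny Python truth table of that rule follows; the function name is hypothetical, and the case "EXTERNAL given, source external" is inferred from the rule rather than stated on the slide.

```python
def like_table_is_external(external_keyword: bool, source_is_external: bool) -> bool:
    """Type of a table created with CREATE [EXTERNAL] TABLE ... LIKE source."""
    # EXTERNAL forces an external copy; otherwise the copy inherits the source's type.
    return external_keyword or source_is_external


if __name__ == "__main__":
    for keyword in (False, True):
        for source_external in (False, True):
            kind = "external" if like_table_is_external(keyword, source_external) else "managed"
            print(f"EXTERNAL keyword={keyword!s:5} source external={source_external!s:5} -> {kind}")
```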
Select Clause
Describe External Table
Dropping Databases and Tables
By default, Hive won't permit you to drop a database if it contains tables. You can either drop the tables first or append the CASCADE keyword to the command, which causes Hive to drop the tables in the database first.
Partitions
• To increase performance, Hive can partition data
– The values of the partition columns divide a table into segments
– Entire partitions can be ignored at query time
– Similar to relational databases' indexes, but not as granular
• Partitions have to be created properly by users
– When inserting data, you must specify a partition
• At query time, whenever appropriate, Hive automatically filters out partitions
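A rough Python model of the filtering step: each partition of a table partitioned by, say, country and state is a directory such as `country=US/state=CA`, and a WHERE predicate on the partition columns can be evaluated against the directory names alone, so non-matching directories are never read. The directory list and helper names are hypothetical; Hive does this in its query planner using partition metadata from the metastore.

```python
from typing import Dict, List


def parse_partition_dir(path: str) -> Dict[str, str]:
    """Turn 'country=US/state=CA' into {'country': 'US', 'state': 'CA'}."""
    return dict(part.split("=", 1) for part in path.split("/"))


def prune(partition_dirs: List[str], predicate: Dict[str, str]) -> List[str]:
    """Keep only partitions whose values match every equality in the predicate."""
    selected = []
    for path in partition_dirs:
        values = parse_partition_dir(path)
        if all(values.get(col) == val for col, val in predicate.items()):
            selected.append(path)
    return selected


if __name__ == "__main__":
    partitions = ["country=US/state=CA", "country=US/state=IL", "country=CA/state=AB"]
    # Models: SELECT * FROM employees WHERE country = 'US' AND state = 'IL'
    print(prune(partitions, {"country": "US", "state": "IL"}))  # ['country=US/state=IL']
    # Models: WHERE country = 'US' (two of the three directories are scanned)
    print(prune(partitions, {"country": "US"}))
```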
Creating Partitioned Table
Partitions the table by the values of country and state
Cntd…
Loading Data into a Table
LOAD DATA LOCAL ... copies the local data to the final location in the distributed filesystem, while LOAD DATA ... (i.e., without LOCAL) moves the data to the final location.
The PARTITION clause is necessary if the table into which we are loading the data is partitioned. This is known as static partitioning, because the partition value is provided in the query.
Partitions are physically stored in separate directories.
Schema Violations
hive> LOAD DATA LOCAL INPATH
> 'data/user-posts-inconsistentFormat.txt'
> OVERWRITE INTO TABLE posts;
OK
Time taken: 0.612 seconds
hive> select * from posts;
OK
user1 Funny Story 1343182026191
user2 Cool Deal NULL
user4 Interesting Post 1343182154633
user5 Yet Another Blog 13431839394
Time taken: 0.136 seconds
Hive sets NULL for any value that violates the predefined schema.
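Hive checks types when data is read, not when it is loaded, which is why the LOAD DATA above succeeds and the bad field only surfaces as NULL in the query. The Python sketch below models that behaviour, assuming the posts table is user STRING, post STRING, time BIGINT with comma-separated fields (inferred from the sample file; the DDL is not shown). The malformed sample row is hypothetical, and the BIGINT parsing rule is a simplified model, not Hive's SerDe code.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

BIGINT_MIN, BIGINT_MAX = -(2**63), 2**63 - 1


def to_bigint(text: str) -> Optional[int]:
    """Model of reading a BIGINT: anything that is not an in-range integer becomes NULL (None)."""
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return None
    value = int(text)
    return value if BIGINT_MIN <= value <= BIGINT_MAX else None


# Assumed schema of the posts table: (column name, reader function).
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("user", str), ("post", str), ("time", to_bigint)]


def read_row(line: str) -> Dict[str, object]:
    """Apply the schema on read; the stored text itself is never modified."""
    fields: List[Optional[str]] = list(line.split(","))
    # Missing trailing fields read as NULL; extra fields are dropped by zip().
    fields += [None] * (len(SCHEMA) - len(fields))
    return {name: (None if raw is None else reader(raw))
            for (name, reader), raw in zip(SCHEMA, fields)}


if __name__ == "__main__":
    for line in ["user1,Funny Story,1343182026191", "user2,Cool Deal,Now"]:
        print(read_row(line))
    # {'user': 'user1', 'post': 'Funny Story', 'time': 1343182026191}
    # {'user': 'user2', 'post': 'Cool Deal', 'time': None}
```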
External Partitioned Tables
Cntd…
There is no difference in query syntax.
• When a partition column is specified in the WHERE clause, entire directories (partitions) can be skipped.
Bucketing
• Breaks data into a set of buckets based on a hash function of a "bucket column"
– Makes it possible to execute queries on a random sample of the data
• Hive doesn't automatically enforce bucketing
– The user is required to set the number of reducers to the number of buckets
hive> SET mapred.reduce.tasks = 256;
OR
hive> SET hive.enforce.bucketing = true;
Either set the number of reducers manually to the number of buckets, or use hive.enforce.bucketing, which sets it on your behalf.
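A toy Python model of how rows land in buckets: the bucket number is the bucket column's hash, made non-negative, modulo the number of buckets, and a sampling query such as TABLESAMPLE(BUCKET x OUT OF n) on a table with n buckets reads only bucket x-1. It assumes an INT bucket column whose hash is the value itself; Hive's real hash depends on the column type and version, so treat this as an illustration, not Hive's partitioner.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE


def bucket_of(value: int, num_buckets: int) -> int:
    # Assumed hash for an INT column: the value itself; masking keeps it non-negative.
    return (value & INT_MAX) % num_buckets


def bucketize(user_ids: Iterable[int], num_buckets: int) -> Dict[int, List[int]]:
    # One reducer (and so one output file) per bucket, hence reducers == buckets.
    buckets: Dict[int, List[int]] = defaultdict(list)
    for uid in user_ids:
        buckets[bucket_of(uid, num_buckets)].append(uid)
    return dict(buckets)


def tablesample(buckets: Dict[int, List[int]], x: int) -> List[int]:
    # Models TABLESAMPLE(BUCKET x OUT OF n) where n equals the table's bucket count.
    return buckets.get(x - 1, [])


if __name__ == "__main__":
    b = bucketize(range(10), 4)
    print(b)                  # {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
    print(tablesample(b, 1))  # [0, 4, 8]
```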
Create and Use Table with Buckets
ALTER TABLE
Cntd…
Cntd…
Cntd…
Partition columns are not deleted
Inserting Data into Tables from Queries
Dynamic Partition Inserts
Cntd…
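As a hedged illustration of dynamic partitioning: in a statement such as `INSERT OVERWRITE TABLE employees PARTITION (country, state) SELECT ..., se.cnty, se.st FROM staged_employees se`, Hive takes the partition values from the last columns of the SELECT, in the order the PARTITION clause lists them, and writes each row into the matching partition directory. The Python model below mimics only that routing; the table, column and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, ...]


def dynamic_partition_insert(rows: Sequence[Row],
                             partition_cols: Sequence[str]) -> Dict[str, List[Row]]:
    """Route SELECT output rows to partition directories.

    The last len(partition_cols) values of each row are the partition values, in the
    order of the PARTITION (...) clause; they go into the directory name, not the data.
    """
    n = len(partition_cols)
    output: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        data, part_values = row[:-n], row[-n:]
        path = "/".join(f"{col}={val}" for col, val in zip(partition_cols, part_values))
        output[path].append(data)
    return dict(output)


if __name__ == "__main__":
    selected = [("John Doe", "US", "IL"), ("Mary Smith", "US", "CA"), ("Todd Jones", "US", "IL")]
    for path, data in dynamic_partition_insert(selected, ["country", "state"]).items():
        print(path, data)
    # country=US/state=IL [('John Doe',), ('Todd Jones',)]
    # country=US/state=CA [('Mary Smith',)]
```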
Exporting Data
Functions
Cntd…
Cntd…
Table generating functions
Returns zero or more rows, one row for each element of the input array
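A Python model of that behaviour for an array argument, in the style of explode(): a generator that yields one single-column row per element, so an empty (or NULL) array yields no rows at all. This mimics the row semantics only; it is not Hive code.

```python
from typing import Iterator, Optional, Sequence, Tuple


def explode(array: Optional[Sequence[str]]) -> Iterator[Tuple[str]]:
    """Yield one single-column row per array element; NULL or empty input yields nothing."""
    for element in array or ():
        yield (element,)


if __name__ == "__main__":
    subordinates = ["Mary Smith", "Todd Jones"]
    print(list(explode(subordinates)))  # [('Mary Smith',), ('Todd Jones',)]
    print(list(explode([])))            # []
```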
Table generating functions
Only a single expression in the SELECT clause is supported with UDTFs.
LIMIT clause
CASE … WHEN … THEN Statements
WHERE and GROUP BY ... HAVING Clauses
Joins
Outer Join
Points to remember
• Only equality joins are allowed.
• More than two tables can be joined in the same query, e.g.
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
is a valid join.
• Hive converts a multi-table join into a single map/reduce job if every table uses the same column in the join clauses, e.g.
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)
• By contrast,
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
is converted into two map/reduce jobs, because column key1 from b is used in the first join condition and column key2 from b is used in the second one.
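One way to see why a shared join column allows a single job is a reduce-side (repartition) equi-join: each mapper tags a row with its source table and emits it under its join-key value, and one shuffle then brings together all rows from all tables that share that value. The Python sketch below models one such job for three tables joined on the same key; the table contents and names are hypothetical. If b joined a on key1 but c on key2, b's rows would have to be shuffled by two different keys, hence a second job. The model also suggests why equality is required: only an equality predicate lets matching rows be routed to the same reducer by key.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Table = List[Tuple[str, str]]  # (join key, val) rows


def reduce_side_join(tables: Dict[str, Table]) -> List[Tuple[str, ...]]:
    """Inner equi-join of all tables on their key column, in one map/shuffle/reduce pass."""
    # Map + shuffle: tag every row with its table name and group it under its join key.
    shuffled: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for name, rows in tables.items():
        for key, val in rows:
            shuffled[key][name].append(val)
    # Reduce: for each key, output the cross product of the matching values per table.
    names = list(tables)
    result: List[Tuple[str, ...]] = []
    for key in sorted(shuffled):
        groups = shuffled[key]
        if all(groups[n] for n in names):  # inner join: the key must appear in every table
            result.extend(product(*(groups[n] for n in names)))
    return result


if __name__ == "__main__":
    a = [("1", "a1"), ("2", "a2")]
    b = [("1", "b1"), ("1", "b1x"), ("3", "b3")]
    c = [("1", "c1"), ("2", "c2")]
    print(reduce_side_join({"a": a, "b": b, "c": c}))
    # [('a1', 'b1', 'c1'), ('a1', 'b1x', 'c1')]
```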
ORDER BY and SORT BY
• ORDER BY uses a single reducer to sort the data, which may take an unacceptably long time to execute for larger data sets.
• Hive adds an alternative, SORT BY, that orders the data only within each reducer, thereby performing a local ordering, where each reducer's output will be sorted.
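A Python model of the difference, assuming rows are spread over reducers by the key modulo the reducer count (a hypothetical stand-in for Hive's distribution): ORDER BY yields one totally ordered output, while SORT BY yields one sorted run per reducer, and concatenating those runs is generally not globally sorted.

```python
from typing import List


def order_by(values: List[int]) -> List[int]:
    # ORDER BY: everything goes through a single reducer, giving a total order.
    return sorted(values)


def sort_by(values: List[int], num_reducers: int) -> List[List[int]]:
    # SORT BY: rows are distributed across reducers; each reducer sorts only its own share.
    partitions: List[List[int]] = [[] for _ in range(num_reducers)]
    for v in values:
        partitions[v % num_reducers].append(v)
    return [sorted(p) for p in partitions]


if __name__ == "__main__":
    data = [7, 3, 10, 1, 8, 4]
    print(order_by(data))    # [1, 3, 4, 7, 8, 10]
    print(sort_by(data, 2))  # [[4, 8, 10], [1, 3, 7]]
```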
Casting
• What if a salary value is not a valid string representation of a floating-point number? In that case, Hive returns NULL.
UNION ALL and Nested SELECT
• Each subquery of the union query must produce the same number of columns, and for each column, its type must match all the column types in the same position.
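The rule above can be written as a small check over the subqueries' column types. The Python sketch below represents each subquery's output as a list of type names (a hypothetical representation) and requires exact per-position matches, which is the strict reading of the slide; later Hive versions may apply implicit type widening, which this ignores.

```python
from typing import Sequence


def union_all_compatible(subquery_types: Sequence[Sequence[str]]) -> bool:
    """True if every subquery has the same column count and the same type at each position."""
    if not subquery_types:
        return True
    first = list(subquery_types[0])
    # Comparing whole lists checks both the column count and the per-position types.
    return all(list(types) == first for types in subquery_types[1:])


if __name__ == "__main__":
    print(union_all_compatible([["STRING", "FLOAT"], ["STRING", "FLOAT"]]))  # True
    print(union_all_compatible([["STRING", "FLOAT"], ["STRING"]]))           # False
    print(union_all_compatible([["STRING", "FLOAT"], ["FLOAT", "STRING"]]))  # False
```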
View
• Similar to writing a function in a programming language: a view gives a name to a query so it can be reused.
• Views are virtual: Hive stores the view's query, not its results.
Lateral view
• A lateral view is used in conjunction with user-defined table generating functions such as explode().
• A lateral view first applies the UDTF to each row of the base table and then joins the resulting output rows to the input rows to form a virtual table having the supplied table alias.
• Syntax:
LATERAL VIEW udtf(expression) tableAlias AS columnAlias
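The two steps described above (apply the UDTF to each input row, then join its output back to that row) can be modelled in a few lines of Python. The sketch mimics `SELECT name, sub FROM employees LATERAL VIEW explode(subordinates) subView AS sub`, where the table, column and alias names are hypothetical. As with an inner join, an input row whose array is empty contributes no output rows.

```python
from typing import Iterator, List, Tuple

Employee = Tuple[str, List[str]]  # (name, subordinates): hypothetical base table row


def explode(array: List[str]) -> Iterator[str]:
    # The UDTF: emits one value per array element (nothing for an empty array).
    yield from array


def lateral_view_explode(rows: List[Employee]) -> List[Tuple[str, str]]:
    """Model of: SELECT name, sub FROM employees LATERAL VIEW explode(subordinates) subView AS sub"""
    virtual_table: List[Tuple[str, str]] = []
    for name, subordinates in rows:
        # Step 1: apply the UDTF to this input row.
        for sub in explode(subordinates):
            # Step 2: join each generated row back to the input row it came from.
            virtual_table.append((name, sub))
    return virtual_table


if __name__ == "__main__":
    employees = [("John Doe", ["Mary Smith", "Todd Jones"]), ("Mary Smith", ["Bill King"]),
                 ("Bill King", [])]
    for row in lateral_view_explode(employees):
        print(row)
    # ('John Doe', 'Mary Smith')
    # ('John Doe', 'Todd Jones')
    # ('Mary Smith', 'Bill King')
```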
Lateral view Example
UDF
UDF
• Hive uses reflection to find methods named evaluate whose parameters match the arguments used in the HiveQL function call.
• Hive can work with both Hadoop Writables and Java primitives, but it's recommended to work with Writables, since they can be reused.
• The argument types of evaluate() must match the types passed in the HiveQL call; its return type determines the type of the function's result.
UDF
UDF vs. GenericUDF
BETWEEN operator
hive> select name,salary from employees2 where salary between 80000 and 100000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
....
OK
John Doe 100000.0
John Doe 100000.0
Mary Smith 80000.0
Mary Smith 80000.0
Time taken: 14.39 seconds
• Both bounds (lower and upper) are inclusive.
HiveServer2
• As of CDH4.1, you can deploy HiveServer2, an improved version of HiveServer that supports a new Thrift API tailored for JDBC and ODBC clients, Kerberos authentication, and multi-client concurrency.
• There is also a new CLI for HiveServer2 named Beeline.
• HiveServer2
– Connection URL: jdbc:hive2://<host>:<port>
– Driver class: org.apache.hive.jdbc.HiveDriver
• HiveServer1
– Connection URL: jdbc:hive://<host>:<port>
– Driver class: org.apache.hadoop.hive.jdbc.HiveDriver
BEELINE
$ /usr/lib/hive/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000 username password org.apache.hive.jdbc.HiveDriver
0: jdbc:hive2://localhost:10000>
Connecting database using properties file
Thank You

Linda486226
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 

Recently uploaded (20)

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 

Learning Apache HIVE - Data Warehouse and Query Language for Hadoop

  • 1. ©2012, Cognizant Data Warehouse and Query Language for Hadoop August 2013 By Someshwar Kale
  • 2. HIVE: a data warehousing solution built on top of Hadoop. It provides an SQL-like query language named HiveQL, with a minimal learning curve for people with SQL expertise; data analysts are the target audience. Early Hive development work started at Facebook in 2007: "Today, Facebook counts 29% of its employees (and growing!) as Hive users." (https://www.facebook.com/note.php?note_id=114588058858). Today Hive is an Apache project in the Hadoop ecosystem: http://hive.apache.org
  • 3. Hive provides: the ability to bring structure to various data formats; a simple interface for ad hoc querying, analyzing, and summarizing large amounts of data; and access to files in various data stores, such as HDFS and HBase.
  • 4. Hive does NOT provide low-latency or real-time queries. Even querying small amounts of data may take minutes. Hive is designed for scalability and ease of use rather than low-latency responses.
  • 5. Hive translates HiveQL statements into a set of MapReduce jobs, which are then executed on a Hadoop cluster.
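To make the translation concrete, here is a minimal single-process Java sketch of the map, shuffle, and reduce phases that a query such as `SELECT user, count(*) FROM posts GROUP BY user` conceptually corresponds to. It illustrates the MapReduce model only, not Hive's query planner; the class and method names are hypothetical, and the input lines reuse the `user,title,timestamp` layout of `data/user-posts.txt`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of how a GROUP BY count maps onto MapReduce phases.
final class GroupByAsMapReduce {

    // Map phase: for each input line "user,title,timestamp" emit (user, 1).
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            String user = line.split(",")[0];
            out.add(Map.entry(user, 1));
        }
        return out;
    }

    // Shuffle phase: group emitted values by key; a TreeMap keeps keys sorted,
    // mimicking the sorted keys a reducer receives.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values for each key, i.e. count(*) per user.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    static Map<String, Integer> countByUser(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "user1,Funny Story,1343182026191",
            "user2,Cool Deal,1343182133839",
            "user1,Another Story,1343182154633");
        System.out.println(countByUser(lines)); // {user1=2, user2=1}
    }
}
```

The third input line is made up so that one key has more than one value; in a real cluster each phase runs as distributed tasks rather than method calls.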
  • 6. Hive Metastore: to support features such as schemas and data partitioning, Hive keeps its metadata in a relational database. Hive is packaged with Derby, a lightweight embedded SQL database. The default Derby-based metastore is good for evaluation and testing. The schema is not shared between users, because each user has their own instance of embedded Derby, stored in a metastore_db directory that resides in the directory Hive was started from. You can easily switch to another SQL installation, such as MySQL.
  • 7. Metastore Deployment Modes: Embedded Mode. This is the default metastore deployment mode for CDH. Both the database and the metastore service run embedded in the main HiveServer process, and both are started for you when you start the HiveServer process. Embedded mode supports only one active user at a time and is not certified for production use.
  • 8. Metastore Deployment Modes: Local Mode. The Hive metastore service runs in the same process as the main HiveServer process. The metastore database runs in a separate process and can be on a separate host. The embedded metastore service communicates with the metastore database over JDBC.
  • 9. Metastore Deployment Modes: Remote Mode
  • 11. Hive Interface Options: the Command Line Interface (CLI), used exclusively in these slides; the Hive Web Interface (https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface); Java Database Connectivity (JDBC) (https://cwiki.apache.org/confluence/display/Hive/HiveClient); and BeeLine for HiveServer2, new in CDH4 (http://sqlline.sourceforge.net/#manual).
  • 12. Data Types. Launch the Hive Command Line Interface (CLI):
    [cts318692@aster4 ~]$ hive
    Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.2.1.jar!/hive-log4j.properties
    Hive history file=/tmp/cts318692/hive_job_log_cts318692_201308071622_2005272769.txt
    hive>
  The "Hive history file" line gives the location of the session's log file. You can execute local commands within the CLI by placing the command between ! and ;:
    hive> !cat data/user-posts.txt;
    user1,Funny Story,1343182026191
    user2,Cool Deal,1343182133839
    user4,Interesting Post,1343182154633
    user5,Yet Another Blog,13431839394
    hive>
  • 13. Data Types. Numeric types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL (only available starting with Hive 0.11.0). Date/time types: TIMESTAMP (only available starting with Hive 0.8.0), DATE (only available starting with Hive 0.12.0). Misc types: BOOLEAN, STRING, BINARY (only available starting with Hive 0.8.0).
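Hive's integer types are 1-, 2-, 4-, and 8-byte signed integers, the same widths as Java's `byte`, `short`, `int`, and `long`. The small helper below (hypothetical, not part of Hive) picks the narrowest type that can hold a value; it shows, for example, why the millisecond timestamps in `data/user-posts.txt` need BIGINT rather than INT.

```java
// Hypothetical helper: choose the narrowest Hive integer type for a value.
// Ranges follow the Java types with the same widths:
// TINYINT = byte, SMALLINT = short, INT = int, BIGINT = long.
final class HiveIntegerTypes {
    static String narrowestType(long value) {
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) return "TINYINT";
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) return "SMALLINT";
        if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) return "INT";
        return "BIGINT";
    }

    public static void main(String[] args) {
        System.out.println(narrowestType(100));            // TINYINT
        System.out.println(narrowestType(80000));          // INT
        System.out.println(narrowestType(1343182026191L)); // BIGINT
    }
}
```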
  • 15. Check the physical storage location of Hive:
    [cts318692@aster4 ~]$ hive -S -e "set" | grep warehouse
    hive.metastore.warehouse.dir=/user/hive/warehouse
    hive.warehouse.subdir.inherit.perms=true
  hive.metastore.warehouse.dir is the location where Hive stores its data.
  • 16. Creating a Database
    hive> CREATE DATABASE IF NOT EXISTS som COMMENT 'my database'
        > LOCATION '/user/cts318692/someshwar/hivestore/'
        > WITH DBPROPERTIES ('creator'='someshwar kale','date'='2013-06-08');
    OK
    Time taken: 0.046 seconds
  IF NOT EXISTS suppresses the error Hive would otherwise raise if a database named som already exists. som is the database name; Hive opens the default database when you start a new session. LOCATION overrides the default location (/user/hive/warehouse) and sets the physical storage for the som database. WITH DBPROPERTIES attaches key/value properties to the database.
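The following Java sketch models where a managed table's directory ends up, assuming Hive's usual layout: tables of the `default` database sit directly under `hive.metastore.warehouse.dir`, other databases get a `<name>.db` subdirectory, and a database created with LOCATION uses that path instead. `WarehousePaths` and `tableDir` are hypothetical names, and the model ignores per-table LOCATION clauses.

```java
import java.util.Optional;

// Hypothetical model of Hive's default directory layout for managed tables.
final class WarehousePaths {
    // Value of hive.metastore.warehouse.dir from the "set" output above.
    static final String WAREHOUSE_DIR = "/user/hive/warehouse";

    // dbLocation is the LOCATION given in CREATE DATABASE, if any.
    static String tableDir(String db, Optional<String> dbLocation, String table) {
        String dbDir;
        if (dbLocation.isPresent()) {
            dbDir = stripTrailingSlash(dbLocation.get());  // explicit LOCATION wins
        } else if (db.equals("default")) {
            dbDir = WAREHOUSE_DIR;                          // default database: directly in the warehouse dir
        } else {
            dbDir = WAREHOUSE_DIR + "/" + db + ".db";       // other databases: <name>.db subdirectory
        }
        return dbDir + "/" + table;
    }

    private static String stripTrailingSlash(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }

    public static void main(String[] args) {
        System.out.println(tableDir("default", Optional.empty(), "posts"));
        System.out.println(tableDir("som", Optional.empty(), "employees"));
        System.out.println(tableDir("som", Optional.of("/user/cts318692/someshwar/hivestore/"), "employees"));
    }
}
```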
  • 17. Exploring Data: complex data types (maps, arrays, structures). For example, a structure field: STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
  • 18. Creating a Table: definitions of the column separator and of the delimiters for complex data types (maps, arrays, structures). Map keys and values are separated by ^C, e.g. 'key'^C'value' (octal 003, i.e. Ctrl+C or ^C).
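To see how such delimiters split a stored row, here is a hypothetical Java parser for one line of a text-format table, assuming Hive's default delimiters: ^A (octal 001) between columns, ^B (002) between collection items, and ^C (003) between a map key and its value. The sample employee row is made up for illustration; Hive's own SerDe does this parsing internally.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for a text row that uses Hive's default delimiters.
final class DelimitedRowSketch {
    static final String FIELD = "\u0001";     // ^A: column separator
    static final String ITEM = "\u0002";      // ^B: array items / map entries
    static final String KEY_VALUE = "\u0003"; // ^C: map key vs. value

    static String[] fields(String line) {
        return line.split(FIELD, -1);
    }

    static List<String> array(String field) {
        return Arrays.asList(field.split(ITEM, -1));
    }

    static Map<String, String> map(String field) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : field.split(ITEM, -1)) {
            String[] kv = entry.split(KEY_VALUE, 2);
            result.put(kv[0], kv.length > 1 ? kv[1] : null); // missing value treated as NULL
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "John Doe" + FIELD + "100000.0" + FIELD
            + "Mary Smith" + ITEM + "Todd Jones" + FIELD
            + "Federal Taxes" + KEY_VALUE + "0.2" + ITEM + "State Taxes" + KEY_VALUE + "0.05";
        String[] f = fields(line);
        System.out.println(f[0]);        // John Doe
        System.out.println(array(f[2])); // [Mary Smith, Todd Jones]
        System.out.println(map(f[3]));   // {Federal Taxes=0.2, State Taxes=0.05}
    }
}
```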
  • 19. hive> DESCRIBE FORMATTED som.employees;
  • 21. CREATE TABLE ... LIKE. If you omit the EXTERNAL keyword, the new table is external when the original table is external, and managed when the original table is managed. If you include the EXTERNAL keyword and the original table is managed, the new table will be external. Even in this scenario, the LOCATION clause is still optional.
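The rules on this slide reduce to a single boolean expression: the new table is external if the EXTERNAL keyword is given or the original table is external. A tiny hypothetical Java sketch that prints the full truth table:

```java
// Hypothetical encoding of the CREATE [EXTERNAL] TABLE ... LIKE rules above.
final class CreateLikeRule {
    // EXTERNAL forces an external table; otherwise the original table's kind is copied.
    static boolean newTableIsExternal(boolean originalIsExternal, boolean externalKeyword) {
        return externalKeyword || originalIsExternal;
    }

    public static void main(String[] args) {
        for (boolean original : new boolean[] {false, true}) {
            for (boolean keyword : new boolean[] {false, true}) {
                System.out.println("original external=" + original
                    + ", EXTERNAL keyword=" + keyword
                    + " -> new table external=" + newTableIsExternal(original, keyword));
            }
        }
    }
}
```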
  • 24. Dropping Databases and Tables. By default, Hive won't permit you to drop a database if it contains tables. You can either drop the tables first or append the CASCADE keyword to the command, which causes Hive to drop the tables in the database first.
  • 25. Partitions. To increase performance, Hive can partition data: the values of the partition columns divide a table into segments, and entire partitions can be ignored at query time. This is similar to indexes in relational databases, but not as granular. Partitions have to be properly created by users: when inserting data, you must specify a partition. At query time, whenever appropriate, Hive automatically filters out partitions.
  • 26. Creating a Partitioned Table: the table is partitioned on the values of country and state.
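Partition pruning can be pictured as filtering directory names. The hypothetical Java sketch below assumes each (country, state) partition is stored in a directory named `country=<value>/state=<value>` under the table directory and keeps only the directories that a `WHERE country = ...` predicate can match; the partition values are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of partition pruning for a table partitioned by (country, state).
final class PartitionPruning {
    // Partition directory name, relative to the table directory.
    static String partitionDir(String country, String state) {
        return "country=" + country + "/state=" + state;
    }

    // Keep only partitions whose country matches the predicate; the rest are never read.
    static List<String> prune(List<String[]> partitions, String country) {
        List<String> kept = new ArrayList<>();
        for (String[] p : partitions) {
            if (p[0].equals(country)) {
                kept.add(partitionDir(p[0], p[1]));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> partitions = List.of(
            new String[] {"US", "CA"}, new String[] {"US", "IL"}, new String[] {"CA", "AB"});
        System.out.println(prune(partitions, "US")); // [country=US/state=CA, country=US/state=IL]
    }
}
```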
  • 28. Loading Data into a Table. LOAD DATA LOCAL ... copies the local data to the final location in the distributed filesystem, while LOAD DATA ... (i.e., without LOCAL) moves the data to the final location. Specifying the partition is necessary if the table being loaded is partitioned; this is known as static partitioning, because the partition value is provided in the query. Partitions are physically stored under separate directories.
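The copy-versus-move difference can be mimicked on the local filesystem with `java.nio.file`. This is only an analogue (real Hive copies from local disk into HDFS, or moves within HDFS); `LoadDataSemantics.load` and its `local` flag are hypothetical names standing in for the presence or absence of the LOCAL keyword.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical local-filesystem analogue of LOAD DATA [LOCAL] semantics.
final class LoadDataSemantics {
    // local = true  -> copy (source file stays where it is)
    // local = false -> move (source file is gone afterwards)
    static Path load(Path source, Path tableDir, boolean local) throws IOException {
        Files.createDirectories(tableDir);
        Path target = tableDir.resolve(source.getFileName());
        if (local) {
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } else {
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("load-demo");
        Path src = Files.writeString(tmp.resolve("user-posts.txt"), "user1,Funny Story,1343182026191\n");
        Path tableDir = tmp.resolve("warehouse/posts");
        load(src, tableDir, true);
        System.out.println("after LOCAL load, source exists: " + Files.exists(src));     // true
        load(src, tableDir, false);
        System.out.println("after non-LOCAL load, source exists: " + Files.exists(src)); // false
    }
}
```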
  • 29. Schema Violations
    hive> LOAD DATA LOCAL INPATH
        > 'data/user-posts-inconsistentFormat.txt'
        > OVERWRITE INTO TABLE posts;
    OK
    Time taken: 0.612 seconds
    hive> select * from posts;
    OK
    user1 Funny Story 1343182026191
    user2 Cool Deal NULL
    user4 Interesting Post 1343182154633
    user5 Yet Another Blog 13431839394
    Time taken: 0.136 seconds
  NULL is set for any value that violates the predefined schema.
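This schema-on-read behavior can be sketched in Java: instead of rejecting a row, a value that cannot be read as the declared type becomes NULL. The helper below assumes the third column of `posts` is a BIGINT and is purely illustrative; the non-numeric sample value is made up, since the contents of the inconsistent file are not shown.

```java
// Hypothetical sketch of schema-on-read for a BIGINT column:
// an unparseable value becomes NULL instead of failing the query.
final class SchemaOnRead {
    // Reads the third field of "user,title,timestamp" as a BIGINT, or null.
    static Long timestampOrNull(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length < 3) {
            return null; // a missing trailing column also reads as NULL
        }
        try {
            return Long.parseLong(fields[2]);
        } catch (NumberFormatException e) {
            return null; // value violates the declared type
        }
    }

    public static void main(String[] args) {
        System.out.println(timestampOrNull("user1,Funny Story,1343182026191")); // 1343182026191
        System.out.println(timestampOrNull("user2,Cool Deal,not-a-number"));   // null
    }
}
```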
  • 30. External Partitioned Tables
  • 31. Continued: there is no difference in syntax. When a partition column is specified in the WHERE clause, entire directories (partitions) can be ignored.
  • 32. Bucketing. Bucketing breaks data into a set of buckets based on a hash function of a "bucket column", which makes it possible to execute queries on a random sample (subset) of the data. Hive doesn't automatically enforce bucketing: the number of reducers must match the number of buckets. Either set the number of reducers manually
    hive> SET mapred.reduce.tasks = 256;
  or use hive.enforce.bucketing, which sets it on your behalf:
    hive> SET hive.enforce.bucketing = true;
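A minimal Java sketch of hash bucketing, assuming the bucket number is the column's hash with the sign bit masked off, modulo the number of buckets, and assuming an INT bucket column whose hash is the value itself. Hive's hash functions for other types (STRING in particular) are not Java's `String.hashCode`, so this illustrates the idea rather than predicting real bucket files; `bucketFor` is a hypothetical name.

```java
// Hypothetical sketch of hash bucketing for an INT bucket column.
final class Bucketing {
    static int bucketFor(int key, int numBuckets) {
        // Mask the sign bit so negative hashes still land in [0, numBuckets).
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        for (int userId : new int[] {1, 2, 5, 8, -3}) {
            System.out.println("user_id " + userId + " -> bucket " + bucketFor(userId, numBuckets));
        }
    }
}
```

Because the same key always hashes to the same bucket, a query can read one bucket file as a sample of the table.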
  • 33. Creating and Using a Table with Buckets
  • 37. Continued: partition columns are not deleted.
  • 38. Inserting Data into Tables from Queries
  • 39. Dynamic Partition Inserts
  • 45. Table-Generating Functions: return zero or more rows, one row for each element of the input array.
  • 46. Table-Generating Functions: only a single expression in the SELECT clause is supported with UDTFs.
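The "zero to many rows" behavior of explode() can be pictured as flattening an array column. The hypothetical Java analogue below treats each input row's array as a list: a row with two elements yields two output rows, and an empty or NULL array yields none. The subordinate names are sample data only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical analogue of SELECT explode(arr) FROM t.
final class ExplodeSketch {
    // Each input row contributes one output row per array element,
    // and zero rows for an empty or NULL array.
    static <T> List<T> selectExplode(List<List<T>> arrayColumn) {
        List<T> out = new ArrayList<>();
        for (List<T> array : arrayColumn) {
            if (array == null) {
                continue;
            }
            out.addAll(array);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> subordinates = Arrays.asList(
            List.of("Mary Smith", "Todd Jones"), List.<String>of(), null);
        System.out.println(selectExplode(subordinates)); // [Mary Smith, Todd Jones]
    }
}
```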
  • 48. CASE … WHEN … THEN Statements
  • 49. WHERE and GROUP BY ... HAVING Clauses
  • 52. Points to remember about joins. Only equality joins are allowed. More than two tables can be joined in the same query; e.g. SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2) is a valid join. The query is converted into a single map/reduce job if, for every table, the same column is used in the join clauses, e.g. ... ON (a.key = b.key1) JOIN c ON (c.key = b.key1). By contrast, ... ON (a.key = b.key1) JOIN c ON (c.key = b.key2) is converted into two map/reduce jobs, because column key1 of b is used in the first join condition and column key2 of b is used in the second one.
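The result of an inner equality join can be sketched in memory with a hash table: rows of b are grouped by their join key, then each row of a is matched against the rows with an equal key. This hypothetical Java sketch shows only which rows match for `a JOIN b ON (a.key = b.key1)`; Hive distributes the same work across map and reduce tasks, routing rows by join-key value so that rows with equal keys meet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory hash join: SELECT a.val, b.val FROM a JOIN b ON (a.key = b.key1).
final class EquiJoin {
    // Each row is (joinKey, value).
    static List<String> join(List<Map.Entry<String, String>> a, List<Map.Entry<String, String>> b) {
        // Build phase: group b's values by join key.
        Map<String, List<String>> bByKey = new HashMap<>();
        for (Map.Entry<String, String> row : b) {
            bByKey.computeIfAbsent(row.getKey(), k -> new ArrayList<>()).add(row.getValue());
        }
        // Probe phase: every a-row is paired with each b-row that has an equal key.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> row : a) {
            for (String bVal : bByKey.getOrDefault(row.getKey(), List.of())) {
                out.add(row.getValue() + "\t" + bVal);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> a = List.of(Map.entry("1", "a1"), Map.entry("2", "a2"));
        List<Map.Entry<String, String>> b = List.of(Map.entry("1", "b1"), Map.entry("3", "b3"));
        System.out.println(join(a, b)); // only key 1 appears on both sides
    }
}
```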
  • 53. ORDER BY and SORT BY. ORDER BY uses a single reducer to sort the data, which may take an unacceptably long time to execute for larger data sets. Hive adds an alternative, SORT BY, that orders the data only within each reducer, thereby performing a local ordering in which each reducer's output is sorted.
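A small Java contrast of the two, under a simplifying assumption: rows are dealt to reducers round-robin, which is not how Hive distributes rows but is enough to show that per-reducer sorting does not give a global order. `orderBy` and `sortBy` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical contrast: ORDER BY (one reducer, global order) vs. SORT BY (local order per reducer).
final class OrderVsSortBy {
    static List<Integer> orderBy(List<Integer> rows) {
        List<Integer> all = new ArrayList<>(rows);
        Collections.sort(all); // a single reducer sees every row
        return all;
    }

    // Rows are assigned to reducers round-robin purely for illustration.
    static List<List<Integer>> sortBy(List<Integer> rows, int reducers) {
        List<List<Integer>> outputs = new ArrayList<>();
        for (int r = 0; r < reducers; r++) {
            outputs.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            outputs.get(i % reducers).add(rows.get(i));
        }
        for (List<Integer> part : outputs) {
            Collections.sort(part); // each reducer sorts only its own rows
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Integer> salaries = List.of(100000, 60000, 90000, 70000, 80000);
        System.out.println(orderBy(salaries));   // [60000, 70000, 80000, 90000, 100000]
        System.out.println(sortBy(salaries, 2)); // [[80000, 90000, 100000], [60000, 70000]]
    }
}
```

Concatenating the two SORT BY outputs gives 80000, 90000, 100000, 60000, 70000, which is sorted within each reducer but not overall.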
  • 54. Casting. What if a salary value is not a valid string representation of a floating-point number? In this case, Hive returns NULL.
  • 55. UNION ALL and Nested SELECT. Each subquery of the union query must produce the same number of columns, and for each column, its type must match all the column types in the same position.
  • 56. Views. Defining a view is similar to writing a function in a programming language. Views are virtual.
  • 57. Lateral View. A lateral view is used in conjunction with user-defined table-generating functions such as explode(). A lateral view first applies the UDTF to each row of the base table and then joins the resulting output rows to the input rows to form a virtual table with the supplied table alias. Syntax: LATERAL VIEW udtf(expression) tableAlias AS columnAlias
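The two steps described above (apply the UDTF per base row, then join the generated rows back to that row) can be sketched in Java. The query it mirrors, `SELECT name, sub FROM employees LATERAL VIEW explode(subordinates) subView AS sub`, uses a hypothetical `subordinates` array column and aliases; note that in this plain (non-OUTER) form a base row whose array is empty contributes no output rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical analogue of:
//   SELECT name, sub FROM employees LATERAL VIEW explode(subordinates) subView AS sub;
final class LateralViewSketch {
    static final class Employee {
        final String name;
        final List<String> subordinates;

        Employee(String name, List<String> subordinates) {
            this.name = name;
            this.subordinates = subordinates;
        }
    }

    static List<String> lateralViewExplode(List<Employee> employees) {
        List<String> out = new ArrayList<>();
        for (Employee e : employees) {
            // Apply the UDTF (explode) to this base row ...
            for (String sub : e.subordinates) {
                // ... and join each generated row back to the base row.
                out.add(e.name + "\t" + sub);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
            new Employee("John Doe", List.of("Mary Smith", "Todd Jones")),
            new Employee("Todd Jones", List.of()));
        lateralViewExplode(employees).forEach(System.out::println);
    }
}
```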
  • 60. UDF. Hive uses reflection to find methods named evaluate whose parameters match the arguments used in the HiveQL function call. Hive can work with both Hadoop Writables and Java primitives, but working with Writables is recommended because they can be reused. The parameter types of evaluate must be compatible with the argument types used in the HiveQL call; the return type does not have to be the same as the argument types.
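The lookup-by-name-and-argument-types idea can be shown with plain Java reflection. This is a simplified analogue, not Hive's resolver: a real old-style Hive UDF would extend `org.apache.hadoop.hive.ql.exec.UDF` and would usually take Writables such as `org.apache.hadoop.io.Text`, while the hypothetical `Shout` class here uses `String` and `Integer` to stay dependency-free. It also shows an overload whose return type differs from one of its argument types.

```java
import java.lang.reflect.Method;

// Hypothetical analogue of UDF resolution: find "evaluate" by argument types, then invoke it.
final class EvaluateLookup {
    // Stand-in for a UDF class with overloaded evaluate methods.
    public static final class Shout {
        public String evaluate(String s) {
            return s.toUpperCase();
        }

        public String evaluate(String s, Integer times) {
            return s.toUpperCase().repeat(times);
        }
    }

    static Object call(Object udf, Object... args) throws ReflectiveOperationException {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass(); // exact runtime types of the call's arguments
        }
        Method evaluate = udf.getClass().getMethod("evaluate", types);
        return evaluate.invoke(udf, args);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(call(new Shout(), "hive"));    // HIVE
        System.out.println(call(new Shout(), "hive", 2)); // HIVEHIVE
    }
}
```

Unlike Hive, this sketch requires an exact type match and performs no implicit conversions.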
  • 62. UDF vs. GenericUDF
  • 63. BETWEEN Operator
    hive> select name,salary from employees2 where salary between 80000 and 100000;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    ....
    OK
    John Doe 100000.0
    John Doe 100000.0
    Mary Smith 80000.0
    Mary Smith 80000.0
    Time taken: 14.39 seconds
  Both values (lower and upper) are inclusive.
  • 64. HiveServer2. As of CDH4.1, you can deploy HiveServer2, an improved version of HiveServer that supports a new Thrift API tailored for JDBC and ODBC clients, Kerberos authentication, and multi-client concurrency. There is also a new CLI for HiveServer2 named BeeLine.
    HiveServer2: connection URL jdbc:hive2://<host>:<port>; driver class org.apache.hive.jdbc.HiveDriver
    HiveServer1: connection URL jdbc:hive://<host>:<port>; driver class org.apache.hadoop.hive.jdbc.HiveDriver
  • 65. BeeLine
    $ /usr/lib/hive/bin/beeline
    beeline> !connect jdbc:hive2://localhost:10000 username password org.apache.hive.jdbc.HiveDriver
    0: jdbc:hive2://localhost:10000>
  • 66. Connecting to a Database Using a Properties File
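A minimal JDBC sketch of the same idea, assuming the hive-jdbc driver is on the classpath and a HiveServer2 instance is reachable. The driver class and `jdbc:hive2://` URL prefix come from the HiveServer2 slide; the property keys (`hive.host`, `hive.port`, `hive.database`, `hive.user`, `hive.password`) are hypothetical names chosen for this example, and the properties are read from a string here where a real program would read a `.properties` file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Hypothetical sketch: HiveServer2 connection settings from properties, then a JDBC query.
final class HiveJdbcSketch {
    static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

    // Builds jdbc:hive2://<host>:<port>/<database>, with defaults for missing keys.
    static String url(Properties p) {
        return "jdbc:hive2://" + p.getProperty("hive.host", "localhost") + ":"
            + p.getProperty("hive.port", "10000") + "/" + p.getProperty("hive.database", "default");
    }

    static Properties load(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text)); // in practice: a Reader over the .properties file
        return p;
    }

    // Needs the hive-jdbc driver on the classpath and a running HiveServer2.
    static void runQuery(Properties p, String sql) throws ClassNotFoundException, SQLException {
        Class.forName(DRIVER);
        try (Connection conn = DriverManager.getConnection(
                 url(p), p.getProperty("hive.user", ""), p.getProperty("hive.password", ""));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Properties p = load("hive.host=localhost\nhive.port=10000\nhive.user=username\nhive.password=password\n");
        System.out.println(url(p)); // jdbc:hive2://localhost:10000/default
        runQuery(p, "SHOW TABLES");
    }
}
```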
  • 67. References: Edward Capriolo, Dean Wampler, and Jason Rutherglen, Programming Hive, O'Reilly Media, 1st edition (October 3, 2012). Chuck Lam, Hadoop in Action, Manning Publications, 1st edition (December 2010), chapter about Hive.