5. Architecture of Hive
• User Interface - Hive is data warehouse infrastructure software
that lets users interact with HDFS.
• The user interfaces that Hive supports are Hive Web UI, Hive
command line, and Hive HDInsight.
• Metastore - Hive uses a database server to store the schema or
metadata of tables, databases, columns in a table, their data
types, and the HDFS mapping.
• HiveQL Process Engine - HiveQL is similar to SQL for querying
schema information in the Metastore.
• It is one of the replacements for the traditional MapReduce
programming approach. Instead of writing a MapReduce program in
Java, we can write a query for the MapReduce job and process it.
6. Execution Engine - The Hive execution engine is the link
between the HiveQL process engine and MapReduce.
The execution engine processes the query and generates the
same results that MapReduce would.
It uses the MapReduce model.
HDFS or HBase - The Hadoop Distributed File System or HBase
is the storage layer that holds the data.
8. Working of Hive
Execute Query - The Hive interface, such as the command line or
Web UI, sends the query to the Driver to execute.
Get Plan - The driver takes the help of the query compiler, which
parses the query to check the syntax and the query plan or the
requirement of the query.
Get Metadata - The compiler sends a metadata request to the Metastore.
Send Metadata - The Metastore sends metadata as a response to the
compiler.
9. Send Plan - The compiler checks the requirement and
resends the plan to the driver. Up to here, the parsing
and compiling of a query is complete.
Execute Plan - The driver sends the execute plan to the
execution engine.
Execute Job - Internally, the execution job is a MapReduce
job. The execution engine sends the job to the JobTracker,
which is in the Name node, and it assigns this job to the
TaskTracker, which is in the Data node. Here, the query
executes the MapReduce job.
10. Metadata Ops - Meanwhile, during execution, the
execution engine can execute metadata operations
with the Metastore.
Fetch Result - The execution engine receives the
results from the Data nodes.
Send Results - The execution engine sends those
resultant values to the driver.
Send Results - The driver sends the results to the Hive
interfaces.
11. Hive - Data Types
All the data types in Hive are classified into four
types:
Column Types
Literals
Null Values
Complex Types
12. Column Types
• Integral Types -
Integer type data can be specified using integral data types,
INT. When the data range exceeds the range of INT, you need to use
BIGINT, and if the data range is smaller than the INT, you use
SMALLINT. TINYINT is smaller than SMALLINT.
• String Types -
String type data can be specified using single quotes (' ') or
double quotes (" "). It contains two data types: VARCHAR and CHAR.
Hive follows C-style escape characters.
13. • Timestamp - It supports the traditional UNIX timestamp with optional
nanosecond precision. It supports the java.sql.Timestamp format
"YYYY-MM-DD HH:MM:SS.fffffffff" and the format "yyyy-mm-dd
hh:mm:ss.fffffffff".
• Dates - DATE values are described in year/month/day format, in
the form YYYY-MM-DD.
• Decimals - The DECIMAL type in Hive is the same as the
BigDecimal format of Java. It is used for representing immutable
arbitrary-precision decimal numbers.
• Union Types - A union is a collection of heterogeneous data types.
You can create an instance using create_union.
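The integral types on the previous slide differ only in width. As a rough sketch (plain Python, not Hive code), the ranges below assume the usual signed widths of 1, 2, 4, and 8 bytes for TINYINT, SMALLINT, INT, and BIGINT; the helper names are made up for illustration.

```python
# Widths in bytes of Hive's signed integral types, narrowest first.
INTEGRAL_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("INT", 4),
    ("BIGINT", 8),
]

def type_range(num_bytes):
    """Return (min, max) for a signed two's-complement integer of this width."""
    bits = num_bytes * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def smallest_integral_type(values):
    """Pick the narrowest integral type that can hold every value."""
    lo, hi = min(values), max(values)
    for name, num_bytes in INTEGRAL_TYPES:
        type_min, type_max = type_range(num_bytes)
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values do not fit in BIGINT")

print(smallest_integral_type([0, 100]))            # TINYINT
print(smallest_integral_type([0, 40000]))          # INT
print(smallest_integral_type([0, 3_000_000_000]))  # BIGINT
```

This mirrors the rule on the slide: move up to BIGINT when values exceed the INT range, and down to SMALLINT or TINYINT when they are small enough.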
14. Literals
• Floating Point Types -
Floating point types are nothing but numbers
with decimal points. Generally, this type of data is composed of
the DOUBLE data type.
• Decimal Type - Decimal type data is nothing but a floating point value with
a higher range than the DOUBLE data type. The range of the decimal type is
approximately -10^-308 to 10^308.
15. Complex Types
Arrays - Arrays in Hive are used the same way they are used in Java.
Syntax: ARRAY<data_type>
Maps - Maps in Hive are similar to Java Maps.
Syntax: MAP<primitive_type, data_type>
Structs - Structs in Hive are similar to using complex data with comments.
Syntax: STRUCT<col_name : data_type [COMMENT col_comment], ...>
16. Create Database
hive> CREATE DATABASE [IF NOT EXISTS] userdb;
hive> CREATE SCHEMA userdb;
hive> SHOW DATABASES;
17. Drop Database
hive> DROP DATABASE [IF EXISTS] userdb;
hive> DROP DATABASE [IF EXISTS] userdb CASCADE;
hive> DROP SCHEMA userdb;
18. Create Table
hive> CREATE TABLE IF NOT EXISTS
    > employee (eid int, name String, salary String, destination String)
    > COMMENT 'Employee details'
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE;
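To make the ROW FORMAT DELIMITED clauses concrete, here is a small Python sketch of how a text file with tab-separated fields and newline-separated rows maps onto the employee columns. It only imitates the file layout; it is not how Hive reads the file, and the sample rows are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    # Same columns and order as the CREATE TABLE statement on the slide.
    eid: int
    name: str
    salary: str
    destination: str

def parse_employee_file(text, field_sep="\t", line_sep="\n"):
    """Split text into rows on line_sep and into fields on field_sep."""
    rows = []
    for line in text.split(line_sep):
        if not line:  # skip the empty piece after a trailing newline
            continue
        eid, name, salary, destination = line.split(field_sep)
        rows.append(Employee(int(eid), name, salary, destination))
    return rows

# Hypothetical file contents: two rows, four tab-separated fields each.
sample = "1\tAsha\t45000\tManager\n2\tBen\t30000\tEditor\n"
for employee in parse_employee_file(sample):
    print(employee)
```

FIELDS TERMINATED BY corresponds to field_sep and LINES TERMINATED BY to line_sep.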
19. Partition
• Hive organizes tables into partitions. It is a way of dividing a table into related parts
based on the values of partitioned columns such as date, city, and department. Using
partitions, it is easy to query a portion of the data.
• Adding partition - Syntax - hive> ALTER TABLE employee ADD
PARTITION (year='2013') location '/2012/part2012';
• Dropping partition - Syntax - hive> ALTER TABLE employee DROP
[IF EXISTS] PARTITION (year='2013');
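Why partitions make it easy to query a portion of the data can be sketched in Python. The sketch assumes partitions are keyed as column=value, which matches Hive's default directory naming; the function names and rows are hypothetical.

```python
from collections import defaultdict

def partition_rows(rows, column):
    """Group rows into partitions keyed like 'column=value'."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{column}={row[column]}"].append(row)
    return dict(partitions)

def scan(partitions, column, value):
    """Read only the partition that matches the filter; others are never touched."""
    return partitions.get(f"{column}={value}", [])

# Invented sample rows for illustration.
employees = [
    {"eid": 1, "name": "Asha", "year": "2012"},
    {"eid": 2, "name": "Ben", "year": "2013"},
    {"eid": 3, "name": "Chen", "year": "2013"},
]

partitions = partition_rows(employees, "year")
print(sorted(partitions))                      # ['year=2012', 'year=2013']
print(len(scan(partitions, "year", "2013")))   # 2
```

A filter on the partition column only has to look inside one group instead of scanning every row.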
21. HiveQL - Select Where
The Hive Query Language (HiveQL) is a
query language for Hive to process and analyze
structured data in a Metastore.
hive> SELECT * FROM employee
WHERE salary > 30000;
22. HiveQL - Select Order By
The ORDER BY clause is used to retrieve
the details based on one column and sort the
result set in ascending or descending order.
hive> SELECT Id, Name, Dept FROM
employee ORDER BY DEPT;
23. HiveQL - Select Group By
The GROUP BY clause is used to group all
the records in a result set using a particular
collection column. It is used to query a group of
records.
hive> SELECT Dept, count(*) FROM
employee GROUP BY DEPT;
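As a rough Python analogue of the GROUP BY query above (and of the ORDER BY query on the previous slide), the sketch below counts rows per department and sorts rows by department. The employee rows are invented, and the code only imitates the result of the queries, not how Hive executes them.

```python
from collections import Counter

# Invented sample rows for illustration.
employees = [
    {"id": 1, "name": "Asha", "dept": "TP"},
    {"id": 2, "name": "Ben", "dept": "HR"},
    {"id": 3, "name": "Chen", "dept": "TP"},
    {"id": 4, "name": "Dana", "dept": "Admin"},
]

def count_by_dept(rows):
    """Like SELECT Dept, count(*) ... GROUP BY DEPT: one count per department."""
    return dict(Counter(row["dept"] for row in rows))

def order_by_dept(rows, descending=False):
    """Like SELECT ... ORDER BY DEPT: rows sorted on the dept column."""
    return sorted(rows, key=lambda row: row["dept"], reverse=descending)

print(count_by_dept(employees))
print([row["name"] for row in order_by_dept(employees)])
```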
24. HiveQL - Select - Joins
JOIN is a clause that is used for combining specific fields from two tables by
using values common to each one. It is used to combine records from two or
more tables in the database. It is more or less similar to SQL JOIN.
There are different types of joins, given as follows:
• JOIN
• LEFT OUTER JOIN
• RIGHT OUTER JOIN
• FULL OUTER JOIN
25. JOIN
The JOIN clause is used to combine and retrieve the
records from multiple tables. A plain JOIN is the same as
INNER JOIN in SQL. A JOIN condition is
raised using the primary keys and foreign keys of
the tables.
hive> SELECT c.ID, c.NAME, c.AGE,
o.AMOUNT FROM CUSTOMERS c
JOIN ORDERS o ON (c.ID =
o.CUSTOMER_ID);
26. Left Outer Join
The HiveQL LEFT OUTER JOIN returns all the rows
from the left table, even if there are no matches in the right
table. This means, if the ON clause matches 0 (zero) records
in the right table, the JOIN still returns a row in the result, but
with NULL in each column from the right table.
hive> SELECT c.ID, c.NAME, o.AMOUNT,
o.DATE FROM CUSTOMERS c LEFT
OUTER JOIN ORDERS o ON (c.ID =
o.CUSTOMER_ID);
27. Right Outer Join
The HiveQL RIGHT OUTER JOIN returns all
the rows from the right table, even if there are no matches
in the left table. If the ON clause matches 0 (zero)
records in the left table, the JOIN still returns a row in
the result, but with NULL in each column from the left
table.
hive> SELECT c.ID, c.NAME, o.AMOUNT,
o.DATE FROM CUSTOMERS c RIGHT
OUTER JOIN ORDERS o ON (c.ID =
o.CUSTOMER_ID);
28. Full Outer Join
The HiveQL FULL OUTER JOIN combines the
records of both the left and the right outer tables that
fulfill the JOIN condition. The joined table contains
all the records from both tables, with
NULL values filled in for missing matches on either side.
hive> SELECT c.ID, c.NAME, o.AMOUNT,
o.DATE FROM CUSTOMERS c FULL
OUTER JOIN ORDERS o ON (c.ID =
o.CUSTOMER_ID);
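The difference between the four join types is easiest to see on a tiny data set. The Python sketch below imitates the join results with nested loops, using None where HiveQL would return NULL. The customers and orders are invented for illustration, the join function is a hypothetical helper, and Hive does not guarantee this row order.

```python
# Invented sample data: customer 2 has two orders, customers 1 and 3 have none,
# and one order points at a customer (4) that does not exist.
customers = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cy"},
]
orders = [
    {"customer_id": 2, "amount": 1500},
    {"customer_id": 2, "amount": 900},
    {"customer_id": 4, "amount": 300},
]

def join(customers, orders, how="inner"):
    """Return (id, name, amount) rows; how is 'inner', 'left', 'right' or 'full'."""
    result = []
    matched_orders = set()
    for c in customers:
        matches = [(i, o) for i, o in enumerate(orders)
                   if o["customer_id"] == c["id"]]
        for i, o in matches:
            matched_orders.add(i)
            result.append((c["id"], c["name"], o["amount"]))
        # Left and full joins keep customers that have no order.
        if not matches and how in ("left", "full"):
            result.append((c["id"], c["name"], None))
    # Right and full joins keep orders that have no customer.
    if how in ("right", "full"):
        for i, o in enumerate(orders):
            if i not in matched_orders:
                result.append((None, None, o["amount"]))
    return result

for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, how))
```

Only the full join returns both the customers without orders and the order without a customer.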