Confidential, Copyright © Quanticate
Introduction to
Apache Hive
Muralidharan Deenathayalan
Technical Lead
Muralidharan.deenathayalan@quanticate.com
Apache and Apache Hive project logo are trademarks of The Apache Software Foundation.
All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Agenda
• Who am I?
• What is Apache Hive?
• Apache Hive key features
• Apache Hive architecture
• How does Apache Hive work in the Apache Hadoop ecosystem?
• Where is Apache Hive useful?
• Where is Apache Hive not useful?
• Who uses Apache Hive?
• What is HQL?
• HQL demo
Who am I?
• 7+ years of experience in Microsoft technologies such as ASP.NET, C#, SQL Server and SharePoint
• 2+ years of experience in open source technologies such as Java, Alfresco and Apache Cassandra
• Primary author of Apache Cassandra Cookbook (in progress)
• C# Corner MVP
• Frequent blogger
What is Apache Hive?
• Apache Hive - SQL on top of Hadoop
• A data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis.
Apache Hive key features
• Similar to SQL
• SQL has a huge user base
• SQL is easy to code
• Rich data types (structs, lists and maps)
• Supports SQL filters, joins, GROUP BY and ORDER BY clauses
• Extensibility – custom types, custom functions, etc.
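As a minimal sketch of the rich data types listed above, the following HiveQL declares a table that uses an array (list), a map and a struct, then reads one element from each. The table name user_profiles, its columns and the delimiters are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: every name and delimiter here is an illustrative assumption.
CREATE TABLE user_profiles (
  name     STRING,
  tags     ARRAY<STRING>,                     -- list of values
  settings MAP<STRING, STRING>,               -- key/value pairs
  address  STRUCT<city:STRING, zip:STRING>    -- named fields
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Access an array element by index, a map value by key, a struct field by name.
SELECT name, tags[0], settings['theme'], address.city
FROM user_profiles;
```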
Apache Hive architecture
Courtesy & ©: http://www.cubrid.org/blog/dev-platform/platforms-for-big-data/
How Apache Hive works in the Apache Hadoop ecosystem
Courtesy & ©: http://yourstory.com/2012/04/introduction-to-big-data-hadoop-ecosystem-part-1/
Where is Apache Hive useful?
It is well suited to batch processing, for example:
• Log processing
• Text mining
• Document indexing
• Customer-facing business intelligence
• Predictive modeling
Where is Apache Hive not useful?
Hive is not designed for:
• Online transaction processing
• Real-time queries
Who uses Apache Hive?
Apache Hive is used by:
• Bizo - uses Hive for reporting and ad hoc queries.
• Chitika - uses Hive for data mining and analysis on its 435M monthly global users.
• CNET - uses Hive for data mining, internal log analysis and ad hoc queries.
• Digg - uses Hive for data mining, internal log analysis, R&D, and reporting/analytics.
• HubSpot - uses Hive as part of a larger Hadoop pipeline to serve near-realtime web analytics.
• Scribd - uses Hive for machine learning, data mining, ad hoc querying, and both internal and user-facing analytics.
Courtesy & ©: https://cwiki.apache.org/confluence/display/Hive/PoweredBy
What is HQL?
HQL: Hive Query Language
• Does not conform to any ANSI SQL standard
• Very close to the MySQL dialect, but with some differences
• SQL to HQL cheat sheet: http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
• HQL does not support transactions, so do not compare it with an RDBMS
HQL – Create table
Syntax:
CREATE TABLE <table_name> (<column_definitions>)
[ROW FORMAT <row_format>]
[STORED AS <file_format>]
Example:
CREATE TABLE posts (user STRING, post STRING, time BIGINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
HQL – Create table Demo
HQL – Describe table
Syntax:
describe <table_name>;
Example:
describe posts;
HQL – Describe table demo
HQL – Show all tables
Syntax:
SHOW TABLES;
SHOW TABLES '<pattern>';
Example:
SHOW TABLES;
SHOW TABLES 'table*';
HQL – Show all tables demo
HQL – Alter table
Syntax:
ALTER TABLE <table_name> RENAME TO <new_table_name>;
ALTER TABLE <table_name> CHANGE <old_column_name> <new_column_name> <new_data_type>;
Example:
-- Rename the table
ALTER TABLE posts RENAME TO myposts;
-- Rename a column and change its data type (the table is now called myposts)
ALTER TABLE myposts CHANGE time time1 STRING;
HQL – Alter table demo
HQL – How to get records into Apache Hive tables?
There are two ways to load data into Apache Hive tables:
• Using an INSERT statement: loads data from another table through a SELECT statement
• Using a LOAD statement: loads data from a file
HQL – Insert records
Syntax:
INSERT INTO TABLE <table_name>
SELECT <columns> FROM <another_table>;
Example:
INSERT INTO TABLE posts
SELECT 'user1', 'Demo', 123 FROM table1;
HQL – Insert records demo
HQL – Load data
Syntax:
LOAD DATA INPATH '<filepath>' [OVERWRITE] INTO TABLE <table_name>;
Example:
LOAD DATA INPATH '/user/hue/posts.csv' INTO TABLE posts;
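The two optional keywords of the LOAD statement can be sketched as follows. LOCAL reads the file from the client's local file system instead of HDFS, and OVERWRITE replaces the rows already in the table. The local path /tmp/posts.csv is a hypothetical example.

```sql
-- Copy a file from the local file system into the table
-- (the path /tmp/posts.csv is an assumption made for illustration).
LOAD DATA LOCAL INPATH '/tmp/posts.csv' INTO TABLE posts;

-- Move a file that is already in HDFS into the table and
-- replace whatever rows the table held before.
LOAD DATA INPATH '/user/hue/posts.csv' OVERWRITE INTO TABLE posts;
```

Without LOCAL, the source file is moved rather than copied, so it no longer exists at its original HDFS path after the load.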
HQL – Load data demo
HQL – Update records
Syntax:
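There is no specific syntax for update, but you can use an INSERT statement with the OVERWRITE option. Note that INSERT OVERWRITE replaces all existing rows in the table with the result of the SELECT.
Example:
INSERT OVERWRITE TABLE posts
SELECT 'user1', 'Demo', 123 FROM table1 WHERE id = '123';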
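One way to make the overwrite behave like an update of selected rows is to rewrite the whole table from itself and change only the rows of interest. This is a sketch under assumptions, not a definitive recipe: it reuses the posts table from the Create table slide, the replacement text is invented, and it assumes a Hive version that allows a table to be read and overwritten in the same statement (otherwise, write to a staging table first). Hive 0.14 and later also offer UPDATE and DELETE statements, but only for transactional tables.

```sql
-- Rewrite every row of posts; only rows for 'user1' get a new post value.
-- Column names follow the slide's example table. In newer Hive releases a
-- name such as user may be a reserved word and need backticks.
INSERT OVERWRITE TABLE posts
SELECT
  user,
  CASE WHEN user = 'user1' THEN 'Demo (edited)' ELSE post END AS post,
  time
FROM posts;
```

Because the whole table is rewritten, the cost grows with the table size rather than with the number of rows changed.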
HQL – Update records demo
HQL – Delete records
You cannot delete records from Apache Hive tables!
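A common workaround, sketched here under the assumption that the table can be read and overwritten in one statement, is to overwrite the table with only the rows that should remain. The filter value 'user1' is hypothetical.

```sql
-- Keep every row except those belonging to 'user1'.
-- The IS NULL test keeps rows whose user is NULL, which the <> comparison
-- alone would silently drop.
INSERT OVERWRITE TABLE posts
SELECT user, post, time
FROM posts
WHERE user <> 'user1' OR user IS NULL;
```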
HQL – Delete records demo
HQL – Drop table
Syntax:
drop table <table_name>;
Example:
drop table posts;
HQL – Drop table demo
Summary
• What is Apache Hive?
• Apache Hive key features
• Apache Hive architecture
• How does Apache Hive work in the Apache Hadoop ecosystem?
• Where is Apache Hive useful?
• Where is Apache Hive not useful?
• Who uses Apache Hive?
• Getting started with HQL
Q & A
For the next session
• Partitioning
• Bucketing
• Union
• Subqueries
• Joins
• Group By
• Order By
• Aggregations
References
https://hive.apache.org/
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
https://cwiki.apache.org/confluence/display/Hive/Home
https://cwiki.apache.org/confluence/display/Hive/PoweredBy
http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
Coding-Freaks.Net
www.codingfreaks.net
Quanticate OPDev Twitter
https://twitter.com/quanticateopdev
Twitter
www.Twitter.com/muralidharand
