Apache Hive - Introduction

An introduction to Apache Hive, useful for new users who want to get started with it.

Published in: Technology

  1. Introduction to Apache Hive. Muralidharan Deenathayalan, Technical Lead (Muralidharan.deenathayalan@quanticate.com). Confidential, Copyright © Quanticate. Apache and the Apache Hive project logo are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
  2. Agenda
     • Who am I?
     • What is Apache Hive?
     • Apache Hive key features
     • Apache Hive architecture
     • How does Apache Hive work in the Apache Hadoop ecosystem?
     • Where is Apache Hive useful?
     • Where is Apache Hive not useful?
     • Who uses Apache Hive?
     • What is HQL?
     • HQL demo
  3. Who am I?
     • 7+ years of experience in Microsoft technologies such as ASP.NET, C#, SQL Server and SharePoint
     • 2+ years of experience in open source technologies such as Java, Alfresco and Apache Cassandra
     • Primary author of Apache Cassandra Cookbook (in progress)
     • Csharpcorner MVP
     • Frequent blogger
  4. What is Apache Hive?
     • Apache Hive - SQL on top of Hadoop
     • A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
  5. Apache Hive key features
     • Similar to SQL: SQL has a huge user base and is easy to code
     • Rich data types (structs, lists and maps)
     • Supports SQL filters, joins, and GROUP BY and ORDER BY clauses
     • Extensibility: custom types, custom functions, etc.
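
The rich data types listed on this slide can be sketched as a table definition. This is a minimal illustration, not part of the original deck: the table `user_profiles` and all of its columns are hypothetical, and note that Hive spells the list type `ARRAY`.

```sql
-- Hypothetical table (all names are assumptions) with one column per complex type:
-- emails is a list, settings is a map, address is a struct.
CREATE TABLE user_profiles (
  name     STRING,
  emails   ARRAY<STRING>,
  settings MAP<STRING, STRING>,
  address  STRUCT<city:STRING, zip:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Elements are addressed by index, by key, and by field name respectively.
SELECT name, emails[0], settings['theme'], address.city
FROM user_profiles;
```

The three delimiters are a design choice for text files: fields, collection items, and map keys each need a distinct separator so one line can carry nested values.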
  6. Apache Hive architecture (diagram). Courtesy & ©: http://www.cubrid.org/blog/dev-platform/platforms-for-big-data/
  7. How Apache Hive works in the Apache Hadoop ecosystem (diagram). Courtesy & ©: http://yourstory.com/2012/04/introduction-to-big-data-hadoop-ecosystem-part-1/
  8. Where is Apache Hive useful? It is well suited for batch processing:
     • Log processing
     • Text mining
     • Document indexing
     • Customer-facing business intelligence
     • Predictive modeling, etc.
  9. Where is Apache Hive not useful? Hive is not designed for:
     • Online transaction processing
     • Real-time queries
  10. Who uses Apache Hive? Apache Hive is used by:
      • Bizo - Uses Hive for reporting and ad hoc queries.
      • Chitika - Uses Hive for data mining and analysis on its 435M monthly global users.
      • CNET - Uses Hive for data mining, internal log analysis and ad hoc queries.
      • Digg - Uses Hive for data mining, internal log analysis, R&D, and reporting/analytics.
      • HubSpot - Uses Hive as part of a larger Hadoop pipeline to serve near-realtime web analytics.
      • Scribd - Uses Hive for machine learning, data mining, ad hoc querying, and both internal and user-facing analytics.
      Courtesy & ©: https://cwiki.apache.org/confluence/display/Hive/PoweredBy
  11. What is HQL? HQL: Hive Query Language
      • Does not conform to any ANSI SQL standard
      • Very close to the MySQL dialect, but with some differences
      • SQL to HQL cheat sheet: http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
      • HQL does not support transactions, so do not compare it directly with an RDBMS
  12. HQL - Create table
      Syntax:
        CREATE TABLE <table_name> (<column_definitions>)
        [ROW FORMAT <row_format>]
        [STORED AS <file_format>]
      Example:
        CREATE TABLE posts (user STRING, post STRING, time BIGINT)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE;
      Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
  13. HQL - Create table demo
  14. HQL - Describe table
      Syntax:
        DESCRIBE <table_name>;
      Example:
        DESCRIBE posts;
  15. HQL - Describe table demo
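
A describe demo along these lines might look like the sketch below. It assumes the `posts` table from the create-table slide already exists; `DESCRIBE FORMATTED` is a standard Hive variant that the slides do not mention.

```sql
-- Lists the column names and their types.
DESCRIBE posts;

-- Adds storage details such as the table location, file format, and table type.
DESCRIBE FORMATTED posts;
```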
  16. HQL - Show all tables
      Syntax:
        SHOW TABLES;
        SHOW TABLES [<filter>];
      Example:
        SHOW TABLES;
        SHOW TABLES 'table*';
  17. HQL - Show all tables demo
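
As a rough sketch of what such a demo could run: the filter accepts `*` as a wildcard and `|` to separate alternatives. The database name `default` and the pattern values are assumptions chosen to match the `posts` example.

```sql
-- All tables in the current database.
SHOW TABLES;

-- Tables whose names start with "post".
SHOW TABLES 'post*';

-- Tables named either posts or myposts.
SHOW TABLES 'posts|myposts';

-- Tables in a specific database (here the built-in default database).
SHOW TABLES IN default;
```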
  18. HQL - Alter table
      Syntax:
        ALTER TABLE <table_name> RENAME TO <new_table_name>;
        ALTER TABLE <table_name> CHANGE <old_column_name> <new_column_name> <new_data_type>;
      Example:
        -- Rename the table
        ALTER TABLE posts RENAME TO myposts;
        -- Rename a column and change its data type
        ALTER TABLE posts CHANGE time time1 STRING;
  19. HQL - Alter table demo
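
If the two alter statements are run one after the other, the second has to use the new table name. The sketch below shows that sequence and adds an `ADD COLUMNS` step; the `likes` column is hypothetical, and the backticks around `time` are a precaution because newer Hive releases treat some common words as reserved keywords (an assumption about your Hive version).

```sql
-- Add a hypothetical column; existing rows read it as NULL.
ALTER TABLE posts ADD COLUMNS (likes INT COMMENT 'hypothetical example column');

-- Rename the table, then refer to it by its new name from here on.
ALTER TABLE posts RENAME TO myposts;

-- Rename the column and change its type in one statement.
ALTER TABLE myposts CHANGE `time` time1 STRING;

-- Confirm the result.
DESCRIBE myposts;
```

These statements change only the table metadata; the underlying data files are not rewritten.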
  20. HQL - How to get records into Apache Hive tables? There are two ways to load data into Apache Hive tables:
      • Using an INSERT statement: loads data from another table through a SELECT statement
      • Using a LOAD statement: loads data from a file
  21. HQL - Insert records
      Syntax:
        INSERT INTO TABLE <table_name> SELECT <select_list> FROM <another_table>;
      Example:
        INSERT INTO TABLE posts SELECT 'user1', 'Demo', '123' FROM table1;
  22. HQL - Insert records demo
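
A more typical insert copies real columns from a source table rather than constants. The sketch below assumes a hypothetical staging table, `posts_staging`, with the same layout as `posts`; the backticks guard against `user` and `time` being reserved keywords in newer Hive releases.

```sql
-- Hypothetical staging table with the same columns as posts.
CREATE TABLE posts_staging (`user` STRING, post STRING, `time` BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Append the staged rows that have a timestamp to posts.
-- INSERT INTO adds to the existing data; it does not replace it.
INSERT INTO TABLE posts
SELECT `user`, post, `time`
FROM posts_staging
WHERE `time` IS NOT NULL;
```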
  23. HQL - Load data
      Syntax:
        LOAD DATA INPATH '<file_path>' [OVERWRITE] INTO TABLE <table_name>;
      Example:
        LOAD DATA INPATH '/user/hue/posts.csv' INTO TABLE posts;
  24. HQL - Load data demo
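
The load statement has two common variants, sketched here against the `posts` table. The local path `/tmp/posts.csv` is a hypothetical example; the HDFS path is the one used on the slide.

```sql
-- Load from HDFS: the file is moved into the table's storage directory.
LOAD DATA INPATH '/user/hue/posts.csv' INTO TABLE posts;

-- Load from the local filesystem: the file is copied, and OVERWRITE
-- replaces whatever the table held before.
LOAD DATA LOCAL INPATH '/tmp/posts.csv' OVERWRITE INTO TABLE posts;

-- Spot-check the loaded rows.
SELECT * FROM posts LIMIT 5;
```

`LOAD DATA` does not parse or validate the file, so a mismatch with the table's delimiter only shows up as NULL columns when the data is queried.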
  25. HQL - Update records
      Syntax: There is no specific syntax for update, but you can use an INSERT statement with the OVERWRITE option. INSERT OVERWRITE replaces the table's existing contents with the result of the query.
      Example:
        INSERT OVERWRITE TABLE posts SELECT 'user1', 'Demo', '123' FROM table1 WHERE id = '123';
  26. HQL - Update records demo
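
One way to emulate an update with the overwrite approach is to rewrite the whole table and change the target rows on the way through. This is a sketch under assumptions, not the presenter's demo: the replacement text `'Edited'` is hypothetical, and the backticks guard against `user` and `time` being reserved keywords in newer Hive releases.

```sql
-- Rewrite posts in full: rows for user1 get a new post text,
-- every other row is written back unchanged.
INSERT OVERWRITE TABLE posts
SELECT `user`,
       CASE WHEN `user` = 'user1' THEN 'Edited' ELSE post END AS post,
       `time`
FROM posts;
```

Because every row is rewritten, the cost grows with the table size rather than with the number of changed rows, which is one reason Hive suits batch workloads better than transactional ones.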
  27. HQL - Delete records. You cannot delete individual records from Apache Hive tables!
  28. HQL - Delete records demo
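
Although there is no row-level delete in the Hive described here, the same overwrite technique can drop unwanted rows by keeping only the rest. The filter value is hypothetical. Newer Hive releases (0.14 onward) add a `DELETE` statement for transactional tables, which this sketch does not rely on.

```sql
-- Keep every row except those belonging to user1.
-- Rows where user is NULL are also dropped, because NULL <> 'user1'
-- does not evaluate to true.
INSERT OVERWRITE TABLE posts
SELECT `user`, post, `time`
FROM posts
WHERE `user` <> 'user1';

-- Removing all rows at once is available as a separate statement.
TRUNCATE TABLE posts;
```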
  29. HQL - Drop table
      Syntax:
        DROP TABLE <table_name>;
      Example:
        DROP TABLE posts;
  30. HQL - Drop table demo
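
A drop demo could be as short as the sketch below. `IF EXISTS` is a standard Hive option not shown on the slide; it keeps a script from failing when the table is already gone.

```sql
-- Drop the table only if it is present.
DROP TABLE IF EXISTS posts;

-- Returns no rows once the table has been dropped.
SHOW TABLES 'posts';
```

For a managed table such as `posts`, dropping removes both the metadata and the data files. For a table created with `CREATE EXTERNAL TABLE`, only the metadata is removed.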
  31. Summary
      • What is Apache Hive?
      • Apache Hive key features
      • Apache Hive architecture
      • How does Apache Hive work in the Apache Hadoop ecosystem?
      • Where is Apache Hive useful?
      • Where is Apache Hive not useful?
      • Who uses Apache Hive?
      • Getting started with HQL
  32. Q & A
  33. For the next session
      • Partitioning
      • Bucketing
      • Union
      • Sub queries
      • Joins
      • Group By
      • Order By
      • Aggregations
  34. References
      • https://hive.apache.org/
      • https://cwiki.apache.org/confluence/display/Hive/GettingStarted
      • https://cwiki.apache.org/confluence/display/Hive/Home
      • https://cwiki.apache.org/confluence/display/Hive/PoweredBy
      • http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
  35. Coding-Freaks.Net: www.codingfreaks.net; Quanticate OPDev Twitter: https://twitter.com/quanticateopdev; Twitter: www.Twitter.com/muralidharand