Apache Hive - Introduction
An introduction to Apache Hive, useful for new users who want to get started with Apache Hive.

Published in: Technology
Transcript

  • 1. Confidential, Copyright © Quanticate
      Introduction to Apache Hive
      Muralidharan Deenathayalan, Technical Lead
      Muralidharan.deenathayalan@quanticate.com
      Apache and the Apache Hive project logo are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
  • 2. Agenda
      • Who am I?
      • What is Apache Hive?
      • Apache Hive key features
      • Apache Hive architecture
      • How does Apache Hive work in the Apache Hadoop ecosystem?
      • Where is Apache Hive useful?
      • Where is Apache Hive not useful?
      • Who uses Apache Hive?
      • What is HQL?
      • HQL demo
  • 3. Who am I?
      • 7+ years of experience in Microsoft technologies such as ASP.NET, C#, SQL Server and SharePoint
      • 2+ years of experience in open source technologies such as Java, Alfresco and Apache Cassandra
      • Primary author of Apache Cassandra Cookbook (in writing)
      • Csharpcorner MVP
      • Frequent blogger
  • 4. What is Apache Hive?
      • Apache Hive: SQL on top of Hadoop
      • A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
  • 5. Apache Hive key features
      • Similar to SQL
      • SQL has a huge user base
      • SQL is easy to code
      • Rich data types (structs, lists and maps)
      • Supports SQL filters, joins, and GROUP BY and ORDER BY clauses
      • Extensibility: custom types, custom functions, etc.
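
The rich data types and SQL-style clauses listed above can be sketched in HQL roughly as follows. This is an illustrative sketch only: the table `user_profiles`, its columns and the delimiters are hypothetical and do not come from the slides.

```sql
-- Hypothetical table, used only to illustrate the complex types.
CREATE TABLE user_profiles (
  name    STRING,
  emails  ARRAY<STRING>,                     -- list type
  prefs   MAP<STRING, STRING>,               -- map type
  address STRUCT<city:STRING, zip:STRING>    -- struct type
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Filter, group and order, much as in SQL.
-- Map values are read with ['key'], struct fields with dot notation.
SELECT address.city AS city, COUNT(*) AS profile_count
FROM user_profiles
WHERE prefs['language'] = 'en'
  AND size(emails) > 0
GROUP BY address.city
ORDER BY profile_count DESC;
```

Array elements can be read by index in the same way, for example `emails[0]`.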
  • 6. Apache Hive architecture
      Courtesy & ©: http://www.cubrid.org/blog/dev-platform/platforms-for-big-data/
  • 7. How Apache Hive works in the Apache Hadoop ecosystem
      Courtesy & ©: http://yourstory.com/2012/04/introduction-to-big-data-hadoop-ecosystem-part-1/
  • 8. Where is Apache Hive useful?
      It is well suited for batch processing:
      • Log processing
      • Text mining
      • Document indexing
      • Customer-facing business intelligence
      • Predictive modeling, etc.
  • 9. Where is Apache Hive not useful?
      Hive is not designed for:
      • Online transaction processing
      • Real-time queries
  • 10. Who uses Apache Hive?
      Apache Hive is used by:
      • Bizo: uses Hive for reporting and ad hoc queries.
      • Chitika: uses Hive for data mining and analysis on its 435M monthly global users.
      • CNET: uses Hive for data mining, internal log analysis and ad hoc queries.
      • Digg: uses Hive for data mining, internal log analysis, R&D, and reporting/analytics.
      • HubSpot: uses Hive as part of a larger Hadoop pipeline to serve near-realtime web analytics.
      • Scribd: uses Hive for machine learning, data mining, ad hoc querying, and both internal and user-facing analytics.
      Courtesy & ©: https://cwiki.apache.org/confluence/display/Hive/PoweredBy
  • 11. What is HQL?
      HQL: Hive Query Language
      • Does not conform to any ANSI SQL standard
      • Very close to the MySQL dialect, but with some differences
      • SQL to HQL cheat sheet: http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
      • HQL does not support transactions, so do not compare it with an RDBMS
  • 12. HQL – Create table
      Syntax:
        CREATE TABLE <table_name> (<column_definitions>)
          [ROW FORMAT <row_format>]
          [STORED AS <file_format>]
      Example:
        CREATE TABLE posts (user STRING, post STRING, time BIGINT)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          STORED AS TEXTFILE;
      Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
  • 13. HQL – Create table demo
  • 14. HQL – Describe table
      Syntax:
        DESCRIBE <table_name>;
      Example:
        DESCRIBE posts;
  • 15. HQL – Describe table demo
  • 16. HQL – Show all tables
      Syntax:
        SHOW TABLES;
        SHOW TABLES [<filter>];
      Example:
        SHOW TABLES;
        SHOW TABLES 'table*';
  • 17. HQL – Show all tables demo
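
Beyond the basic forms above, a few related statements are often useful when inspecting tables. The following is a short sketch; it assumes the `posts` table from the create table slide, and the pattern `'post*'` is only an example.

```sql
-- Column names and types only.
DESCRIBE posts;

-- Adds storage details such as location, file format and table parameters.
DESCRIBE FORMATTED posts;

-- List tables whose names match a wildcard pattern ('*' matches any characters).
SHOW TABLES 'post*';
```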
  • 18. HQL – Alter table
      Syntax:
        ALTER TABLE <table_name> RENAME TO <new_table_name>;
        ALTER TABLE <table_name> CHANGE <old_column_name> <new_column_name> <new_data_type>;
      Example:
        -- Rename the table
        ALTER TABLE posts RENAME TO myposts;
        -- Rename a column and change its data type
        ALTER TABLE posts CHANGE time time1 STRING;
  • 19. HQL – Alter table demo
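
The two example statements on the alter table slide are independent of each other. If they are run one after the other, the second must refer to the new table name. The sketch below assumes the `posts` table from the create table slide; the added column `category` is hypothetical.

```sql
-- Rename the table, then change a column on the renamed table.
-- Backticks guard against `time` being treated as a reserved word,
-- which may be the case in newer Hive releases.
ALTER TABLE posts RENAME TO myposts;
ALTER TABLE myposts CHANGE `time` time1 STRING;

-- Append a new column (hypothetical name) at the end of the schema.
ALTER TABLE myposts ADD COLUMNS (category STRING COMMENT 'hypothetical extra column');

-- Confirm the resulting schema.
DESCRIBE myposts;
```

These statements change only the table metadata; the underlying data files are not rewritten.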
  • 20. HQL – How to get records into Apache Hive tables?
      There are two ways to load data into Apache Hive tables:
      • Using an INSERT statement: loads data from another table using a SELECT statement
      • Using a LOAD statement: loads data from a file
  • 21. HQL – Insert records
      Syntax:
        INSERT INTO TABLE <tablename> SELECT <columns> FROM <another_table>;
      Example:
        INSERT INTO TABLE posts SELECT 'user1', 'Demo', 123 FROM table1;
  • 22. HQL – Insert records demo
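
The slide example inserts the same literal row once for every row in `table1`. A more typical use copies selected rows from one table into another. This is a sketch under assumptions: `posts_staging` is a hypothetical table with the same columns as `posts`, and the timestamp threshold is arbitrary.

```sql
-- Copy rows from a hypothetical staging table into posts.
-- Backticks are used because `user` and `time` may be reserved words
-- in newer Hive releases.
INSERT INTO TABLE posts
SELECT `user`, post, `time`
FROM posts_staging
WHERE `time` >= 1400000000;
```

`INSERT INTO` appends to the existing contents of `posts`.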
  • 23. HQL – Load data
      Syntax:
        LOAD DATA INPATH '<filepath>' [OVERWRITE] INTO TABLE <tablename>;
      Example:
        LOAD DATA INPATH '/user/hue/posts.csv' INTO TABLE posts;
  • 24. HQL – Load data
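
The variants of the load statement can be sketched as follows. These are alternatives rather than a script to run top to bottom, and the local path `/tmp/posts.csv` is hypothetical.

```sql
-- Load a file that already sits in HDFS.
-- The file is moved into the table's directory, not copied.
LOAD DATA INPATH '/user/hue/posts.csv' INTO TABLE posts;

-- Load a file from the local file system of the client (hypothetical path).
-- In this case the file is copied.
LOAD DATA LOCAL INPATH '/tmp/posts.csv' INTO TABLE posts;

-- Replace the existing contents of the table instead of appending to them.
LOAD DATA INPATH '/user/hue/posts.csv' OVERWRITE INTO TABLE posts;
```

Hive does not parse or validate the file during the load, so the file layout must already match the row format declared for the table.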
  • 25. HQL – Update records
      Syntax:
        There is no specific syntax for update, but you can use an INSERT statement with the OVERWRITE option. Note that INSERT OVERWRITE replaces the existing contents of the table with the result of the SELECT. (Hive 0.14 and later add an UPDATE statement for transactional tables.)
      Example:
        INSERT OVERWRITE TABLE posts SELECT 'user1', 'Demo', 123 FROM table1 WHERE id = '123';
  • 26. HQL – Update records demo
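
Because the overwrite replaces the whole table, an update that should touch only some rows has to write every row back, changing only the rows that match. The following is a minimal sketch of that pattern, assuming the `posts` table from the create table slide; the condition and the new value are illustrative, and the behaviour should be checked on your Hive version before relying on it.

```sql
-- Emulate: UPDATE posts SET post = 'Demo' WHERE user = 'user1'
-- Every row is rewritten; only matching rows get the new value.
INSERT OVERWRITE TABLE posts
SELECT
  `user`,
  CASE WHEN `user` = 'user1' THEN 'Demo' ELSE post END AS post,
  `time`
FROM posts;
```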
  • 27. HQL – Delete records
      You cannot delete records from Apache Hive tables! (Hive 0.14 and later add a DELETE statement for transactional tables.)
  • 28. HQL – Delete records demo
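
On tables without delete support, a common workaround is to overwrite the table with only the rows that should remain. This is a sketch, assuming the `posts` table from the create table slide; the condition is illustrative.

```sql
-- Emulate: DELETE FROM posts WHERE user = 'user1'
-- Keep every row that does not match, including rows where user is NULL,
-- which a plain <> comparison would otherwise drop.
INSERT OVERWRITE TABLE posts
SELECT `user`, post, `time`
FROM posts
WHERE `user` IS NULL OR `user` <> 'user1';
```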
  • 29. HQL – Drop table
      Syntax:
        DROP TABLE <table_name>;
      Example:
        DROP TABLE posts;
  • 30. HQL – Drop table demo
  • 31. Summary
      • What is Apache Hive?
      • Apache Hive key features
      • Apache Hive architecture
      • How does Apache Hive work in the Apache Hadoop ecosystem?
      • Where is Apache Hive useful?
      • Where is Apache Hive not useful?
      • Who uses Apache Hive?
      • Getting started with HQL
  • 32. Q & A
  • 33. For the next session!
      • Partitioning
      • Bucketing
      • Union
      • Subqueries
      • Joins
      • Group By
      • Order By
      • Aggregations
  • 34. References
      • https://hive.apache.org/
      • https://cwiki.apache.org/confluence/display/Hive/GettingStarted
      • https://cwiki.apache.org/confluence/display/Hive/Home
      • https://cwiki.apache.org/confluence/display/Hive/PoweredBy
      • http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
  • 35. Links
      • Coding-Freaks.Net: www.codingfreaks.net
      • Quanticate OPDev Twitter: https://twitter.com/quanticateopdev
      • Twitter: www.Twitter.com/muralidharand