Introduction to Apache Hive



Published in: Technology


  1. 1. APACHE HIVE (Apache Hadoop Sub Project). Agenda: Story (Making of Apache Hive); What is Apache Hive; Physical Layout; Hive CLI; Hive QL.
  2. 2. Can Elephants Fly? Concern: Can Hadoop be used more efficiently and fruitfully by developers? © 2012 Sabre Holdings Pvt. Ltd. | All rights reserved
  3. 3. Thinking… Step 1. Give him wings. Mr. Hadoop energizing himself.
  4. 4. Thinking… Step 2. Pray to gravity. Thanks to gravity, the sky never fell down on us ;) But wait, 2012 is not yet over. Keep praying. Mr. Hadoop enjoying his first air ride. “God did not create the universe, gravity did” - Stephen Hawking
  6. 6. Upshot of the downfall. Victims: Mr. Hadoop, the Flying Elephant. Blame gravity! The fall will have a huge impact.
  7. 7. Saving Life… Step 1. Shrink. BEFORE - ACME Elephant Shrinker - AFTER
  8. 8. Saving Life… Step 2. Genetic engineering and a bit of magic. BEFORE: Mr. Hadoop. AFTER: Ms. Hive. Injecting insecto-receptors.
  9. 9. Behind the scenes… Hive was initially developed by Facebook.
  10. 10. Hive is a data warehouse infrastructure built on top of Hadoop. It supports analysis of large datasets stored in Hadoop-compatible file systems such as HDFS and the Amazon S3 file system. It provides an SQL-like query language called HiveQL. To accelerate queries, it provides indexing.
  11. 11. Warehouse directory in HDFS: /user/hive/warehouse. Tables are subdirectories of the warehouse directory. Partitions are subdirectories of the corresponding table directory.
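
The layout above can be sketched as a small path builder. This is only an illustration: the table name `bookings` and the partition columns `year` and `month` are made up, and the sketch assumes the default database and Hive's usual `column=value` naming for partition directories.

```python
from posixpath import join

WAREHOUSE_DIR = "/user/hive/warehouse"  # warehouse root named on the slide


def table_path(table):
    """HDFS directory of a table in the default database."""
    return join(WAREHOUSE_DIR, table)


def partition_path(table, partition_spec):
    """HDFS directory of one partition.

    partition_spec is an ordered list of (column, value) pairs; each pair
    adds one directory level named column=value below the table directory.
    """
    levels = [f"{col}={val}" for col, val in partition_spec]
    return join(table_path(table), *levels)


# Hypothetical table and partition columns, for illustration only.
print(table_path("bookings"))
print(partition_path("bookings", [("year", "2012"), ("month", "06")]))
```

Each partition column adds one more directory level, which is why a query that filters on a partition column can skip whole directories.
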
  12. 12. Hive queries are implicitly converted to map-reduce code by the Hive engine. The compiler translates each query into a directed acyclic graph of map-reduce jobs. These map-reduce jobs are sent to Hadoop for execution.
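
To make the idea of a query becoming a map-reduce job concrete, here is a toy model in plain Python of what one job for a query like `SELECT dept, COUNT(*) FROM employees GROUP BY dept` does conceptually. It is not the plan Hive actually generates; the `employees` rows and all function names are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept)
ROWS = [("asha", "eng"), ("bob", "ops"), ("chen", "eng"), ("dina", "eng")]


def map_phase(rows):
    """Emit (dept, 1) for every row, like a mapper keyed on the GROUP BY column."""
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key, giving COUNT(*) per dept."""
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle(map_phase(ROWS)))
print(counts)
```

A more complex query would chain several such jobs, with the output of one feeding the next, which is where the directed acyclic graph comes from.
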
  13. 13. The /user/hive directory is created automatically the first time a Hive session is started. The /user/hive/warehouse directory must be accessible by all users (hadoop dfs -chmod -R 1777 /user/hive/warehouse). Activating the sticky bit is recommended if the Hadoop version installed on the cluster supports it. The /tmp directory should also be made a sticky directory (hadoop dfs -chmod -R 1777 /tmp).
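
For readers unfamiliar with the octal mode used above, this short sketch decodes 1777 with Python's standard `stat` module. It only inspects the number and does not touch HDFS.

```python
import stat

MODE = 0o1777  # the octal mode passed to chmod on the slide

has_sticky_bit = bool(MODE & stat.S_ISVTX)    # the leading 1 in 1777
everyone_rwx = (MODE & 0o777) == 0o777        # rwx for user, group and other
listing = stat.filemode(stat.S_IFDIR | MODE)  # how a directory listing renders it

print(has_sticky_bit, everyone_rwx, listing)
```

With the sticky bit set, every user can create files in the directory, but only a file's owner (or the directory owner or superuser) can delete or rename it.
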
  14. 14. The Hive CLI (Command Line Interface) can be invoked with the hive command: % hive
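
Besides the interactive prompt, the CLI can run a single statement and exit. The sketch below builds such a command line with the `-e` (execute a query string) and `-S` (silent) options of the Hive CLI; the helper name `hive_command` is made up, and the command is only executed if a `hive` executable is found on the PATH.

```python
import shutil
import subprocess


def hive_command(query, silent=True):
    """Build the argv for running one HiveQL statement non-interactively."""
    argv = ["hive"]
    if silent:
        argv.append("-S")  # silent mode: suppress progress messages
    argv += ["-e", query]  # -e: execute the given query string and exit
    return argv


cmd = hive_command("SHOW TABLES;")
print(cmd)

# Only run the command when a hive executable is actually on PATH.
if shutil.which("hive"):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
```
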
  16. 16. DML: SELECT. DDL: SHOW TABLES, CREATE TABLE, ALTER TABLE, DROP TABLE.
  18. 18. Normal tables are created under the warehouse directory (the source data is moved into the warehouse). Normal tables are directly visible through HDFS directory browsing. On dropping a normal table, both the source data and the table metadata are deleted. External tables read directly from HDFS files. External tables are not visible in the warehouse directory. On dropping an external table, only the metadata is deleted, not the source data.
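
The difference in DROP TABLE behaviour can be modelled with a toy metastore on the local file system. This is a simulation only, not Hive code: a temporary directory stands in for HDFS, a dictionary stands in for the metastore, and the table names and helper functions are invented.

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stands in for HDFS
warehouse = root / "user/hive/warehouse"
metastore = {}  # table name -> (data location, is_external)


def create_table(name, external_location=None):
    """Register a table; normal tables get a directory under the warehouse."""
    location = Path(external_location) if external_location else warehouse / name
    location.mkdir(parents=True, exist_ok=True)
    metastore[name] = (location, external_location is not None)
    return location


def drop_table(name):
    """Always remove the metadata; remove the data only for normal tables."""
    location, is_external = metastore.pop(name)
    if not is_external:
        shutil.rmtree(location)


managed_dir = create_table("managed_logs")
external_dir = create_table("external_logs", root / "data/raw_logs")
drop_table("managed_logs")
drop_table("external_logs")

# The normal table's directory is gone; the external data is still there.
print(managed_dir.exists(), external_dir.exists())
```

In HiveQL the distinction is made at creation time with CREATE EXTERNAL TABLE and a LOCATION clause pointing at the existing data.
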
  22. 22. HiveQL supports joins only on equality expressions. Complex boolean expressions and inequality conditions are not supported. More than two tables can be joined. The number of map-reduce jobs (n) generated for a join depends on the columns being used: if the same column is used for all the tables, then n = 1; otherwise n > 1.
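
The rule for the number of jobs can be sketched as a simple check over the join conditions. This is a simplified reading of the rule on the slide, not Hive's planner; the function name and the tables `a`, `b`, `c` with their key columns are hypothetical.

```python
from collections import defaultdict


def joins_in_single_job(conditions):
    """Return True when every table is joined on one and the same column.

    conditions is a list of equality pairs such as
    (("a", "key"), ("b", "key1")), meaning a.key = b.key1.
    """
    columns_by_table = defaultdict(set)
    for left, right in conditions:
        for table, column in (left, right):
            columns_by_table[table].add(column)
    return all(len(cols) == 1 for cols in columns_by_table.values())


# b is joined on key1 in both conditions, so one job is enough (n = 1).
same_key = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key1"))]
# b is joined on key1 and then on key2, so more than one job is needed (n > 1).
mixed_keys = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key2"))]

print(joins_in_single_job(same_key), joins_in_single_job(mixed_keys))
```
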
  24. 24. HiveQL does not follow the SQL-92 standard. Limitations: no materialized views; no transaction-level support; limited sub-query support.
  25. 25. Hadoop: entering into the new world!
  26. 26. Reach me: Tapan Avasthi, Associate Software Developer Intern, Travelocity Global. tapan.avasthi@travelocity.com, tapan.k.avasthi@gmail.com
