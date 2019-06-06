
  1. 1. Why Hive: Hadoop uses MapReduce to process and analyze big data. Processing big data with traditional methods was slow; Hadoop MapReduce made it faster.
  3. 3. Before MapReduce: processing big data consumed more time.
  4. 4. After MapReduce: processing big data became faster.
  5. 5. MapReduce jobs are primarily written in Java. Programmers had to write lengthy, complex code to process data.
  6. 6. This proved to be a disadvantage for users who were non-programmers. To overcome this issue, Hive and Pig were introduced.
  7. 7. What is Hive & Pig Data Models Execution Modes Features Need for Hive & Pig HiveQL & Pig Latin Commands
  8. 8. Need for Hive
  9. 9. Problem: Facebook found it hard to process and analyze big data because not all of its employees were well versed in high-level programming languages.
  10. 10. Solution: They required a language similar to SQL, which was easier to write. Hence, Hive was developed with a vision to include SQL concepts such as tables and columns.
  12. 12. Need for Pig
  13. 13. Problem: Similarly, Yahoo found it hard to process and analyze big data using MapReduce because not all of its employees were well versed in complex Java code.
  14. 14. Solution: There was a need to process data using a language that was easier than Java. Yahoo researchers developed Pig, which made it possible to process data quickly and easily.
  16. 16. What is Hive?
  17. 17. Hive is a data warehouse system used for analyzing large datasets stored in HDFS. Hive uses a query language called HiveQL, which is similar to SQL. HiveQL queries are converted into MapReduce tasks.
  18. 18. What is Pig?
  19. 19. Pig is a scripting platform that runs on Hadoop clusters, designed to process and analyze large datasets.
  20. 20. HiveQL • Hive Query Language (HiveQL) is the query language used by Hive to process and analyze data • It is a declarative language that closely resembles SQL • HiveQL works on structured data
  21. 21. Pig Latin • Pig Latin is the procedural data flow language used in Pig to analyze data • Pig Latin resembles SQL in places but differs from it greatly • It is used for structured, semi-structured and unstructured data • 10 lines of Pig Latin code ≈ 200 lines in Java
  22. 22. Hive Data Model
  23. 23. The Hive data model is built from tables, partitions, and buckets. Tables in Hive are similar to those in an RDBMS.
  24. 24. Partitions: tables are divided into partitions, which group the same kind of data based on the partition key.
  25. 25. Buckets: partitions are further divided into buckets for better querying.
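As a rough illustration of the table, partition, and bucket hierarchy described above, the following Python sketch groups rows first by a partition key and then into hash buckets. It is not how Hive is implemented: the function names `bucket_for` and `layout`, the sample rows, and the choice of CRC32 as the hash are all assumptions made for this example.

```python
import zlib
from collections import defaultdict


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key to a bucket number with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def layout(rows, partition_key, bucket_key, num_buckets):
    """Group rows as partition value -> bucket number -> list of rows."""
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        partition = row[partition_key]
        bucket = bucket_for(str(row[bucket_key]), num_buckets)
        table[partition][bucket].append(row)
    return table


# Hypothetical sample data, partitioned by department and bucketed by id.
rows = [
    {"id": 1, "name": "Ted", "dept": "sales"},
    {"id": 2, "name": "Mike", "dept": "sales"},
    {"id": 3, "name": "Ann", "dept": "hr"},
]
table = layout(rows, partition_key="dept", bucket_key="id", num_buckets=4)

# A query restricted to dept == "hr" only needs to look at one partition.
hr_rows = [row for bucket in table["hr"].values() for row in bucket]
print(hr_rows)
```

The point of the sketch is the access pattern: a filter on the partition key skips every other partition, and a lookup on the bucket key only has to scan one bucket inside that partition.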
  26. 26. Pig Latin Data Model
  27. 27. Atom, e.g. 'Ted' or 50: a single value of a primitive data type such as int, float, or string. It is always stored as a string.
  28. 28. Tuple, e.g. (Ted,50): a sequence of fields that can be of any data type. It is the same as a row in an RDBMS.
  29. 29. Bag, e.g. {(Ted,5),(Mike,10)}: a collection of tuples. It is similar to a table in an RDBMS and is represented by '{}'.
  30. 30. Map, e.g. [name#Mike, age#30]: a set of key-value pairs. The key is of chararray type and the value can be of any type. It is represented by '[]'.
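One way to picture the four Pig Latin types above is to map them onto Python values. The sketch below is only an analogy: the mapping (tuple to `tuple`, bag to `list`, map to `dict`) and the helper `to_pig_literal` are assumptions made for illustration and are not part of Pig.

```python
# Analogy only: Pig types expressed as Python values.
atom_name = "Ted"                          # atom: a single value
atom_age = 50                              # atom: a single value
pig_tuple = ("Ted", 50)                    # tuple: ordered sequence of fields
pig_bag = [("Ted", 5), ("Mike", 10)]       # bag: collection of tuples
pig_map = {"name": "Mike", "age": 30}      # map: chararray keys, values of any type


def to_pig_literal(value):
    """Render a Python value in the notation used on the slides (hypothetical helper)."""
    if isinstance(value, dict):
        pairs = (f"{key}#{to_pig_literal(val)}" for key, val in value.items())
        return "[" + ", ".join(pairs) + "]"
    if isinstance(value, tuple):
        return "(" + ",".join(to_pig_literal(v) for v in value) + ")"
    if isinstance(value, list):
        return "{" + ",".join(to_pig_literal(v) for v in value) + "}"
    return str(value)


print(to_pig_literal(pig_tuple))  # (Ted,50)
print(to_pig_literal(pig_bag))    # {(Ted,5),(Mike,10)}
print(to_pig_literal(pig_map))    # [name#Mike, age#30]
```

A Python list is ordered, whereas a Pig bag is an unordered collection, so the analogy should not be pushed further than the nesting structure.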
  31. 31. Hive Execution Modes
  32. 32. Hive operates in two modes, depending on the number of data nodes and the size of the data.
  33. 33. Hive Local Mode: used when the data is small and only one data node is present.
  34. 34. Hive MapReduce Mode: used when there are multiple data nodes and the data is large.
  35. 35. Pig Execution Modes: Depending on where the data resides and where the Pig script is going to run, Pig works in two modes.
  36. 36. Pig Local Mode: the Pig engine takes its input from the local Linux file system, and the output is stored in the same file system.
  37. 37. Pig MapReduce Mode: queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster. Pig runs in this mode by default.
  45. 45. Features: Hive vs. Pig
     Hive
     • Used by analysts
     • HiveQL is the language used
     • Works primarily on structured data
     • Works on the server side of the cluster
     • Supports Avro through the AvroSerDe
     • Supports partitions
     • Has a web interface
     Pig
     • Used by programmers and researchers
     • Pig Latin is the language used
     • Works on structured, semi-structured and unstructured data
     • Works on the client side of the cluster
     • Supports Avro
     • Does not support partitions, although there is an option for filtering
     • Does not have a web interface
  46. 46. Few Hive Commands
     • Create a new database: create database database_name;
     • List the existing databases: show databases;
     • Create a table inside the database: create table table_name(ID INT, Name STRING, DEPT STRING, YOJ INT) row format delimited fields terminated by ',';
     • List the tables that have been created: show tables;
     • Round a value to the nearest integer (2.3 -> 2): hive> SELECT round(2.3) from temp;
     • Return the largest integer less than or equal to a value, whether positive or negative (2.3 -> 2): hive> SELECT floor(2.3) from temp;
     • Return the smallest integer greater than or equal to a value (2.3 -> 3): hive> SELECT ceil(2.3) from temp;
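The difference between the three rounding functions listed above is easiest to see with a negative input next to a positive one. The Python sketch below uses Python's own `round`, `math.floor`, and `math.ceil` as stand-ins; it does not run Hive, and exact halves such as 2.5 are deliberately left out because tie-breaking rules can differ between systems.

```python
import math

# Compare the three operations on a positive and a negative value.
results = {}
for x in (2.3, -2.3):
    results[x] = (round(x), math.floor(x), math.ceil(x))
    print(f"x={x}: round={results[x][0]}, floor={results[x][1]}, ceil={results[x][2]}")

# x=2.3: round=2, floor=2, ceil=3
# x=-2.3: round=-2, floor=-3, ceil=-2
```

For 2.3, rounding and floor agree; for -2.3, rounding and ceil agree, which shows that floor always moves toward negative infinity and ceil toward positive infinity.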
  47. 47. Few Pig Commands
     • Copy a local file into HDFS: hadoop fs -put 'path_name' /pigInput
     • Start the Grunt shell: pig
     • Load the file from HDFS into Pig: relation1 = LOAD '/pigInput' USING PigStorage(',') AS (Id:chararray,Name:chararray,Profession:chararray,Age:chararray);
     • Display the result of the previous load: dump relation1;
     • Keep only the rows that match a condition: relation1_filter = FILTER relation1 BY column_name == 'attribute_name';
     • Display the filtered result: dump relation1_filter;
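The load, filter, and dump steps above follow a simple data-flow pattern that can be mimicked in a few lines of Python. This is a sketch of the idea, not of Pig itself: the sample records, the `load` and `filter_by` helpers, and the choice of `Profession == 'Engineer'` as the filter are all assumptions made for the example.

```python
# Hypothetical comma-delimited input, standing in for the file under /pigInput.
SAMPLE = "1,Ted,Engineer,30\n2,Mike,Doctor,41\n3,Ann,Engineer,27\n"
FIELDS = ("Id", "Name", "Profession", "Age")


def load(text, delimiter=","):
    """Split each line on the delimiter, yielding one tuple of strings per record."""
    return [tuple(line.split(delimiter)) for line in text.splitlines() if line]


def filter_by(relation, column, value):
    """Keep the tuples whose named column equals the given value."""
    index = FIELDS.index(column)
    return [record for record in relation if record[index] == value]


relation1 = load(SAMPLE)
relation1_filter = filter_by(relation1, "Profession", "Engineer")

# Equivalent of dumping the filtered relation.
for record in relation1_filter:
    print(record)
```

Every field is kept as a string here, mirroring the `chararray` types declared in the load statement on the slide.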

