By Thanuja Seneviratne
• Terminology
› Apache Hive - provides a SQL-like interface (through its query
language, HiveQL) to data stored in Hadoop
› Apache Pig - provides a scripting language (Pig Latin) for data flows
over data stored in Hadoop
• What is Hive?
• What is Pig?
• Demo
• Provides a SQL-like interface to data stored in Hadoop
• Provides a data workbench where you can examine,
modify and manipulate the data
• Hive is considered friendlier and more familiar to users
who already use SQL to query data
• In general, any task that can be done in Pig can also be
done in Hive, and vice versa
• But depending on the use case, either Hive or Pig can
perform better than the other
• Hive-friendly use cases (data warehouse type cases):
› business-intelligence analysis
› ad-hoc queries
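
To make the "ad-hoc query" use case concrete, here is a minimal sketch in Python. It does not talk to Hive: an in-memory SQLite table stands in for a Hive table, and the `sales` table, its columns, and the sample rows are all hypothetical. The point is only the style of work, which is a one-off declarative aggregate query rather than a hand-written processing job. The SQL used here is plain enough that a very similar statement would be expected to work in HiveQL, but that is an assumption, not something the slides show.

```python
import sqlite3

# Hypothetical sample data; an in-memory SQLite table stands in for a Hive table.
SALES = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 30.0),
    ("east", 50.0),
]


def revenue_by_region(rows):
    """Run an ad-hoc aggregate query and return (region, total) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # Declarative query: say what result is wanted, not how to compute it.
        query = (
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY total DESC"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, total in revenue_by_region(SALES):
        print(region, total)
```

The analyst only changes the query text to ask a different question, which is what makes this style a good fit for business-intelligence and ad-hoc work.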
• Provides a scripting language for data flows over the data
stored in Hadoop
• Data objects exist and are operated on within the script.
Once the script completes, all data objects are deleted
unless you store them
• Pig-friendly use cases (data factory type cases):
› data pipelines - bring in a data feed, then clean and transform it
› iterative processing - bring in small dataset changes so that the state
of a large dataset changes iteratively
› research - quickly test new theories and hypotheses using scripts
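
The data-flow idea above can be sketched in plain Python as a rough analogy. This is not Pig Latin and does not use Pig: the feed contents, the function names, and the `store` dictionary are all hypothetical. It shows a feed being loaded, cleaned, and aggregated step by step, and it mirrors the point that intermediate data objects disappear when the script ends unless they are explicitly stored.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw feed; a real Pig script would load this from Hadoop storage.
RAW_FEED = """user,country,clicks
alice,LK,3
bob,,5
carol,US,abc
dave,LK,4
"""


def load(text):
    """Step 1: read the feed into a list of records."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(records):
    """Step 2: drop rows with a missing country or non-numeric clicks."""
    for rec in records:
        if rec["country"] and rec["clicks"].isdigit():
            yield {"country": rec["country"], "clicks": int(rec["clicks"])}


def clicks_per_country(records):
    """Step 3: group by country and sum the clicks."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["country"]] += rec["clicks"]
    return dict(totals)


def run_pipeline(text, store):
    """Run the flow; only what is written to `store` outlives the run."""
    records = load(text)      # intermediate object, discarded afterwards
    cleaned = clean(records)  # intermediate object, discarded afterwards
    store["clicks_per_country"] = clicks_per_country(cleaned)


if __name__ == "__main__":
    stored = {}
    run_pipeline(RAW_FEED, stored)
    print(stored)
```

Each step names an intermediate result and feeds it to the next step, which is the procedural style that suits data pipelines and quick research scripts.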
• Hive Demo
› Using CDH
› Using HDP
• Pig Demo
› Using CDH
› Using HDP
Big Data - Part IV
