2. Terminology
› Apache Hive - provides a SQL-like interface (supported with a
language) to data stored in Hadoop
› Apache Pig – provides a Scripting language for data flows in the
data stored in Hadoop
What is Hive?
What is Pig?
Demo
3. Provides a SQL-like interface to data stored in Hadoop
Provides a data workbench where you can examine,
modify and manipulate the data
Hive is considered friendlier and more familiar to users
who are working on SQL for querying data.
In general, any task can be done in Pig can be achieved
from Hive as well and vice versa
But depend on the use case Hive or Pig can result
better in performance than the other
Hive friendly use-cases (data warehouse type cases):
› business-intelligence analysis
› ad-hoc queries
4. Provides a Scripting language for data flows in the data
stored in Hadoop
Data objects exist and are operated on in the script.
Once the script is complete all data objects are deleted
unless you stored them
Pig friendly use-cases (data factory type cases):
› data pipelines - bring in a data feed, and clean and transform
› iterative processing – bring in small dataset changes so that the state
of a large dataset changes iteratively
› research – test new theories and hypotheses using script quickly.
5. Hive Demo
› Using CDH
› Using HDP
Pig Demo
› Using CDH
› Using HDP