Big Data - Part IV

 Terminology
› Apache Hive - provides a SQL-like interface (backed by its own query
language, HiveQL) to data stored in Hadoop
› Apache Pig – provides a scripting language (Pig Latin) for data flows
over data stored in Hadoop
 What is Hive?
 What is Pig?
 Demo

 Provides a SQL-like interface to data stored in Hadoop
 Provides a data workbench where you can examine,
modify and manipulate the data
 Hive is considered friendlier and more familiar to users
who already query data with SQL
 In general, any task that can be done in Pig can also be
done in Hive, and vice versa
 Depending on the use case, however, either Hive or Pig
can perform better than the other
 Hive friendly use-cases (data warehouse type cases):
› business-intelligence analysis
› ad-hoc queries
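
To make the "ad-hoc query" use case concrete, here is a minimal sketch of the kind of SQL-style aggregate an analyst might run. It does not use Hive: Python's built-in sqlite3 module stands in for the SQL engine so the example runs anywhere. The `sales` table, its columns, the sample rows and the `top_regions` helper are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical sample data: (region, amount) pairs.
SAMPLE_SALES = [("east", 10.0), ("west", 25.0), ("east", 30.0), ("north", 5.0)]


def top_regions(rows, limit=2):
    """Return the `limit` regions with the highest total sales amount."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Hive
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # A declarative, SQL-style question asked of the data:
        # total sales per region, largest first.
        query = """
            SELECT region, SUM(amount) AS total
            FROM sales
            GROUP BY region
            ORDER BY total DESC
            LIMIT ?
        """
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, total in top_regions(SAMPLE_SALES):
        print(region, total)
```

The point of the sketch is the style of work: the user states what result is wanted and the engine decides how to compute it, which is why this fits business-intelligence and ad-hoc analysis.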

 Provides a scripting language for data flows over data
stored in Hadoop
 Data objects exist and are operated on within the script.
Once the script completes, all data objects are deleted
unless you store them
 Pig friendly use-cases (data factory type cases):
› data pipelines - bring in a data feed, then clean and transform it
› iterative processing – bring in small sets of changes so that the state
of a large dataset is updated iteratively
› research – quickly test new theories and hypotheses using scripts
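
The data-flow style described above can be sketched in plain Python. This is not Pig Latin and does not use Pig. It only mirrors the shape of a pipeline script: load a feed, clean it, transform it, and store the result. The feed layout (`user`, `bytes`), the sample data and the `run_pipeline` helper are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw feed with inconsistent casing and bad byte counts.
SAMPLE_FEED = "user,bytes\nAlice,100\nbob,\nalice,50\nBob,20\ncarol,n/a\n"


def run_pipeline(feed_text):
    """Load a CSV feed, clean it, aggregate it, and return the stored output."""
    # Load: parse the raw feed into records.
    records = list(csv.DictReader(io.StringIO(feed_text)))

    # Clean: keep only records whose byte count is a non-negative integer.
    cleaned = [r for r in records if (r["bytes"] or "").strip().isdigit()]

    # Transform: normalise user names and total the bytes per user.
    totals = defaultdict(int)
    for r in cleaned:
        totals[r["user"].strip().lower()] += int(r["bytes"])

    # Store: only this output survives; records, cleaned and totals are
    # intermediate objects that disappear when the function returns.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["user", "total_bytes"])
    for user in sorted(totals):
        writer.writerow([user, totals[user]])
    return out.getvalue()


if __name__ == "__main__":
    print(run_pipeline(SAMPLE_FEED), end="")
```

Each step names an intermediate data object and hands it to the next step, and nothing is kept unless it is explicitly stored at the end, which matches the behaviour described for data objects in a script.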

 Hive Demo
› Using CDH
› Using HDP
 Pig Demo
› Using CDH
› Using HDP
