PIG
Introduction to PIG components
Apache Pig
 Pig is a tool for analyzing massive amounts of
data.
Pig provides a high-level language and a
compiler that translates user scripts into a series of
MapReduce programs, allowing people to
focus on analyzing data rather than spending
time writing MapReduce programs.
Internally, it creates a Java .jar file
from the user script and runs it as a
MapReduce job.
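As a sketch of what such a script looks like, here is a minimal word count in Pig Latin (the file name and field names are illustrative, not from the slides):

```pig
-- Load a text file; each line becomes one chararray field.
lines = LOAD 'input.txt' AS (line:chararray);

-- Split each line into individual words.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words and count each group.
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

-- DUMP triggers execution of the compiled MapReduce job(s).
DUMP counts;
```

Each of these few lines is compiled into one or more MapReduce stages, which is exactly the work the user is spared from writing by hand.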
Rupak Roy
 Pig can read and write data not only from
HDFS but also from other sources, such as
local storage.
 Pig has 2 components:
1) Pig Latin
2) Execution environments
1. The language for this platform is called Pig
Latin; it turns the input data transformations into
a series of MapReduce jobs.
2. Execution environments:
* Local mode/execution.
* Distributed mode/execution on Hadoop clusters.
Execution environments:
 Local Mode/Execution: used for running Pig on your local machine
and not on a cluster, so any read/write operations go to the
local file system and not to HDFS. It is mainly used for
prototyping and debugging.
To run Pig in local mode, use the command:
pig -x local
 Cluster Mode/Execution: used for running Pig on Hadoop clusters.
To run Pig in cluster mode, use the command:
pig -x mapreduce
Or simply:
pig
Cluster mode is the default mode; in this mode, HDFS is used for
read/write operations.
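A Pig script file can also be passed directly on the command line; the script name below is illustrative:

```shell
# Interactive Grunt shell in local mode (local file system, no HDFS)
pig -x local

# Run a saved script on the cluster (mapreduce is the default mode)
pig wordcount.pig

# Or name the mode explicitly
pig -x mapreduce wordcount.pig
```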
Pig Architecture
 Grunt Shell: an interactive shell mainly used to
write Pig Latin scripts.
 Parser: an interpreter that checks the
structure of Pig scripts. The output of the parser
is a DAG (directed acyclic graph), which represents
the Pig Latin statements and logical operators.
 Optimizer: the DAG is passed to the optimizer,
which takes care of optimizing the logical
operators.
 Compiler: the optimized plan is then compiled
into a series of MapReduce
jobs.
 Finally, the MapReduce jobs are executed and the
results are stored in HDFS, or locally when running
in local mode.
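The plans these stages produce can be inspected from the Grunt shell with the EXPLAIN operator; the alias and file name below are illustrative:

```pig
lines    = LOAD 'input.txt' AS (line:chararray);
filtered = FILTER lines BY line IS NOT NULL;

-- EXPLAIN prints the logical, physical, and MapReduce plans,
-- i.e. the output of the parser, optimizer, and compiler.
EXPLAIN filtered;
```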
Next
 We will learn the PIG Latin Data Model
with Load and Store functions.