What’s in it for you?
1. Why Pig?
2. What is Pig?
3. MapReduce vs Hive vs Pig
4. Pig architecture
5. Working of Pig
6. Pig Latin data model
7. Pig Execution modes
8. Use case – Twitter
9. Features of Pig
Let’s get started with Pig!
Why Pig?
As we all know, Hadoop uses MapReduce to analyze and process big data.
Before: processing big data consumed more time. After: processing big data was faster using MapReduce.
Then, what is the problem with MapReduce?
Prior to 2006, all MapReduce programs were written in Java.
Non-programmers found it difficult to write lengthy Java code.
They faced issues incorporating the map, sort, and reduce fundamentals of MapReduce (map phase, shuffle and sort, reduce phase) while creating a program.
Eventually, maintaining and optimizing the code became a difficult task, which increased the processing time.
Why Pig?
Problem: Yahoo faced problems processing and analyzing large datasets using Java, as the code was complex and lengthy.
Necessity: There was a need for an easier way to analyze large datasets without writing time-consuming, complex Java code.
Solution:
• Apache Pig was developed by Yahoo researchers.
• It was developed with a vision to analyze and process large datasets without using complex Java code; Pig was developed especially for non-programmers.
• Pig used simple steps to analyze datasets, which was time-efficient.
What is Pig?
Pig is a scripting platform that runs on Hadoop clusters, designed to process and analyze large datasets.
It uses SQL-like queries to analyze data.
Pig operates on various types of data: structured, semi-structured, and unstructured.
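To make this concrete, here is a minimal Pig Latin sketch of such a script; the file path, delimiter, and field names are illustrative assumptions, not part of the original deck.

    -- Load a tab-separated log file from HDFS (path and schema are assumed)
    logs = LOAD '/data/weblogs' USING PigStorage('\t')
           AS (user:chararray, url:chararray, bytes:int);
    -- SQL-like filtering without writing any MapReduce code by hand
    big_requests = FILTER logs BY bytes > 1024;
    -- Print the result to the console
    DUMP big_requests;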
MapReduce vs Hive vs Pig
MapReduce: a compiled language (Java); you need to write long, complex code; lower level of abstraction; can process structured, semi-structured, and unstructured data.
Hive: an SQL-like query language; no need to write complex code; higher level of abstraction; can process only structured data.
Pig: a scripting language; no need to write complex code; higher level of abstraction; can process structured, semi-structured, and unstructured data. This is the advantage Pig has over Hive.

MapReduce: uses Java and Python; code performance is good; used by programmers; supports the partitioning feature.
Hive: uses an SQL-like query language known as HiveQL; code performance is lower than MapReduce and Pig; used by data analysts; supports the partitioning feature.
Pig: uses Pig Latin, a procedural data-flow language; code performance is lower than MapReduce but better than Hive; used by researchers and programmers; there is no concept of partitioning in Pig.
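As a quick illustration of the "no need to write complex code" point, here is the classic word count in Pig Latin; in plain MapReduce the same job typically requires a full Java program with mapper, reducer, and driver classes. The input path is an assumption.

    -- Word count in a few lines of Pig Latin
    lines  = LOAD '/data/input.txt' AS (line:chararray);
    words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grps   = GROUP words BY word;
    counts = FOREACH grps GENERATE group AS word, COUNT(words) AS cnt;
    DUMP counts;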
Components of Pig
Pig has two components: Pig Latin and the runtime engine.
Pig Latin is the procedural data-flow language used in Pig to analyze data. It is easy to program in Pig Latin, as it is similar to SQL.
The runtime engine is the execution environment created to run Pig Latin programs. It is also a compiler that produces MapReduce programs, and it uses HDFS for storing and retrieving data.
Pig architecture
There are three ways to execute a written Pig script: Pig Latin scripts, the Grunt shell, and the Pig Server.
Pig Latin scripts: programmers write a script in Pig Latin to analyze data using Pig.
Grunt shell: Pig’s interactive shell, which is used to execute all Pig scripts.
Pig Server: if the Pig script is written in a script file, the execution is done by the Pig Server.
Parser: checks the syntax of the Pig script. After checking, the output is a DAG (Directed Acyclic Graph).
Optimizer: the DAG (logical plan) is passed to the logical optimizer, where optimizations take place.
Compiler: converts the optimized DAG into MapReduce jobs.
Execution engine: the MapReduce jobs are executed here, over MapReduce and HDFS. The results are displayed using the DUMP statement and stored in HDFS using the STORE statement.
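The sketch below shows how a script exercises this pipeline; the dataset path and schema are assumptions. EXPLAIN prints the logical, physical, and MapReduce plans produced by the parser, optimizer, and compiler, while DUMP and STORE trigger actual execution.

    -- Load and filter a small employee dataset (path and fields assumed)
    emp = LOAD '/data/employees' USING PigStorage(',')
          AS (id:int, name:chararray, dept:chararray);
    it_staff = FILTER emp BY dept == 'IT';

    EXPLAIN it_staff;                       -- inspect the generated plans
    DUMP it_staff;                          -- execute and display on screen
    STORE it_staff INTO '/output/it_staff'; -- or write the result to HDFS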
Working of Pig
1. Load data and write the Pig script: the Pig Latin script is written by the user.
2. Pig operations: all the Pig operations are performed by the parser, optimizer, and compiler.
3. Execution of the plan: the results are shown on the screen or stored in HDFS, as specified in the code.
Pig Latin data model
The data model of Pig Latin helps Pig handle various types of data.
Atom: any single value of a primitive data type in Pig Latin, such as int, float, or string; it is stored as a string. Example: ‘Rob’ or 50.
Tuple: an ordered sequence of fields that can be of any data type; it is the same as a row in an RDBMS, i.e. a set of data from a single row. Example: (Rob,5).
Bag: a collection of tuples; it is the same as a table in an RDBMS and is represented by ‘{}’. Example: {(Rob,5),(Mike,10)}.
Map: a set of key-value pairs; the key is of chararray type and the value can be of any type. It is represented by ‘[]’. Example: [name#Mike, age#10].
Pig Latin has a fully nestable data model, which means one data type can be nested within another.

Here is a diagrammatic representation of the Pig Latin data model:

Sl. no   Name   Age   Place
01       Jack   23    Goa
02       Bob    25    London
03       Joe    29    California

Each cell (e.g. Jack or 23) is a field, each row is a tuple, and the whole collection of rows is a bag.
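The same types can be declared directly in a LOAD schema, as in the sketch below; the file name and field names are assumptions made for illustration.

    -- One field of each kind from the data model above
    players = LOAD '/data/players' AS (
        name:chararray,                              -- atom
        best_score:tuple(game:chararray, pts:int),   -- tuple
        friends:bag{t:(fname:chararray)},            -- bag of tuples
        info:map[]                                   -- map of key-value pairs
    );
    DESCRIBE players;                                -- print the nested schema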
Pig Execution modes
Pig works in two execution modes, depending on where the data resides and where the Pig script is going to run: Local mode and MapReduce mode.
Local mode: the Pig engine takes input from the local (Linux) file system, and the output is stored in the same file system. Local mode is useful for analyzing small datasets with Pig.
MapReduce mode: the Pig engine interacts directly with HDFS and MapReduce. Queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster. By default, Pig runs in this mode.

There are also three modes in Pig, depending on how the Pig Latin code is written:
Interactive mode: coding and executing the script line by line in the interactive shell.
Batch mode: all the code is placed in a file with the extension .pig, and the file is executed directly.
Embedded mode: Pig lets its users define their own functions (UDFs) in programming languages such as Java.
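For example, a batch-mode script might look like the sketch below (the file name, input path, and schema are assumptions). The same statements can be typed one at a time in the Grunt shell for interactive mode, and the file can be run with "pig -x local daily_report.pig" in local mode or simply "pig daily_report.pig" in the default MapReduce mode.

    -- daily_report.pig: a small batch-mode script
    sales  = LOAD 'sales.csv' USING PigStorage(',') AS (day:chararray, amount:double);
    by_day = GROUP sales BY day;
    totals = FOREACH by_day GENERATE group AS day, SUM(sales.amount) AS total;
    STORE totals INTO 'daily_totals';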
Use case – Twitter
Users on Twitter generate about 500 million tweets on a daily basis. Hadoop MapReduce was used to process and analyze this data: analyzing the number of tweets created by a user in the tweet table was done with MapReduce written in Java.
It was difficult to perform MapReduce operations, as users were not well versed in writing complex Java code.
The problems Twitter faced while analyzing datasets using MapReduce were joining datasets, sorting datasets, and grouping datasets. Performing these operations in MapReduce consumed more time, since the Java code was lengthy and complex. Twitter used Apache Pig to overcome these problems. Let’s see how.

Problem statement: analyze the user table and the tweet table and find out how many tweets are created by each person.

User Table            Tweet Table
ID   Name             ID   Tweet
1    Alice            1    Google...
2    Tim              2    Tennis...
3    John             1    Spacecraft...
                      3    Oscar...
                      1    Politics...
                      2    Olympics...

The following operations were performed to analyze the given data.

First, the Twitter data is loaded into Pig storage using the LOAD command.

In the join and group operation, the tweet and user tables are joined and grouped using the COGROUP command:

ID   Name    Tweet
1    Alice   Google...
1    Alice   Spacecraft...
1    Alice   Politics...
2    Tim     Tennis...
2    Tim     Olympics...
3    John    Oscar...

The next operation is aggregation: the tweets are counted per user ID using the COUNT command:

ID   Count
1    3
2    2
3    1

Finally, the result of the count operation is joined with the user table to find out the user name:

ID   Name    Count
1    Alice   3
2    Tim     2
3    John    1

Pig reduces the complexity of these operations, which would have been lengthier using MapReduce. Finally, we could find out the number of tweets created by each user in a simple way.
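A hedged Pig Latin sketch of this pipeline is shown below; the input paths, delimiters, and output location are assumptions, but the operators (LOAD, COGROUP, COUNT, JOIN) are the ones named above.

    -- Load the user and tweet tables (paths and format assumed)
    users  = LOAD '/data/users'  USING PigStorage(',') AS (id:int, name:chararray);
    tweets = LOAD '/data/tweets' USING PigStorage(',') AS (id:int, tweet:chararray);

    -- Join and group: gather each user's tweets by id
    grouped = COGROUP tweets BY id, users BY id;

    -- Aggregation: count the tweets in every group
    counts = FOREACH grouped GENERATE group AS id, COUNT(tweets) AS cnt;

    -- Join the counts back to the user table to pick up the name
    joined = JOIN counts BY id, users BY id;
    result = FOREACH joined GENERATE users::name AS name, counts::cnt AS tweet_count;

    DUMP result;                                  -- display on screen
    -- STORE result INTO '/output/tweet_counts';  -- or persist to HDFS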
Features of Pig
Ease of programming: Pig Latin is similar to SQL, so fewer lines of code need to be written.
Short development time, as the code is simpler.
Handles all kinds of data: structured, semi-structured, and unstructured.
Offers a large set of operators, such as join, filter, and so on.
Allows multiple queries to be processed in parallel.
Optimization and compilation are easy, as they are done automatically and internally.
Pig lets us create user-defined functions (UDFs).
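As a brief illustration of the UDF feature, the sketch below registers a hypothetical Java UDF; the jar name and class are invented for illustration only, and a real UDF would need to be written and packaged first.

    -- Register a jar containing a custom Java function (names are hypothetical)
    REGISTER my-udfs.jar;
    DEFINE TO_UPPER com.example.pig.ToUpper();

    users = LOAD '/data/users' AS (id:int, name:chararray);
    loud  = FOREACH users GENERATE id, TO_UPPER(name);
    DUMP loud;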
Demo