SlideShare a Scribd company logo
UNIT-4
Apache Pig
 Pig is a high-level programming language
useful for analyzing large data sets. Pig
was a result of development effort at
Yahoo!
 In a MapReduce framework, programs
need to be translated into a series of Map
and Reduce stages. However, this is not a
programming model which data analysts
are familiar with. So, in order to bridge this
gap, an abstraction called Pig was built on
top of Hadoop.
 Apache Pig enables people to focus more
on analyzing bulk data sets and to
spend less time writing Map-Reduce
programs. Similar to Pigs, who eat
anything, the Apache Pig programming
language is designed to work upon any
kind of data. That's why the name, Pig!
Pig Architecture
The Architecture of Pig consists of two components:
 Pig Latin, which is a language
 A runtime environment, for running PigLatin programs.
A Pig Latin program consists of a series of operations or
transformations which are applied to the input data to produce
output. These operations describe a data flow which is
translated into an executable representation, by Hadoop Pig
execution environment. Underneath, results of these
transformations are series of MapReduce jobs which a
programmer is unaware of. So, in a way, Pig in Hadoop allows
the programmer to focus on data rather than the nature of
execution.
PigLatin is a relatively stiffened language which uses familiar
keywords from data processing e.g., Join, Group and Filter.
Apache Pig Architecture in Hadoop
 Apache Pig architecture consists of a Pig Latin
interpreter that uses Pig Latin scripts to process and
analyze massive datasets. Programmers use Pig
Latin language to analyze large datasets in
the Hadoop environment. Apache pig has a rich set
of datasets for performing different data operations
like join, filter, sort, load, group, etc.
 Programmers must use Pig Latin language to write a
Pig script to perform a specific task. Pig converts
these Pig scripts into a series of Map-Reduce jobs to
ease programmers’ work. Pig Latin programs are
executed via various mechanisms such as UDFs,
embedded, and Grunt shells.

Apache Pig architecture is consisting of the following
major components:
 Parser
 Optimizer
 Compiler
 Execution Engine
 Execution Mode
Pig Latin Scripts
 Pig scripts are submitted to the Pig execution
environment to produce the desired results.
You can execute the Pig scripts by using one of
the methods:
 Grunt Shell
 Script file
 Embedded script
Parser
Parser handles all the Pig Latin statements or commands. Parser performs several checks on the Pig
statements like syntax check, type check, and generates a DAG (Directed Acyclic Graph) output.
DAG output represents all the logical operators of the scripts as nodes and data flow as edges.
Optimizer
Once parsing operation is completed and a DAG output is generated, the output is passed to the
optimizer. The optimizer then performs the optimization activities on the output, such as split, merge,
projection, pushdown, transform, and reorder, etc. The optimizer processes the extracted data and
omits unnecessary data or columns by performing pushdown and projection activity and improves
query performance.
Compiler
The compiler compiles the output that is generated by the optimizer into a series of Map Reduce jobs.
The compiler automatically converts Pig jobs into Map Reduce jobs and optimizes performance by
rearranging the execution order.
Execution Engine
After performing all the above operations, these Map Reduce jobs are submitted to the execution engine,
which is then executed on the Hadoop platform to produce the desired results. You can then use the
DUMP statement to display the results on screen or STORE statements to store the results
in HDFS (Hadoop Distributed File System).
Execution Mode
Apache Pig is executed in two execution modes that are local and Map Reduce. The choice of execution
mode depends on where the data is stored and where you want to run the Pig script. You can either
store your data locally (in a single machine) or in a distributed Hadoop cluster environment.
Local Mode – You can use local mode if your dataset is small. In local mode, Pig runs in a single JVM
using the local host and file system. In this mode, parallel mapper execution is impossible as all files
are installed and run on the localhost. You can use pig -x local command to specify the local mode.
Map Reduce Mode – Apache Pig uses the Map Reduce mode by default. In Map Reduce mode, a
programmer executes the Pig Latin statements on data that is already stored in the HDFS (Hadoop
Distributed File System). You can use pig -x mapreduce command to specify the Map-Reduce
mode.
Apache Pig Components
Parser
Initially the Pig Scripts are handled by the Parser. It checks the syntax
of the script, does type checking, and other miscellaneous checks.
The output of the parser will be a DAG (directed acyclic graph),
which represents the Pig Latin statements and logical operators.
In the DAG, the logical operators of the script are represented as the
nodes and the data flows are represented as edges.
Optimizer
The logical plan (DAG) is passed to the logical optimizer, which carries
out the logical optimizations such as projection and pushdown.
Compiler
The compiler compiles the optimized logical plan into a series of
MapReduce jobs.
Execution engine
Finally the MapReduce jobs are submitted to Hadoop in a sorted order.
Finally, these MapReduce jobs are executed on Hadoop producing
the desired results.
Pig Latin Data Model
 The data model of Pig Latin is fully nested and it
allows complex non-atomic datatypes such
as map and tuple. Given below is the
diagrammatical representation of Pig Latin’s data
model.
 Atom
Any single value in Pig Latin, irrespective of their data, type is known as an Atom. It is stored as string
and can be used as string and number. int, long, float, double, chararray, and bytearray are the
atomic values of Pig. A piece of data or a simple atomic value is known as a field.
Example − ‘raja’ or ‘30’
 Tuple
A record that is formed by an ordered set of fields is known as a tuple, the fields can be of any type. A
tuple is similar to a row in a table of RDBMS.
Example − (Raja, 30)
 Bag
A bag is an unordered set of tuples. In other words, a collection of tuples (non-unique) is known as a
bag. Each tuple can have any number of fields (flexible schema). A bag is represented by ‘{}’. It is
similar to a table in RDBMS, but unlike a table in RDBMS, it is not necessary that every tuple
contain the same number of fields or that the fields in the same position (column) have the same
type.
Example − {(Raja, 30), (Mohammad, 45)}
A bag can be a field in a relation; in that context, it is known as inner bag.
Example − {Raja, 30, {9848022338, raja@gmail.com,}}
 Map
A map (or data map) is a set of key-value pairs. The key needs to be of type chararray and should be
unique. The value might be of any type. It is represented by ‘[]’
Example − [name#Raja, age#30]
 Relation
A relation is a bag of tuples. The relations in Pig Latin are unordered (there is no guarantee that tuples
are processed in any particular order).
Map Reduce vs. Apache Pig
Apache Pig Map Reduce
Scripting language Compiled language
Provides a higher level of
abstraction
Provides a low level of abstraction
Requires a few lines of code (10
lines of code can summarize 200
lines of Map Reduce code)
Requires a more extensive code
(more lines of code)
Requires less development time
and effort
Requires more development time
and effort
Lesser code efficiency
Higher efficiency of code in
comparison to Apache Pig
Apache Pig Features
 Allows programmers to write fewer lines of codes.
Programmers can write 200 lines of Java code in only
ten lines using the Pig Latin language.
 Apache Pig multi-query approach reduces the
development time.
 Apache pig has a rich set of datasets for performing
operations like join, filter, sort, load, group, etc.
 Pig Latin language is very similar to SQL.
Programmers with good SQL knowledge find it easy
to write Pig script.
 Allows programmers to write fewer lines of codes.
Programmers can write 200 lines of Java code in only
ten lines using the Pig Latin language.
 Apache Pig handles both structured and unstructured
data analysis.
Apache Pig Applications
 Processes large volume of data
 Supports quick prototyping and ad-hoc queries
across large datasets
 Performs data processing in search platforms
 Processes time-sensitive data loads
 Used by telecom companies to de-identify the
user call data information.
 https://www.geeksforgeeks.org/apache-pig-
installation-on-windows-and-case-study/

More Related Content

What's hot

Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
Dushhyant Kumar
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
Shubham Parmar
 
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Simplilearn
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Josef A. Habdank
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
Mostafa
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
Joud Khattab
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
Abhinav Tyagi
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Dr. C.V. Suresh Babu
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
Lynn Langit
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
GauravBiswas9
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigMilind Bhandarkar
 
NoSql
NoSqlNoSql
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
Spark Summit
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
sudhakara st
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Prashant Gupta
 

What's hot (20)

Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 
NoSql
NoSqlNoSql
NoSql
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 

Similar to Unit 4-apache pig

Apache pig
Apache pigApache pig
Apache pig
Sadiq Basha
 
Apache PIG
Apache PIGApache PIG
Apache PIG
Anuja Gunale
 
Unit 4 lecture2
Unit 4 lecture2Unit 4 lecture2
Unit 4 lecture2
vishal choudhary
 
A slide share pig in CCS334 for big data analytics
A slide share pig in CCS334 for big data analyticsA slide share pig in CCS334 for big data analytics
A slide share pig in CCS334 for big data analytics
KrishnaVeni451953
 
Introduction to PIG components
Introduction to PIG components Introduction to PIG components
Introduction to PIG components
Rupak Roy
 
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsApache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsViswanath Gangavaram
 
An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache Pig
Sachin Vakkund
 
Pig power tools_by_viswanath_gangavaram
Pig power tools_by_viswanath_gangavaramPig power tools_by_viswanath_gangavaram
Pig power tools_by_viswanath_gangavaramViswanath Gangavaram
 
Pig - A Data Flow Language and Execution Environment for Exploring Very Large...
Pig - A Data Flow Language and Execution Environment for Exploring Very Large...Pig - A Data Flow Language and Execution Environment for Exploring Very Large...
Pig - A Data Flow Language and Execution Environment for Exploring Very Large...
DrPDShebaKeziaMalarc
 
pig.ppt
pig.pptpig.ppt
pig.ppt
Sheba41
 
Pig
PigPig
lecturte 5. Hgfjhffjyy to the data will be 1.ppt
lecturte 5. Hgfjhffjyy to the data will be 1.pptlecturte 5. Hgfjhffjyy to the data will be 1.ppt
lecturte 5. Hgfjhffjyy to the data will be 1.ppt
YashJadhav496388
 
Big Data Hadoop Training
Big Data Hadoop TrainingBig Data Hadoop Training
Big Data Hadoop Training
stratapps
 
BDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data AnalyticsBDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data Analytics
NetajiGandi1
 
Introduction to pig.
Introduction to pig.Introduction to pig.
Introduction to pig.
Triloki Gupta
 
06 pig-01-intro
06 pig-01-intro06 pig-01-intro
06 pig-01-intro
Aasim Naveed
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
Siddharth Mathur
 
Hadoop interview question
Hadoop interview questionHadoop interview question
Hadoop interview questionpappupassindia
 

Similar to Unit 4-apache pig (20)

Apache pig
Apache pigApache pig
Apache pig
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 
Unit 4 lecture2
Unit 4 lecture2Unit 4 lecture2
Unit 4 lecture2
 
Unit V.pdf
Unit V.pdfUnit V.pdf
Unit V.pdf
 
A slide share pig in CCS334 for big data analytics
A slide share pig in CCS334 for big data analyticsA slide share pig in CCS334 for big data analytics
A slide share pig in CCS334 for big data analytics
 
Introduction to PIG components
Introduction to PIG components Introduction to PIG components
Introduction to PIG components
 
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsApache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
 
An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache Pig
 
Pig power tools_by_viswanath_gangavaram
Pig power tools_by_viswanath_gangavaramPig power tools_by_viswanath_gangavaram
Pig power tools_by_viswanath_gangavaram
 
Pig - A Data Flow Language and Execution Environment for Exploring Very Large...
Pig - A Data Flow Language and Execution Environment for Exploring Very Large...Pig - A Data Flow Language and Execution Environment for Exploring Very Large...
Pig - A Data Flow Language and Execution Environment for Exploring Very Large...
 
pig.ppt
pig.pptpig.ppt
pig.ppt
 
Pig
PigPig
Pig
 
lecturte 5. Hgfjhffjyy to the data will be 1.ppt
lecturte 5. Hgfjhffjyy to the data will be 1.pptlecturte 5. Hgfjhffjyy to the data will be 1.ppt
lecturte 5. Hgfjhffjyy to the data will be 1.ppt
 
Big Data Hadoop Training
Big Data Hadoop TrainingBig Data Hadoop Training
Big Data Hadoop Training
 
BDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data AnalyticsBDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data Analytics
 
Introduction to pig.
Introduction to pig.Introduction to pig.
Introduction to pig.
 
06 pig-01-intro
06 pig-01-intro06 pig-01-intro
06 pig-01-intro
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
 
Hadoop interview question
Hadoop interview questionHadoop interview question
Hadoop interview question
 
43_Sameer_Kumar_Das2
43_Sameer_Kumar_Das243_Sameer_Kumar_Das2
43_Sameer_Kumar_Das2
 

More from vishal choudhary

SE-Lecture1.ppt
SE-Lecture1.pptSE-Lecture1.ppt
SE-Lecture1.ppt
vishal choudhary
 
SE-Testing.ppt
SE-Testing.pptSE-Testing.ppt
SE-Testing.ppt
vishal choudhary
 
SE-CyclomaticComplexityand Testing.ppt
SE-CyclomaticComplexityand Testing.pptSE-CyclomaticComplexityand Testing.ppt
SE-CyclomaticComplexityand Testing.ppt
vishal choudhary
 
SE-Lecture-7.pptx
SE-Lecture-7.pptxSE-Lecture-7.pptx
SE-Lecture-7.pptx
vishal choudhary
 
Se-Lecture-6.ppt
Se-Lecture-6.pptSe-Lecture-6.ppt
Se-Lecture-6.ppt
vishal choudhary
 
SE-Lecture-5.pptx
SE-Lecture-5.pptxSE-Lecture-5.pptx
SE-Lecture-5.pptx
vishal choudhary
 
SE-Lecture-8.pptx
SE-Lecture-8.pptxSE-Lecture-8.pptx
SE-Lecture-8.pptx
vishal choudhary
 
SE-coupling and cohesion.ppt
SE-coupling and cohesion.pptSE-coupling and cohesion.ppt
SE-coupling and cohesion.ppt
vishal choudhary
 
SE-Lecture-2.pptx
SE-Lecture-2.pptxSE-Lecture-2.pptx
SE-Lecture-2.pptx
vishal choudhary
 
SE-software design.ppt
SE-software design.pptSE-software design.ppt
SE-software design.ppt
vishal choudhary
 
SE1.ppt
SE1.pptSE1.ppt
SE-Lecture-4.pptx
SE-Lecture-4.pptxSE-Lecture-4.pptx
SE-Lecture-4.pptx
vishal choudhary
 
SE-Lecture=3.pptx
SE-Lecture=3.pptxSE-Lecture=3.pptx
SE-Lecture=3.pptx
vishal choudhary
 
Multimedia-Lecture-Animation.pptx
Multimedia-Lecture-Animation.pptxMultimedia-Lecture-Animation.pptx
Multimedia-Lecture-Animation.pptx
vishal choudhary
 
MultimediaLecture5.pptx
MultimediaLecture5.pptxMultimediaLecture5.pptx
MultimediaLecture5.pptx
vishal choudhary
 
Multimedia-Lecture-7.pptx
Multimedia-Lecture-7.pptxMultimedia-Lecture-7.pptx
Multimedia-Lecture-7.pptx
vishal choudhary
 
MultiMedia-Lecture-4.pptx
MultiMedia-Lecture-4.pptxMultiMedia-Lecture-4.pptx
MultiMedia-Lecture-4.pptx
vishal choudhary
 
Multimedia-Lecture-6.pptx
Multimedia-Lecture-6.pptxMultimedia-Lecture-6.pptx
Multimedia-Lecture-6.pptx
vishal choudhary
 
Multimedia-Lecture-3.pptx
Multimedia-Lecture-3.pptxMultimedia-Lecture-3.pptx
Multimedia-Lecture-3.pptx
vishal choudhary
 

More from vishal choudhary (20)

SE-Lecture1.ppt
SE-Lecture1.pptSE-Lecture1.ppt
SE-Lecture1.ppt
 
SE-Testing.ppt
SE-Testing.pptSE-Testing.ppt
SE-Testing.ppt
 
SE-CyclomaticComplexityand Testing.ppt
SE-CyclomaticComplexityand Testing.pptSE-CyclomaticComplexityand Testing.ppt
SE-CyclomaticComplexityand Testing.ppt
 
SE-Lecture-7.pptx
SE-Lecture-7.pptxSE-Lecture-7.pptx
SE-Lecture-7.pptx
 
Se-Lecture-6.ppt
Se-Lecture-6.pptSe-Lecture-6.ppt
Se-Lecture-6.ppt
 
SE-Lecture-5.pptx
SE-Lecture-5.pptxSE-Lecture-5.pptx
SE-Lecture-5.pptx
 
XML.pptx
XML.pptxXML.pptx
XML.pptx
 
SE-Lecture-8.pptx
SE-Lecture-8.pptxSE-Lecture-8.pptx
SE-Lecture-8.pptx
 
SE-coupling and cohesion.ppt
SE-coupling and cohesion.pptSE-coupling and cohesion.ppt
SE-coupling and cohesion.ppt
 
SE-Lecture-2.pptx
SE-Lecture-2.pptxSE-Lecture-2.pptx
SE-Lecture-2.pptx
 
SE-software design.ppt
SE-software design.pptSE-software design.ppt
SE-software design.ppt
 
SE1.ppt
SE1.pptSE1.ppt
SE1.ppt
 
SE-Lecture-4.pptx
SE-Lecture-4.pptxSE-Lecture-4.pptx
SE-Lecture-4.pptx
 
SE-Lecture=3.pptx
SE-Lecture=3.pptxSE-Lecture=3.pptx
SE-Lecture=3.pptx
 
Multimedia-Lecture-Animation.pptx
Multimedia-Lecture-Animation.pptxMultimedia-Lecture-Animation.pptx
Multimedia-Lecture-Animation.pptx
 
MultimediaLecture5.pptx
MultimediaLecture5.pptxMultimediaLecture5.pptx
MultimediaLecture5.pptx
 
Multimedia-Lecture-7.pptx
Multimedia-Lecture-7.pptxMultimedia-Lecture-7.pptx
Multimedia-Lecture-7.pptx
 
MultiMedia-Lecture-4.pptx
MultiMedia-Lecture-4.pptxMultiMedia-Lecture-4.pptx
MultiMedia-Lecture-4.pptx
 
Multimedia-Lecture-6.pptx
Multimedia-Lecture-6.pptxMultimedia-Lecture-6.pptx
Multimedia-Lecture-6.pptx
 
Multimedia-Lecture-3.pptx
Multimedia-Lecture-3.pptxMultimedia-Lecture-3.pptx
Multimedia-Lecture-3.pptx
 

Recently uploaded

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 

Recently uploaded (20)

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 

Unit 4-apache pig

  • 2.  Pig is a high-level programming language useful for analyzing large data sets. Pig was a result of development effort at Yahoo!  In a MapReduce framework, programs need to be translated into a series of Map and Reduce stages. However, this is not a programming model which data analysts are familiar with. So, in order to bridge this gap, an abstraction called Pig was built on top of Hadoop.
  • 3.  Apache Pig enables people to focus more on analyzing bulk data sets and to spend less time writing Map-Reduce programs. Similar to Pigs, who eat anything, the Apache Pig programming language is designed to work upon any kind of data. That's why the name, Pig!
  • 4. Pig Architecture The Architecture of Pig consists of two components:  Pig Latin, which is a language  A runtime environment, for running PigLatin programs. A Pig Latin program consists of a series of operations or transformations which are applied to the input data to produce output. These operations describe a data flow which is translated into an executable representation, by Hadoop Pig execution environment. Underneath, results of these transformations are series of MapReduce jobs which a programmer is unaware of. So, in a way, Pig in Hadoop allows the programmer to focus on data rather than the nature of execution. PigLatin is a relatively stiffened language which uses familiar keywords from data processing e.g., Join, Group and Filter.
  • 5. Apache Pig Architecture in Hadoop  Apache Pig architecture consists of a Pig Latin interpreter that uses Pig Latin scripts to process and analyze massive datasets. Programmers use Pig Latin language to analyze large datasets in the Hadoop environment. Apache pig has a rich set of datasets for performing different data operations like join, filter, sort, load, group, etc.  Programmers must use Pig Latin language to write a Pig script to perform a specific task. Pig converts these Pig scripts into a series of Map-Reduce jobs to ease programmers’ work. Pig Latin programs are executed via various mechanisms such as UDFs, embedded, and Grunt shells. 
  • 6. Apache Pig architecture is consisting of the following major components:  Parser  Optimizer  Compiler  Execution Engine  Execution Mode
  • 7. Pig Latin Scripts  Pig scripts are submitted to the Pig execution environment to produce the desired results. You can execute the Pig scripts by using one of the methods:  Grunt Shell  Script file  Embedded script
  • 8. Parser Parser handles all the Pig Latin statements or commands. Parser performs several checks on the Pig statements like syntax check, type check, and generates a DAG (Directed Acyclic Graph) output. DAG output represents all the logical operators of the scripts as nodes and data flow as edges. Optimizer Once parsing operation is completed and a DAG output is generated, the output is passed to the optimizer. The optimizer then performs the optimization activities on the output, such as split, merge, projection, pushdown, transform, and reorder, etc. The optimizer processes the extracted data and omits unnecessary data or columns by performing pushdown and projection activity and improves query performance. Compiler The compiler compiles the output that is generated by the optimizer into a series of Map Reduce jobs. The compiler automatically converts Pig jobs into Map Reduce jobs and optimizes performance by rearranging the execution order. Execution Engine After performing all the above operations, these Map Reduce jobs are submitted to the execution engine, which is then executed on the Hadoop platform to produce the desired results. You can then use the DUMP statement to display the results on screen or STORE statements to store the results in HDFS (Hadoop Distributed File System). Execution Mode Apache Pig is executed in two execution modes that are local and Map Reduce. The choice of execution mode depends on where the data is stored and where you want to run the Pig script. You can either store your data locally (in a single machine) or in a distributed Hadoop cluster environment. Local Mode – You can use local mode if your dataset is small. In local mode, Pig runs in a single JVM using the local host and file system. In this mode, parallel mapper execution is impossible as all files are installed and run on the localhost. You can use pig -x local command to specify the local mode. Map Reduce Mode – Apache Pig uses the Map Reduce mode by default. In Map Reduce mode, a programmer executes the Pig Latin statements on data that is already stored in the HDFS (Hadoop Distributed File System). You can use pig -x mapreduce command to specify the Map-Reduce mode.
  • 9.
  • 10. Apache Pig Components Parser Initially the Pig Scripts are handled by the Parser. It checks the syntax of the script, does type checking, and other miscellaneous checks. The output of the parser will be a DAG (directed acyclic graph), which represents the Pig Latin statements and logical operators. In the DAG, the logical operators of the script are represented as the nodes and the data flows are represented as edges. Optimizer The logical plan (DAG) is passed to the logical optimizer, which carries out the logical optimizations such as projection and pushdown. Compiler The compiler compiles the optimized logical plan into a series of MapReduce jobs. Execution engine Finally the MapReduce jobs are submitted to Hadoop in a sorted order. Finally, these MapReduce jobs are executed on Hadoop producing the desired results.
  • 11. Pig Latin Data Model  The data model of Pig Latin is fully nested and it allows complex non-atomic datatypes such as map and tuple. Given below is the diagrammatical representation of Pig Latin’s data model.
  • 12.  Atom Any single value in Pig Latin, irrespective of their data, type is known as an Atom. It is stored as string and can be used as string and number. int, long, float, double, chararray, and bytearray are the atomic values of Pig. A piece of data or a simple atomic value is known as a field. Example − ‘raja’ or ‘30’  Tuple A record that is formed by an ordered set of fields is known as a tuple, the fields can be of any type. A tuple is similar to a row in a table of RDBMS. Example − (Raja, 30)  Bag A bag is an unordered set of tuples. In other words, a collection of tuples (non-unique) is known as a bag. Each tuple can have any number of fields (flexible schema). A bag is represented by ‘{}’. It is similar to a table in RDBMS, but unlike a table in RDBMS, it is not necessary that every tuple contain the same number of fields or that the fields in the same position (column) have the same type. Example − {(Raja, 30), (Mohammad, 45)} A bag can be a field in a relation; in that context, it is known as inner bag. Example − {Raja, 30, {9848022338, raja@gmail.com,}}  Map A map (or data map) is a set of key-value pairs. The key needs to be of type chararray and should be unique. The value might be of any type. It is represented by ‘[]’ Example − [name#Raja, age#30]  Relation A relation is a bag of tuples. The relations in Pig Latin are unordered (there is no guarantee that tuples are processed in any particular order).
  • 13. Map Reduce vs. Apache Pig Apache Pig Map Reduce Scripting language Compiled language Provides a higher level of abstraction Provides a low level of abstraction Requires a few lines of code (10 lines of code can summarize 200 lines of Map Reduce code) Requires a more extensive code (more lines of code) Requires less development time and effort Requires more development time and effort Lesser code efficiency Higher efficiency of code in comparison to Apache Pig
  • 14. Apache Pig Features  Allows programmers to write fewer lines of codes. Programmers can write 200 lines of Java code in only ten lines using the Pig Latin language.  Apache Pig multi-query approach reduces the development time.  Apache pig has a rich set of datasets for performing operations like join, filter, sort, load, group, etc.  Pig Latin language is very similar to SQL. Programmers with good SQL knowledge find it easy to write Pig script.  Allows programmers to write fewer lines of codes. Programmers can write 200 lines of Java code in only ten lines using the Pig Latin language.  Apache Pig handles both structured and unstructured data analysis.
  • 15. Apache Pig Applications  Processes large volume of data  Supports quick prototyping and ad-hoc queries across large datasets  Performs data processing in search platforms  Processes time-sensitive data loads  Used by telecom companies to de-identify the user call data information.