PIG
C P Madumathi
Sri Krishna college of arts and science
• Pig Latin is called a data flow language.
• It is used as a scripting language in Big Data technology.
• Pig runs on Hadoop and reads and writes its data in HDFS.
• HDFS is based on the Google File System (GFS).
Pig is a platform that can handle large datasets: "I love to eat more and more."
Pig works in a Hadoop environment, and can even run in local mode.
• Not preferred for data analytics
• 200 lines of code (LOC) in Java MapReduce ≈ 10 LOC in Pig Latin
• Not rich in built-in functions
Local mode
Hadoop mode
• The Pig system does two tasks:
Logical Plan
◦ Builds a logical plan from a Pig Latin script
◦ Supports execution-platform independence
◦ No processing of data is performed at this stage
Physical Plan
◦ Compiles the logical plan to a physical plan and executes it
◦ Converts the logical plan into a series of MapReduce jobs to be executed by Hadoop MapReduce
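Pig's EXPLAIN operator prints the plans for an alias without running the job, so you can inspect both stages for a real script. The sketch below assumes a hypothetical comma-separated file 'employees.csv' with illustrative field names.

```pig
-- Hypothetical input file; the schema is assumed for illustration
emp = LOAD 'employees.csv' USING PigStorage(',')
      AS (name:chararray, dept:chararray, salary:double);
by_dept = GROUP emp BY dept;
counts = FOREACH by_dept GENERATE group AS dept, COUNT(emp) AS headcount;
-- Prints the logical, physical and execution (e.g. MapReduce) plans; no data is processed
EXPLAIN counts;
```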
-- LOAD DATA
A = LOAD 'dataset 1.dat' AS (name, dob, designation);
-- GROUP DATA
B = GROUP A BY designation;
-- FOREACH: one output tuple per designation with the number of records in it
C = FOREACH B GENERATE group AS designation, COUNT(A) AS total;
-- FILTER: after grouping only designation and total remain, so filter on designation using ==
D = FILTER C BY designation == 'XXX' OR designation == 'yyy';
STORE D INTO 'result.dat';
• Linux above 10
• Java above 6
• Hadoop
• Pig
• Pig Latin is a data flow language rather than a procedural or declarative one. A Pig Latin program consists of a collection of statements.
• A statement can be thought of as an operation or a command.
• Fields: a field is a piece of data [e.g. student_id = 01]
• Tuples: a tuple is an ordered set of fields [e.g. (01, Raja, MCA, C++)]
• Bags: a bag is a collection of tuples [e.g. {(01, Raja, MCA, C++), (22, Ramesh, MBA, C)}]
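The data model can be seen directly by loading records and grouping them. The sketch below assumes a hypothetical file 'students.csv' holding the example rows above. Each loaded record is a tuple, and GROUP produces tuples that pair a key with a bag of the matching tuples.

```pig
-- Hypothetical file of student records: id, name, course, language
students = LOAD 'students.csv' USING PigStorage(',')
           AS (student_id:int, name:chararray, course:chararray, language:chararray);
-- Each record of 'students' is a tuple of four fields, e.g. (1,Raja,MCA,C++)
by_course = GROUP students BY course;
-- Each record of 'by_course' holds a key and a bag of tuples, e.g. (MCA,{(1,Raja,MCA,C++)})
DESCRIBE by_course;
DUMP by_course;
```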
SIMPLE TYPE   DESCRIPTION
int           Signed 32-bit integer
long          Signed 64-bit integer
float         32-bit floating point
double        64-bit floating point
chararray     Character array (string) in Unicode UTF-8 format
bytearray     Byte array (blob)
boolean       Boolean
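Types are attached to fields in the AS schema of a LOAD. Fields declared without a type default to bytearray and can be cast explicitly later. The file 'people.csv' and its columns are hypothetical.

```pig
-- Hypothetical file; declaring types in the schema avoids the bytearray default
people = LOAD 'people.csv' USING PigStorage(',')
         AS (name:chararray, age:int, visits:long, score:double, active:boolean);
-- Fields declared without a type are bytearray and can be cast explicitly
raw = LOAD 'people.csv' USING PigStorage(',') AS (name, age);
typed = FOREACH raw GENERATE (chararray)name AS name, (int)age AS age;
DESCRIBE typed;
```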
Statement         Description
Load              Read data from the file system
Store             Write data to the file system
Dump              Display results on the screen
Foreach           Apply an expression to each record and generate one or more records
Filter            Apply a predicate to each record and remove records where it is false
Group / Cogroup   Collect records with the same key from one or more inputs
Join              Join two or more inputs based on a key
Order             Sort records based on a key
Distinct          Remove duplicate records
Union             Merge two or more datasets
Limit             Limit the number of records
Split             Split data into two or more sets based on filter conditions
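The statements that are not covered in more detail later (Union, Distinct, Limit, Split) fit together in a short script. This is a sketch that assumes two hypothetical tab-separated files, 'a.txt' and 'b.txt', with the same schema. Tab is PigStorage's default delimiter.

```pig
-- Two hypothetical tab-separated files with the same schema
a = LOAD 'a.txt' AS (id:int, name:chararray);
b = LOAD 'b.txt' AS (id:int, name:chararray);
both = UNION a, b;         -- merge the datasets
uniq = DISTINCT both;      -- remove duplicate records
first10 = LIMIT uniq 10;   -- keep at most 10 records
-- Route each record to one or more outputs by condition
SPLIT uniq INTO small IF id < 100, large IF id >= 100;
DUMP first10;
STORE large INTO 'large_ids';
```

Without a preceding ORDER, LIMIT returns an arbitrary subset of records.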
LOAD 'data' [USING function] [AS schema];
where 'data' is the name of the file or directory to load.
• Example:
data = load '$dir/age.csv' using PigStorage(',')
as (name:chararray, age:chararray);
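The $dir in the example is a Pig parameter. One way to give it a value is the %default preprocessor statement, which can be overridden at run time with -param dir=... on the pig command line. The path below is a hypothetical placeholder.

```pig
-- Hypothetical default directory; override with: -param dir=/some/other/path
%default dir '/user/hadoop/input'
data = LOAD '$dir/age.csv' USING PigStorage(',')
       AS (name:chararray, age:chararray);
DUMP data;
```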
• No action is taken until a DUMP or STORE command is encountered. Until then, Pig parses, validates and analyzes statements but does not execute them.
• DUMP displays the results on the screen.
• STORE saves the results (typically to a file).
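The sketch below shows this lazy execution: the LOAD and FILTER lines only build the plan, and work starts at DUMP or STORE. The log file name and its two-column layout are hypothetical.

```pig
-- These statements are only parsed and checked; no job runs yet
logs = LOAD 'access_log.txt' USING PigStorage(' ') AS (ip:chararray, url:chararray);
errors = FILTER logs BY url MATCHES '.*error.*';
-- DUMP triggers execution and prints the tuples to the screen
DUMP errors;
-- STORE also triggers execution and writes the result to an output directory
STORE errors INTO 'error_urls' USING PigStorage(',');
```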
FOREACH B GENERATE group, FUNCTION(A);
• Pig comes with many built-in functions, including COUNT, FLATTEN, CONCAT, etc.
• You can also implement a custom function.
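Here is the FOREACH ... GENERATE pattern with the functions named above, as a sketch over a hypothetical 'employees.csv'. Aggregates such as COUNT and SUM take the grouped bag. CONCAT works on chararray values. FLATTEN un-nests a bag; strictly speaking, Pig documents it as an operator.

```pig
emp = LOAD 'employees.csv' USING PigStorage(',')
      AS (name:chararray, dept:chararray, salary:double);
by_dept = GROUP emp BY dept;
-- Aggregate functions operate on the bag 'emp' inside each group
stats = FOREACH by_dept GENERATE group AS dept,
        COUNT(emp) AS headcount, SUM(emp.salary) AS total_salary;
-- CONCAT joins chararray values (nested here to combine three strings)
labels = FOREACH emp GENERATE CONCAT(name, CONCAT('@', dept)) AS label;
-- FLATTEN produces one output row per employee name in each group
names = FOREACH by_dept GENERATE group AS dept, FLATTEN(emp.name) AS name;
DUMP stats;
```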
• GROUP groups the data in one or more relations.
• The GROUP operator groups together tuples that have the same group key (key field).
• The key field is a tuple if the group key has more than one field; otherwise it has the same type as the group key.
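The sketch below shows how the type of the key depends on how many fields you group by. The input file and its fields are hypothetical. With a multi-field key, the parts of the key tuple are reached as group.field.

```pig
emp = LOAD 'employees.csv' USING PigStorage(',')
      AS (name:chararray, dept:chararray, city:chararray);
-- One key field: 'group' has the same type as dept (chararray)
by_dept = GROUP emp BY dept;
-- Two key fields: 'group' is a tuple (dept, city)
by_dept_city = GROUP emp BY (dept, city);
counts = FOREACH by_dept_city GENERATE group.dept, group.city, COUNT(emp) AS n;
-- GROUP ... ALL puts every record into a single group
everyone = GROUP emp ALL;
total = FOREACH everyone GENERATE COUNT(emp);
DUMP counts;
```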
• COGROUP is the same as GROUP.
• It groups two datasets together by a common attribute.
• It groups data into nested bags.
"Use GROUP when only one relation is involved and COGROUP when multiple relations are involved."
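COGROUP over two relations gives one output tuple per key, with one bag per input relation. The sketch assumes two hypothetical files that share a dept column.

```pig
-- Two hypothetical relations sharing a dept key
emp = LOAD 'employees.csv' USING PigStorage(',') AS (name:chararray, dept:chararray);
depts = LOAD 'departments.csv' USING PigStorage(',') AS (dept:chararray, floor:int);
-- Each output tuple: (group, {matching emp tuples}, {matching depts tuples})
grouped = COGROUP emp BY dept, depts BY dept;
summary = FOREACH grouped GENERATE group AS dept,
          COUNT(emp) AS staff, COUNT(depts) AS dept_rows;
DUMP summary;
```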
• FILTER selects a subset of the tuples in a relation:
alias = FILTER alias BY expression;
• The expression uses simple comparison operators (==, !=, <, >, …) and logical connectors (AND, OR, NOT).
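The sketch below combines comparisons and logical connectors over a hypothetical 'employees.csv'. Null checks use IS NULL / IS NOT NULL; IS on its own is not an equality test.

```pig
emp = LOAD 'employees.csv' USING PigStorage(',')
      AS (name:chararray, dept:chararray, salary:double);
-- Comparison operators combined with AND / OR / NOT
senior_it = FILTER emp BY dept == 'IT' AND salary >= 50000.0;
not_hr = FILTER emp BY NOT (dept == 'HR');
-- Null checks use IS NULL / IS NOT NULL rather than ==
has_dept = FILTER emp BY dept IS NOT NULL;
DUMP senior_it;
```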
• ORDER sorts a relation based on one or more fields:
alias = ORDER alias BY { * [ASC|DESC] | field_alias [ASC|DESC] [, field_alias [ASC|DESC] …] };
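Sorting on several fields, plus the common ORDER-then-LIMIT pattern for a top-N result, might look like this. The input file is hypothetical.

```pig
emp = LOAD 'employees.csv' USING PigStorage(',')
      AS (name:chararray, dept:chararray, salary:double);
-- Sort by department ascending, then by salary descending within each department
sorted = ORDER emp BY dept ASC, salary DESC;
-- ORDER followed by LIMIT gives the five highest-paid employees
by_salary = ORDER emp BY salary DESC;
top5 = LIMIT by_salary 5;
DUMP top5;
```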
• JOIN joins two datasets together by a common attribute.
• By default, the JOIN operator performs an inner join.
• Inner joins ignore null keys, so it makes sense to filter them out before the join.
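An inner join that drops null keys first is sketched below, using two hypothetical files. After a join, fields with the same name are disambiguated with the relation name and ::.

```pig
emp = LOAD 'employees.csv' USING PigStorage(',') AS (name:chararray, dept_id:int);
depts = LOAD 'departments.csv' USING PigStorage(',') AS (dept_id:int, dept_name:chararray);
-- Null keys never match in an inner join, so drop them early
emp_ok = FILTER emp BY dept_id IS NOT NULL;
-- Inner join (the default): only employees whose dept_id appears in depts
joined = JOIN emp_ok BY dept_id, depts BY dept_id;
-- Fields are disambiguated as relation::field
result = FOREACH joined GENERATE emp_ok::name, depts::dept_name;
DUMP result;
```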
• Records that do not join with the other record set are included by using an outer join.
• Left outer: records from the first dataset are included whether they have a match or not. Fields from the unmatched (second) relation are set to null.
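A left outer join keeps every record from the first relation, and a null check then finds the unmatched ones. This is a sketch over the same hypothetical files. Pig needs a known schema on the side that can be null, which the AS clauses provide.

```pig
emp = LOAD 'employees.csv' USING PigStorage(',') AS (name:chararray, dept_id:int);
depts = LOAD 'departments.csv' USING PigStorage(',') AS (dept_id:int, dept_name:chararray);
-- Every employee is kept; depts fields are null when there is no matching department
all_emp = JOIN emp BY dept_id LEFT OUTER, depts BY dept_id;
-- Employees without a department
unmatched = FILTER all_emp BY depts::dept_id IS NULL;
DUMP unmatched;
```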
• User Defined Functions (UDFs)
• Are a way to operate on fields
• But not on groups
• Can be called from a Pig script
• Easy to use
• Easy to code
• Keep the power of Pig
• You are free to write in
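Calling a Java UDF from a script usually takes a REGISTER for the jar and, optionally, a DEFINE to give the class a short alias. In this sketch, the jar 'myudfs.jar' and the class com.example.pig.CleanName are hypothetical stand-ins for a UDF you have written.

```pig
-- Hypothetical jar and class name standing in for your own UDF
REGISTER myudfs.jar;
DEFINE CleanName com.example.pig.CleanName();
names = LOAD 'names.txt' AS (name:chararray);
-- The UDF is applied to a field, just like a built-in function
cleaned = FOREACH names GENERATE CleanName(name) AS name;
DUMP cleaned;
```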
• Image feature extraction
• Geo computations
• Data cleaning
• Retrieving web pages
• NLP
• Even more…
• Fewer bugs
• Fewer lines of code
• Easier to read (the purpose of the analysis is straightforward)
• Version match
• Pig is a data processing environment in Hadoop that targets procedural programmers who do large-scale data analysis.
• Pig Latin offers high-level data manipulation in a procedural style.