PIG
Relational Operators - II
Foreach, Filter, Join, Co-Group, Union
Relational operator: foreach
• FOREACH, as the name suggests, does something for each record of a relation. It is similar to a for loop, whose body runs once per iteration.
• Example: select a few columns
grunt> a = FOREACH dataTransaction GENERATE $0, $1, $2;
It can also be used for arithmetic operations, such as
grunt> A = FOREACH dataTransaction GENERATE $0, ($3 + $4) AS S;
or
grunt> A = FOREACH dataTransaction GENERATE $0, (TransAmt1 + TransAmt2) AS S;
Rupak Roy
grunt> B = FOREACH A GENERATE $1/100;
or
grunt> B = FOREACH A GENERATE ($1/100) AS D;
grunt> C = FOREACH B GENERATE ((D > 50) ? 'above' : 'below');
or
grunt> C = FOREACH B GENERATE ((D == 50) ? 'Equal' : ((D > 50) ? 'above' : 'below'));
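The projection, arithmetic and conditional steps above can be put together in one small script. This is only a sketch: the file name transactions.csv and the three-field schema are assumptions made for illustration, not the actual layout of dataTransaction.

```pig
-- Hypothetical input file and schema (assumed for this sketch only).
txns = LOAD '/home/hduser/datasets/transactions.csv' USING PigStorage(',')
       AS (CustomerName:chararray, TransAmt1:double, TransAmt2:double);

-- Arithmetic on named fields; AS gives the computed column a name.
totals = FOREACH txns GENERATE CustomerName, (TransAmt1 + TransAmt2) AS total;

-- The bincond (cond ? a : b) must be wrapped in parentheses.
labelled = FOREACH totals GENERATE CustomerName, total,
           ((total > 50.0) ? 'above' : 'below') AS band;

DUMP labelled;
```

Naming the computed fields with AS keeps later statements readable, since they can refer to total and band instead of positions such as $1.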
Relational Operators: filter
• FILTER selects the tuples that satisfy a condition.
• Put simply, FILTER removes the records we do not need.
Examples:
grunt> F = FILTER dataTransaction BY TransAmt1 > 500;
Or
grunt> F1 = FILTER dataTransaction BY (($4 + $5)/100) > 2;
Or
grunt> F2 = FILTER dataTransaction BY $6 == 'Nunavut';
Or
grunt> F3 = FILTER dataTransaction BY $1 MATCHES 'Car.*';
-- returns all the names that start with 'Car'
Or
grunt> F4 = FILTER dataTransaction BY NOT $1 MATCHES 'Car.*';
-- returns all the names that do not start with 'Car'
Or
grunt> F5 = FILTER dataTransaction BY CustomerName MATCHES 'Ca.*s';
-- keeps the records whose name starts with 'Ca' and ends with 's'. The pair .* stands for any number of characters, here the characters after 'Ca' and before the final 's'.
Or
grunt> F6 = FILTER dataTransaction BY CustomerName MATCHES '.*(nica|los).*';
-- here the leading and trailing .* allow any number of characters before and after 'nica' or 'los'. Matching is case-sensitive.
nica = Monica Federle
los = Carlos Daly
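MATCHES follows Java regular-expression rules and has to match the whole field value, which is why the patterns above need .* wherever extra characters are allowed. Conditions can also be combined with AND, OR and NOT. A minimal sketch, assuming a hypothetical transactions.csv with the schema shown (neither comes from the slides):

```pig
-- Hypothetical input file and schema (assumed for this sketch only).
txns = LOAD '/home/hduser/datasets/transactions.csv' USING PigStorage(',')
       AS (CustomerName:chararray, TransAmt1:double, Province:chararray);

-- Both conditions must hold for a record to be kept.
big_car = FILTER txns BY (TransAmt1 > 500.0) AND (CustomerName MATCHES 'Car.*');

-- Either condition is enough.
north = FILTER txns BY (Province == 'Nunavut') OR (Province == 'Yukon');

-- Drop records whose name is missing before applying a pattern.
named = FILTER txns BY CustomerName IS NOT NULL;

DUMP big_car;
```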
Relational operators: Join
• The JOIN operator is used to combine two or more datasets.
• The datasets are joined on a key they have in common.
• Joins can be of 3 types:
1. Self-join
2. Inner join
3. Outer join: left join, right join and full join
Self-join
• A self-join joins a table with itself.
Let's understand this with the help of an example:
-- Load the same dataset under two different aliases:
grunt> join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerName:chararray, Transaction_ID:bytearray, ProductName:chararray);
grunt> join11 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerName:chararray, Transaction_ID:bytearray, ProductName:chararray);
-- Perform the self-join using the JOIN operator
grunt> selfjoin = JOIN join1 BY Transaction_ID, join11 BY Transaction_ID;
grunt> dump selfjoin;
Inner-join
• An inner join is also known as an equijoin.
• It returns rows only when the common key has a match in both tables.
-- Load the second dataset
grunt> join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',') AS (CustomerName:chararray, Transaction_ID:bytearray, Department:chararray);
grunt> innerjoin = JOIN join1 BY Transaction_ID, join2 BY Transaction_ID;
grunt> dump innerjoin;
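After a join, the result carries the fields of both inputs, and fields that share a name are told apart with the alias::field prefix. The sketch below shows how the joined relation might be trimmed down. The aliases products and depts, and the choice of chararray for Transaction_ID, are assumptions made for illustration.

```pig
-- Same files as in the slides; aliases and field types are assumptions.
products = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',')
           AS (CustomerName:chararray, Transaction_ID:chararray, ProductName:chararray);
depts = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',')
        AS (CustomerName:chararray, Transaction_ID:chararray, Department:chararray);

joined = JOIN products BY Transaction_ID, depts BY Transaction_ID;

-- Shows the prefixed field names, e.g. products::CustomerName.
DESCRIBE joined;

-- Keep one copy of the shared fields and give every column a plain name.
slim = FOREACH joined GENERATE
           products::Transaction_ID AS Transaction_ID,
           products::CustomerName   AS CustomerName,
           products::ProductName    AS ProductName,
           depts::Department        AS Department;

DUMP slim;
```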
Outer Join
• Left Outer Join: returns all rows from the left table, even if there is no match in the right table. Values from the right table appear only where the key matches and are null otherwise.
grunt> leftouter = JOIN join1 BY Transaction_ID LEFT OUTER, join2 BY Transaction_ID;
• Right Outer Join: the opposite of Left Outer Join. It returns all rows from the right table, even if there is no match in the left table. Values from the left table appear only where the key matches and are null otherwise.
grunt> rightouter = JOIN join1 BY Transaction_ID RIGHT OUTER, join2 BY Transaction_ID;
• Full Outer Join: returns all rows from both tables. Where a row has no match in the other table, the fields of that other table are null.
grunt> fullouter = JOIN join1 BY Transaction_ID FULL OUTER, join2 BY Transaction_ID;
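Because an outer join pads unmatched rows with nulls, it can also be used to find the records that have no partner. A minimal sketch, assuming the aliases products and depts and chararray keys (illustrative choices, not from the slides):

```pig
-- Same files as in the slides; aliases and field types are assumptions.
products = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',')
           AS (CustomerName:chararray, Transaction_ID:chararray, ProductName:chararray);
depts = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',')
        AS (CustomerName:chararray, Transaction_ID:chararray, Department:chararray);

-- Every products row is kept; depts fields are null where no key matched.
lo = JOIN products BY Transaction_ID LEFT OUTER, depts BY Transaction_ID;

-- Rows of products that have no matching Transaction_ID in depts.
unmatched = FILTER lo BY depts::Transaction_ID IS NULL;

DUMP unmatched;
```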
Joins are among the most important relational operators.
COGROUP: essentially performs a join and a group at the same time.
COGROUP on multiple datasets produces one record per key, holding the key and, for each dataset, a bag of that dataset's tuples with that key.
To perform COGROUP type:
grunt> cogrouped = COGROUP join1 BY Transaction_ID, join2 BY Transaction_ID;
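The shape of a COGROUP result is easiest to see with DESCRIBE and a per-key count. A minimal sketch, assuming the aliases products and depts and chararray keys (illustrative choices, not from the slides):

```pig
-- Same files as in the slides; aliases and field types are assumptions.
products = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',')
           AS (CustomerName:chararray, Transaction_ID:chararray, ProductName:chararray);
depts = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',')
        AS (CustomerName:chararray, Transaction_ID:chararray, Department:chararray);

cg = COGROUP products BY Transaction_ID, depts BY Transaction_ID;

-- One field named group (the key) plus one bag per input, named after its alias.
DESCRIBE cg;

-- Count how many tuples each input contributed for every key.
counts = FOREACH cg GENERATE group AS Transaction_ID,
                             COUNT(products) AS n_products,
                             COUNT(depts)    AS n_depts;

DUMP counts;
```

Unlike JOIN, the tuples of the two inputs are not combined into flat rows; they stay in separate bags, so a key present in only one input shows up with an empty bag for the other.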
Relational Operator: UNION
• UNION merges the contents of two or more datasets.
grunt> U = UNION join1, join2;
grunt> DUMP U;
What if we want to merge two datasets that have different schemas? For example:
join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerName:chararray, Transaction_ID:chararray, Department:chararray);
join1u = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerName:chararray, Transaction_ID:int, Department:chararray);
join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',') AS (CustomerName:chararray, Transaction_ID:chararray, Department:chararray);
Unioned = UNION join1u, join2;
DESCRIBE Unioned;
-- this throws an error, 'cannot cast to bytearray', because Transaction_ID has a different data type in each relation.
• Going back and reloading the data just to change the schema is tedious and time consuming. Instead, we can cast fields explicitly inside a relational query, without disturbing the original schema.
grunt> joinM = FOREACH join2 GENERATE $0, (int)$1, $2;
grunt> unioned = UNION joinM, join1u;
grunt> DESCRIBE unioned;
Alternatively, UNION can be performed with the ONSCHEMA keyword, which merges the relations by field name rather than by position:
grunt> U = UNION ONSCHEMA join1u, join2;
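The by-name behaviour of ONSCHEMA is easiest to see when two inputs list the same fields in a different order. The sketch below is illustrative only: the files a.csv and b.csv and their layouts are hypothetical and do not appear in the slides.

```pig
-- Hypothetical files: a.csv holds name,id and b.csv holds id,name.
a = LOAD '/home/hduser/datasets/a.csv' USING PigStorage(',')
    AS (CustomerName:chararray, Transaction_ID:int);
b = LOAD '/home/hduser/datasets/b.csv' USING PigStorage(',')
    AS (Transaction_ID:int, CustomerName:chararray);

-- Fields are lined up by name, so the differing column order does not matter.
u = UNION ONSCHEMA a, b;

DESCRIBE u;
DUMP u;
```

ONSCHEMA needs every input to have a declared schema, since it has nothing but the field names to align on.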
Relational Operator: RANK
• RANK assigns a rank to each tuple in a relation.
Example: create the input file from the shell
$ vi names
Zara,1,F
David,2,F
David,2,T
Alan,2,M
Calvin,3,M
Alan,5,M
Chris,8,M
Ellie,7,F
Bob,8,M
Carlos,2,M
Then press the Esc key and type :wq! to save.
grunt> names = LOAD '/home/hduser/datasets/names' USING PigStorage(',') AS (n1:chararray, n2:int, n3:chararray);
grunt> DUMP names;
grunt> ranked = RANK names;
grunt> DUMP ranked;
(1,Zara,1,F)
(2,David,2,F)
(3,David,2,T)
(4,Alan,2,M)
(5,Calvin,3,M)
(6,Alan,5,M)
(7,Chris,8,M)
(8,Ellie,7,F)
(9,Bob,8,M)
(10,Carlos,2,M)
We can also rank by two fields, each with its own sort order.
grunt> ranked2 = RANK names BY n1 ASC, n2 DESC;
grunt> DUMP ranked2;
• Sometimes two or more records tie on the ranking fields and receive the same rank. By default the ranks that follow a tie are skipped, which leaves gaps in the numbering.
• To avoid the gaps we can add the keyword DENSE, which gives the next distinct value the next consecutive rank.
grunt> rankedG = RANK names BY n1 DESC, n2 ASC DENSE;
grunt> DUMP rankedG;
(1,Zara,1,F)
(2,Ellie,7,F)
(3,David,2,F)
(3,David,2,T)
(4,Chris,8,M)
(5,Carlos,2,M)
(6,Calvin,3,M)
(7,Bob,8,M)
(8,Alan,2,M)
(9,Alan,5,M)
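The difference between the default numbering and DENSE shows most clearly when ranking on a field with many ties, such as n2. A minimal sketch using the names file from above; the aliases by_n2 and by_n2_dense are chosen here for illustration.

```pig
names = LOAD '/home/hduser/datasets/names' USING PigStorage(',')
        AS (n1:chararray, n2:int, n3:chararray);

-- Default: tied rows share a rank and the following ranks are skipped.
-- For the n2 values 1,2,2,2,2,3,5,7,8,8 the ranks are 1,2,2,2,2,6,7,8,9,9.
by_n2 = RANK names BY n2 ASC;

-- DENSE: tied rows share a rank and no ranks are skipped.
-- The same values are ranked 1,2,2,2,2,3,4,5,6,6.
by_n2_dense = RANK names BY n2 ASC DENSE;

DUMP by_n2;
DUMP by_n2_dense;
```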
Next
• We will learn about UDFs (User Defined Functions).
Rupak Roy

More Related Content

What's hot

Python list
Python listPython list
Python list
Mohammed Sikander
 
[1062BPY12001] Data analysis with R / April 26
[1062BPY12001] Data analysis with R / April 26[1062BPY12001] Data analysis with R / April 26
[1062BPY12001] Data analysis with R / April 26
Kevin Chun-Hsien Hsu
 
Chapter 2 grouping,scalar and aggergate functions,joins inner join,outer join
Chapter 2  grouping,scalar and aggergate functions,joins   inner join,outer joinChapter 2  grouping,scalar and aggergate functions,joins   inner join,outer join
Chapter 2 grouping,scalar and aggergate functions,joins inner join,outer join
baabtra.com - No. 1 supplier of quality freshers
 
SQL Functions and Operators
SQL Functions and OperatorsSQL Functions and Operators
SQL Functions and Operators
Mohan Kumar.R
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
Rsquared Academy
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
Abhik Seal
 
Python Variable Types, List, Tuple, Dictionary
Python Variable Types, List, Tuple, DictionaryPython Variable Types, List, Tuple, Dictionary
Python Variable Types, List, Tuple, Dictionary
Soba Arjun
 
R factors
R   factorsR   factors
Read data from Excel spreadsheets into R
Read data from Excel spreadsheets into RRead data from Excel spreadsheets into R
Read data from Excel spreadsheets into R
Rsquared Academy
 
Python set
Python setPython set
Python set
Mohammed Sikander
 
Python Workshop Part 2. LUG Maniapl
Python Workshop Part 2. LUG ManiaplPython Workshop Part 2. LUG Maniapl
Python Workshop Part 2. LUG Maniapl
Ankur Shrivastava
 
List in Python
List in PythonList in Python
List in Python
Siddique Ibrahim
 
Technical
TechnicalTechnical
Technical
ved prakash
 
Data handling in r
Data handling in rData handling in r
Data handling in r
Abhik Seal
 
New features in Ruby 2.4
New features in Ruby 2.4New features in Ruby 2.4
New features in Ruby 2.4
Ireneusz Skrobiś
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
krishna singh
 
Data type list_methods_in_python
Data type list_methods_in_pythonData type list_methods_in_python
Data type list_methods_in_python
deepalishinkar1
 
ABAP 7.x New Features and Commands
ABAP 7.x New Features and CommandsABAP 7.x New Features and Commands
ABAP 7.x New Features and Commands
Dr. Kerem Koseoglu
 

What's hot (19)

Python list
Python listPython list
Python list
 
[1062BPY12001] Data analysis with R / April 26
[1062BPY12001] Data analysis with R / April 26[1062BPY12001] Data analysis with R / April 26
[1062BPY12001] Data analysis with R / April 26
 
Sets in python
Sets in pythonSets in python
Sets in python
 
Chapter 2 grouping,scalar and aggergate functions,joins inner join,outer join
Chapter 2  grouping,scalar and aggergate functions,joins   inner join,outer joinChapter 2  grouping,scalar and aggergate functions,joins   inner join,outer join
Chapter 2 grouping,scalar and aggergate functions,joins inner join,outer join
 
SQL Functions and Operators
SQL Functions and OperatorsSQL Functions and Operators
SQL Functions and Operators
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
 
Python Variable Types, List, Tuple, Dictionary
Python Variable Types, List, Tuple, DictionaryPython Variable Types, List, Tuple, Dictionary
Python Variable Types, List, Tuple, Dictionary
 
R factors
R   factorsR   factors
R factors
 
Read data from Excel spreadsheets into R
Read data from Excel spreadsheets into RRead data from Excel spreadsheets into R
Read data from Excel spreadsheets into R
 
Python set
Python setPython set
Python set
 
Python Workshop Part 2. LUG Maniapl
Python Workshop Part 2. LUG ManiaplPython Workshop Part 2. LUG Maniapl
Python Workshop Part 2. LUG Maniapl
 
List in Python
List in PythonList in Python
List in Python
 
Technical
TechnicalTechnical
Technical
 
Data handling in r
Data handling in rData handling in r
Data handling in r
 
New features in Ruby 2.4
New features in Ruby 2.4New features in Ruby 2.4
New features in Ruby 2.4
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
 
Data type list_methods_in_python
Data type list_methods_in_pythonData type list_methods_in_python
Data type list_methods_in_python
 
ABAP 7.x New Features and Commands
ABAP 7.x New Features and CommandsABAP 7.x New Features and Commands
ABAP 7.x New Features and Commands
 

Similar to Apache Pig Relational Operators - II

RELATIONAL DATABASES & Database designCIS276EmployeeNumFir.docx
RELATIONAL DATABASES & Database designCIS276EmployeeNumFir.docxRELATIONAL DATABASES & Database designCIS276EmployeeNumFir.docx
RELATIONAL DATABASES & Database designCIS276EmployeeNumFir.docx
sodhi3
 
Php Chapter 1 Training
Php Chapter 1 TrainingPhp Chapter 1 Training
Php Chapter 1 Training
Chris Chubb
 
Pig Introduction to Pig
Pig Introduction to PigPig Introduction to Pig
Pig Introduction to Pig
Chris Wilkes
 
Apache pig
Apache pigApache pig
Apache pig
Jigar Parekh
 
PLSQL Note
PLSQL NotePLSQL Note
PLSQL Note
Arun Sial
 
DBIx-DataModel v2.0 in detail
DBIx-DataModel v2.0 in detail DBIx-DataModel v2.0 in detail
DBIx-DataModel v2.0 in detail
Laurent Dami
 
perl usage at database applications
perl usage at database applicationsperl usage at database applications
perl usage at database applicationsJoe Jiang
 
Mysqlppt
MysqlpptMysqlppt
MysqlpptReka
 
Sql
SqlSql
Php-Continuation
Php-ContinuationPhp-Continuation
Php-Continuationlotlot
 
DOODB_LAB.pptx
DOODB_LAB.pptxDOODB_LAB.pptx
DOODB_LAB.pptx
FilestreamFilestream
 
A brief introduction to PostgreSQL
A brief introduction to PostgreSQLA brief introduction to PostgreSQL
A brief introduction to PostgreSQL
Vu Hung Nguyen
 
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19 156 Page .docx
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19   156   Page .docxIMG1.jpgIMG2.jpgIMG3.jpg2016 6 19   156   Page .docx
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19 156 Page .docx
wilcockiris
 
Session 04 -Pig Continued
Session 04 -Pig ContinuedSession 04 -Pig Continued
Session 04 -Pig Continued
AnandMHadoop
 
GPars For Beginners
GPars For BeginnersGPars For Beginners
GPars For Beginners
Matt Passell
 
Python_Unit-1_PPT_Data Types.pptx
Python_Unit-1_PPT_Data Types.pptxPython_Unit-1_PPT_Data Types.pptx
Python_Unit-1_PPT_Data Types.pptx
SahajShrimal1
 
Perl
PerlPerl
TDC2016POA | Trilha Programacao Funcional - Ramda JS como alternativa a under...
TDC2016POA | Trilha Programacao Funcional - Ramda JS como alternativa a under...TDC2016POA | Trilha Programacao Funcional - Ramda JS como alternativa a under...
TDC2016POA | Trilha Programacao Funcional - Ramda JS como alternativa a under...
tdc-globalcode
 

Similar to Apache Pig Relational Operators - II (20)

RELATIONAL DATABASES & Database designCIS276EmployeeNumFir.docx
RELATIONAL DATABASES & Database designCIS276EmployeeNumFir.docxRELATIONAL DATABASES & Database designCIS276EmployeeNumFir.docx
RELATIONAL DATABASES & Database designCIS276EmployeeNumFir.docx
 
SQL -PHP Tutorial
SQL -PHP TutorialSQL -PHP Tutorial
SQL -PHP Tutorial
 
Php Chapter 1 Training
Php Chapter 1 TrainingPhp Chapter 1 Training
Php Chapter 1 Training
 
Spufi
SpufiSpufi
Spufi
 
Pig Introduction to Pig
Pig Introduction to PigPig Introduction to Pig
Pig Introduction to Pig
 
Apache pig
Apache pigApache pig
Apache pig
 
PLSQL Note
PLSQL NotePLSQL Note
PLSQL Note
 
DBIx-DataModel v2.0 in detail
DBIx-DataModel v2.0 in detail DBIx-DataModel v2.0 in detail
DBIx-DataModel v2.0 in detail
 
perl usage at database applications
perl usage at database applicationsperl usage at database applications
perl usage at database applications
 
Mysqlppt
MysqlpptMysqlppt
Mysqlppt
 
Sql
SqlSql
Sql
 
Php-Continuation
Php-ContinuationPhp-Continuation
Php-Continuation
 
DOODB_LAB.pptx
DOODB_LAB.pptxDOODB_LAB.pptx
DOODB_LAB.pptx
 
A brief introduction to PostgreSQL
A brief introduction to PostgreSQLA brief introduction to PostgreSQL
A brief introduction to PostgreSQL
 
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19 156 Page .docx
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19   156   Page .docxIMG1.jpgIMG2.jpgIMG3.jpg2016 6 19   156   Page .docx
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19 156 Page .docx
 
Session 04 -Pig Continued
Session 04 -Pig ContinuedSession 04 -Pig Continued
Session 04 -Pig Continued
 
GPars For Beginners
GPars For BeginnersGPars For Beginners
GPars For Beginners
 
Python_Unit-1_PPT_Data Types.pptx
Python_Unit-1_PPT_Data Types.pptxPython_Unit-1_PPT_Data Types.pptx
Python_Unit-1_PPT_Data Types.pptx
 
Perl
PerlPerl
Perl
 
TDC2016POA | Trilha Programacao Funcional - Ramda JS como alternativa a under...
TDC2016POA | Trilha Programacao Funcional - Ramda JS como alternativa a under...TDC2016POA | Trilha Programacao Funcional - Ramda JS como alternativa a under...
TDC2016POA | Trilha Programacao Funcional - Ramda JS como alternativa a under...
 

More from Rupak Roy

Hierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLPHierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLP
Rupak Roy
 
Clustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLPClustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLP
Rupak Roy
 
Network Analysis - NLP
Network Analysis  - NLPNetwork Analysis  - NLP
Network Analysis - NLP
Rupak Roy
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
Rupak Roy
 
Sentiment Analysis Practical Steps
Sentiment Analysis Practical StepsSentiment Analysis Practical Steps
Sentiment Analysis Practical Steps
Rupak Roy
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment Analysis
Rupak Roy
 
Text Mining using Regular Expressions
Text Mining using Regular ExpressionsText Mining using Regular Expressions
Text Mining using Regular Expressions
Rupak Roy
 
Introduction to Text Mining
Introduction to Text Mining Introduction to Text Mining
Introduction to Text Mining
Rupak Roy
 
Apache Hbase Architecture
Apache Hbase ArchitectureApache Hbase Architecture
Apache Hbase Architecture
Rupak Roy
 
Introduction to Hbase
Introduction to Hbase Introduction to Hbase
Introduction to Hbase
Rupak Roy
 
Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQL
Rupak Roy
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export
Rupak Roy
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
Rupak Roy
 
Scoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMSScoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMS
Rupak Roy
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode
Rupak Roy
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functions
Rupak Roy
 
Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to Flume
Rupak Roy
 
Apache PIG casting, reference
Apache PIG casting, referenceApache PIG casting, reference
Apache PIG casting, reference
Rupak Roy
 
Pig Latin, Data Model with Load and Store Functions
Pig Latin, Data Model with Load and Store FunctionsPig Latin, Data Model with Load and Store Functions
Pig Latin, Data Model with Load and Store Functions
Rupak Roy
 
Introduction to PIG components
Introduction to PIG components Introduction to PIG components
Introduction to PIG components
Rupak Roy
 

More from Rupak Roy (20)

Hierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLPHierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLP
 
Clustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLPClustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLP
 
Network Analysis - NLP
Network Analysis  - NLPNetwork Analysis  - NLP
Network Analysis - NLP
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
 
Sentiment Analysis Practical Steps
Sentiment Analysis Practical StepsSentiment Analysis Practical Steps
Sentiment Analysis Practical Steps
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment Analysis
 
Text Mining using Regular Expressions
Text Mining using Regular ExpressionsText Mining using Regular Expressions
Text Mining using Regular Expressions
 
Introduction to Text Mining
Introduction to Text Mining Introduction to Text Mining
Introduction to Text Mining
 
Apache Hbase Architecture
Apache Hbase ArchitectureApache Hbase Architecture
Apache Hbase Architecture
 
Introduction to Hbase
Introduction to Hbase Introduction to Hbase
Introduction to Hbase
 
Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQL
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
 
Scoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMSScoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMS
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functions
 
Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to Flume
 
Apache PIG casting, reference
Apache PIG casting, referenceApache PIG casting, reference
Apache PIG casting, reference
 
Pig Latin, Data Model with Load and Store Functions
Pig Latin, Data Model with Load and Store FunctionsPig Latin, Data Model with Load and Store Functions
Pig Latin, Data Model with Load and Store Functions
 
Introduction to PIG components
Introduction to PIG components Introduction to PIG components
Introduction to PIG components
 

Recently uploaded

Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 

Recently uploaded (20)

Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 

Apache Pig Relational Operators - II

  • 1. PIG Relational Operators - II Foreach, Filter, Join, Co- Group, Union
  • 2. Relational operator: foreach  foreach the name itself describes for each record do something. It is similar to For-Loop for specifying the iteration that is executed repeatedly.  Example: select few columns grunt> a =foreach dataTransaction Generate $0,$1,$2 ; It can also be used for various arithmetic operations such as grunt> A= FOREACH dataTransaction Generate $0,($3+$4) as S; or grunt> a =foreach dataTransaction Generate $0, (TransAmt1+TransaAmt2) as S; Rupak Roy
  • 3. grunt > B= FOREACH A GENERATE $1/100; or grunt> b = foreach A GENERATE ($1/100) as D C= FOREACH B GENERATE ( (D >50)?’above’ : ‘below’); or C= foreach B generate ( (D==50)?’Equal’ : ((D>50)?’above’:’down’)); Rupak Roy
  • 4. Relational Operators: filter  It is used to select the required tuple based on conditions.  Or simply we can say filter helps to remove unwanted data/records based on requirements. Example such as: grunt> F = Filter dataTransaction by TransAmt1 > 500; Or grunt> F1 = filter dataTransaction by (($4+$5)/100) > 2 ; Or grunt> F2 = filter dataTransaction by $6 == ‘Nunavut’; Or grunt> F3 = filter data Transaction by $1 MATCHES ‘ Car.*’; #it will give all the names that starts with CA…. Or grunt> F4 = filter dataTRansaction by NOT $1 MATCHES ‘Car.*’; #it will give all the names that doesnot starts with CA Rupak Roy
  • 5. Relational Operators: filter
      or
        grunt> F5 = FILTER dataTransaction BY CustomerName MATCHES 'Ca.*s';
        -- keeps the records whose names start with 'Ca' and end with 's'.
        -- In the pattern, . matches any character and * repeats it zero or more times,
        -- so .* stands for any run of characters between 'Ca' and the final 's'.
      or
        grunt> F6 = FILTER dataTransaction BY CustomerName MATCHES '.*(nica|los).*';
        -- here .* on both sides allows any number of characters before and after
        -- either 'nica' or 'los', so the name only has to contain one of them:
        --   nica = MONICA Federle
        --   los = Carlos Daly
        -- Matching is case-sensitive, so the stored name must contain lowercase 'nica' or 'los'.
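The MATCHES filters above can be tried on a small relation as sketched below. The file name customers.csv, its two-field schema, and the aliases are assumptions for illustration. MATCHES uses Java regular expressions and has to match the whole field value, so the embedded flag (?i) is one way to ignore case.

```pig
-- Hypothetical input file and schema (not from the slides).
customers = LOAD 'customers.csv' USING PigStorage(',')
            AS (CustomerName:chararray, Region:chararray);

-- The pattern must cover the entire value, hence the trailing .*
starts_car = FILTER customers BY CustomerName MATCHES 'Car.*';

-- (?i) is a Java regex flag that makes the match case-insensitive,
-- so this keeps both 'MONICA Federle' and 'Carlos Daly'.
any_case = FILTER customers BY CustomerName MATCHES '(?i).*(nica|los).*';

-- Conditions can be combined with AND, OR and NOT.
combined = FILTER customers BY (CustomerName MATCHES 'Ca.*s')
                            AND NOT (Region == 'Nunavut');

DUMP any_case;
```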
  • 6. Relational operators: Join
      - The JOIN operator is used to combine two or more datasets.
      - The datasets are joined on a common key.
      - Joins are of 3 types:
        1. Self-join
        2. Inner join
        3. Outer join: left join, right join and full join
  • 7. Self-join
      - A self-join joins a table with itself. Let's understand this with the help of an example.
      -- Load the same dataset under two different aliases:
        grunt> join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:bytearray, ProductName:chararray);
        grunt> join11 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:bytearray, ProductName:chararray);
  • 8. -- Perform the self-join using the JOIN operator
        grunt> selfjoin = JOIN join1 BY Transaction_ID, join11 BY Transaction_ID;
        grunt> DUMP selfjoin;
  • 9. Inner join
      - It is also known as an equijoin.
      - An inner join returns a row only when the common key has a match in both tables.
      -- Load the second dataset
        grunt> join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:bytearray, Department:chararray);
        grunt> innerjoin = JOIN join1 BY Transaction_ID, join2 BY Transaction_ID;
        grunt> DUMP innerjoin;
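After a join, both inputs contribute fields with the same names, so a field is referenced with the relation::field prefix. The sketch below reuses the slides' relations and schemas; the alias result and the choice of projected fields are illustrative assumptions.

```pig
join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',')
        AS (CustomerNAme:chararray, Transaction_ID:bytearray, ProductName:chararray);
join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',')
        AS (CustomerNAme:chararray, Transaction_ID:bytearray, Department:chararray);

innerjoin = JOIN join1 BY Transaction_ID, join2 BY Transaction_ID;

-- Both sides carry CustomerNAme and Transaction_ID, so each reference
-- is qualified with the relation it came from.
result = FOREACH innerjoin GENERATE
             join1::Transaction_ID AS Transaction_ID,
             join1::CustomerNAme   AS CustomerNAme,
             join1::ProductName    AS ProductName,
             join2::Department     AS Department;

DESCRIBE result;
```

Projecting right after the join drops the duplicated key and name columns and gives the remaining fields unambiguous names.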
  • 10. Outer Join
      - Left Outer Join: returns all rows from the left table, even if there is no match in the right table. Right-table fields are filled in only where a match exists and are null otherwise.
        grunt> leftouter = JOIN join1 BY Transaction_ID LEFT OUTER, join2 BY Transaction_ID;
      - Right Outer Join: the opposite of the left outer join. It returns all rows from the right table, even if there is no match in the left table. Left-table fields are filled in only where a match exists and are null otherwise.
        grunt> rightouter = JOIN join1 BY Transaction_ID RIGHT OUTER, join2 BY Transaction_ID;
  • 11. Outer Join
      - Full Outer Join: returns all rows from both tables. Rows that match on the key are combined; rows without a match on the other side are kept, with the other side's fields set to null.
        grunt> fullouter = JOIN join1 BY Transaction_ID FULL OUTER, join2 BY Transaction_ID;
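One common use of an outer join is to find the rows that have no partner on the other side, since those rows carry nulls in the other side's fields. A minimal sketch, reusing the slides' relations; the alias unmatched is an assumption for illustration.

```pig
join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',')
        AS (CustomerNAme:chararray, Transaction_ID:bytearray, ProductName:chararray);
join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',')
        AS (CustomerNAme:chararray, Transaction_ID:bytearray, Department:chararray);

-- Keep every row of join1; join2 fields are null where nothing matched.
leftouter = JOIN join1 BY Transaction_ID LEFT OUTER, join2 BY Transaction_ID;

-- Rows of join1 whose Transaction_ID does not occur in join2.
unmatched = FILTER leftouter BY join2::Transaction_ID IS NULL;

DUMP unmatched;
```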
  • 12. Joins are among the most important operators.
  • 13. CO-Group: essentially performs a join and a group at the same time. COGROUP on multiple datasets produces one record per key, holding the key and, for each input dataset, a bag of the tuples with that key.
      To perform a COGROUP, type:
        grunt> cogrouped = COGROUP join1 BY Transaction_ID, join2 BY Transaction_ID;
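The shape of a COGROUP result is easiest to see by describing it and counting the tuples in each bag. A minimal sketch, reusing the slides' relations; the aliases cogrouped and counts and the output field names are assumptions for illustration.

```pig
join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',')
        AS (CustomerNAme:chararray, Transaction_ID:bytearray, ProductName:chararray);
join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',')
        AS (CustomerNAme:chararray, Transaction_ID:bytearray, Department:chararray);

-- One output record per key: (group, bag of join1 tuples, bag of join2 tuples).
cogrouped = COGROUP join1 BY Transaction_ID, join2 BY Transaction_ID;
DESCRIBE cogrouped;

-- Count how many tuples each input contributed for every key.
counts = FOREACH cogrouped GENERATE
             group        AS Transaction_ID,
             COUNT(join1) AS n_join1,
             COUNT(join2) AS n_join2;

DUMP counts;
```

Unlike JOIN, the result keeps keys that appear in only one input; the bag for the other input is simply empty.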
  • 14. Relational Operator: UNION
      - UNION merges the contents of two or more datasets.
        grunt> U = UNION join1, join2;
        grunt> DUMP U;
      What if we want to merge two datasets that have different schemas? Example:
        join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:chararray, Department:chararray);
        join1u = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:int, Department:chararray);
        join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',') AS (CustomerNAme:chararray, Transaction_ID:chararray, Department:chararray);
        Unioned = UNION join1u, join2;
        DESCRIBE Unioned;
      It will throw an error ('cannot cast to byte array') because Transaction_ID has different data types in the two relations.
  • 15. - It would be tedious and time-consuming to go back and reload the data just to change the schema. Instead, we can cast a field explicitly inside a relational query, without disturbing the original schema.
        grunt> joinM = FOREACH join2 GENERATE $0, (int)$1, $2;
        grunt> unioned = UNION joinM, join1u;
        grunt> DESCRIBE unioned;
      Alternatively, UNION ONSCHEMA merges the inputs by field name rather than by position (the Pig documentation still expects same-named fields to have compatible types, so a cast may be needed here as well):
        grunt> U = UNION ONSCHEMA join1u, join2;
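The cast-then-union approach can also be written with field names, so the result keeps a readable schema. A minimal sketch using the slides' relations; the alias join2i is an assumption, and any Transaction_ID value that cannot be read as an integer becomes null after the cast.

```pig
join1u = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',')
         AS (CustomerNAme:chararray, Transaction_ID:int, Department:chararray);
join2  = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',')
         AS (CustomerNAme:chararray, Transaction_ID:chararray, Department:chararray);

-- Cast the chararray key to int and keep the original field name.
join2i = FOREACH join2 GENERATE
             CustomerNAme,
             (int)Transaction_ID AS Transaction_ID,
             Department;

-- Both inputs now have identical field names and types.
unioned = UNION ONSCHEMA join1u, join2i;

DESCRIBE unioned;
```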
  • 16. Relational Operator: RANK
      - RANK assigns a rank to each tuple within a relation.
      Example: in a terminal (not the Grunt shell), create the file names in /home/hduser/datasets:
        $ vi names
        Zara,1,F
        David,2,F
        David,2,T
        Alan,2,M
        Calvin,3,M
        Alan,5,M
        Chris,8,M
        Ellie ,7,F
        Bob,8,M
        Carlos,2,M
      Then press the ESC key and type :wq! to save.
        grunt> names = LOAD '/home/hduser/datasets/names' USING PigStorage(',') AS (n1:chararray, n2:int, n3:chararray);
        grunt> DUMP names;
  • 17. grunt> ranked = RANK names;
        grunt> DUMP ranked;
        (1,Zara,1,F)
        (2,David,2,F)
        (3,David,2,T)
        (4,Alan,2,M)
        (5,Calvin,3,M)
        (6,Alan,5,M)
        (7,Chris,8,M)
        (8,Ellie ,7,F)
        (9,Bob,8,M)
        (10,Carlos,2,M)
      We can also rank on two fields, each with its own sort order:
        grunt> ranked2 = RANK names BY n1 ASC, n2 DESC;
        grunt> DUMP ranked2;
  • 18. - When two records tie on the ranking fields, RANK ... BY gives them the same rank and skips the following rank, which leaves a gap in the numbering.
      - To avoid the gap, add the DENSE keyword; tied records still share a rank, but the ranks stay consecutive.
        grunt> rankedG = RANK names BY n1 DESC, n2 ASC DENSE;
        (1,Zara,1,F)
        (2,Ellie ,7,F)
        (3,David,2,F)
        (3,David,2,T)
        (4,Chris,8,M)
        (5,Carlos,2,M)
        (6,Calvin,3,M)
        (7,Bob,8,M)
        (8,Alan,2,M)
        (9,Alan,5,M)
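The three forms of RANK can be compared side by side on the names file from the slides. A minimal sketch; the aliases by_row, with_gaps and dense are assumptions for illustration, and the comments describe the behaviour given in the Pig documentation.

```pig
names = LOAD '/home/hduser/datasets/names' USING PigStorage(',')
        AS (n1:chararray, n2:int, n3:chararray);

-- No BY clause: a sequential number is prepended to each tuple.
by_row = RANK names;

-- BY clause: tuples that tie on (n1, n2) share a rank, and the
-- next rank is skipped, so the numbering has a gap after a tie.
with_gaps = RANK names BY n1 DESC, n2 ASC;

-- DENSE: ties still share a rank, but no rank is skipped.
dense = RANK names BY n1 DESC, n2 ASC DENSE;

DUMP with_gaps;
DUMP dense;
```

In this data the two David,2 tuples tie, so with_gaps is expected to jump from rank 3 to rank 5, while dense continues with rank 4.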
  • 19. Next
      - We will learn about UDFs (User Defined Functions).