Hadoop Developer
Training
Session 04 - PIG
Page 2 | Classification: Restricted
Agenda
PIG
• Loads in Pig Continued
• Verification
• Filters
• Macros in Pig
Load in Pig
Inner Join is used quite frequently; it is also referred to as equijoin. An inner
join returns rows when there is a match in both tables.
It creates a new relation by combining column values of two relations (say A
and B) based upon the join-predicate. The query compares each row of A
with each row of B to find all pairs of rows which satisfy the join-predicate.
When the join-predicate is satisfied, the column values for each matched pair
of rows of A and B are combined into a result row.
Syntax
Here is the syntax of performing inner join operation using the JOIN operator.
grunt> result = JOIN relation1 BY columnname, relation2 BY columnname;
Example
Let us perform an inner join operation on the two relations customers and
orders as shown below.
grunt> customer_orders = JOIN customers BY id, orders BY customer_id;
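A join key need not be a single column; Pig also accepts a tuple of columns
on each side, compared positionally. A hedged sketch (the composite key and
relation names here are illustrative, not from the slides):
grunt> orders_by_cust_city = JOIN customers BY (id, city), orders BY (customer_id, ship_city);
Both relations must list the same number of key columns.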
Verification
Verify the relation customer_orders using the DUMP operator as shown
below.
grunt> Dump customer_orders;
Output
You will get the following output, displaying the contents of the relation
named customer_orders.
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
Outer Join: Unlike inner join, outer join returns all the rows from at least one
of the relations. An outer join operation is carried out in three ways:
• Left outer join
• Right outer join
• Full outer join
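Full outer join is not demonstrated on the following slides; as a hedged
sketch on the same two relations (outer_full is an illustrative name), it
combines the behaviour of both one-sided joins:
grunt> outer_full = JOIN customers BY id FULL OUTER, orders BY customer_id;
grunt> Dump outer_full;
Rows with no match on either side appear with nulls in the columns of the
missing relation.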
Left Outer Join
The left outer join operation returns all rows from the left relation, even if
there are no matches in the right relation.
Syntax
Given below is the syntax of performing left outer join operation using the
JOIN operator.
grunt> Relation3_name = JOIN Relation1_name BY id LEFT OUTER,
Relation2_name BY customer_id;
Example
Let us perform left outer join operation on the two relations customers and
orders as shown below.
grunt> outer_left = JOIN customers BY id LEFT OUTER, orders BY
customer_id;
Verification
Verify the relation outer_left using the DUMP operator as shown below.
grunt> Dump outer_left;
Output
It will produce the following output, displaying the contents of the relation
outer_left.
(1,Peter,32,Salt Lake City,2000,,,,)
(2,Aaron,25,Salt Lake City,1500,101,2009-11-20 00:00:00,2,1560)
(3,Danny,23,Salt Lake City,2000,100,2009-10-08 00:00:00,3,1500)
(3,Danny,23,Salt Lake City,2000,102,2009-10-08 00:00:00,3,3000)
(4,Angela,25,Salt Lake City,6500,103,2008-05-20 00:00:00,4,2060)
(5,Peggy,27,Bhopal,8500,,,,)
(6,King,22,MP,4500,,,,)
(7,Carolyn,24,Indore,10000,,,,)
Right Outer Join
The right outer join operation returns all rows from the right relation, even if
there are no matches in the left relation.
Syntax
Given below is the syntax of performing right outer join operation using the
JOIN operator.
grunt> Relation3_name = JOIN Relation1_name BY id RIGHT OUTER,
Relation2_name BY customer_id;
Example
Let us perform right outer join operation on the two relations customers and
orders as shown below.
grunt> outer_right = JOIN customers BY id RIGHT, orders BY customer_id;
Verification
Verify the relation outer_right using the DUMP operator as shown below.
grunt> Dump outer_right;
Output
It will produce the following output, displaying the contents of the relation
outer_right.
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
The SPLIT operator is used to split a relation into two or more relations.
Syntax
Given below is the syntax of the SPLIT operator.
grunt> SPLIT student_details INTO student_details1 IF age < 23,
student_details2 IF (age >= 23 AND age <= 25);
grunt> Dump student_details1;
grunt> Dump student_details2;
Output
It will produce the following output, displaying the contents of the relations
student_details1 and student_details2 respectively.
grunt> Dump student_details1;
(1, Peter, Burke, 4353521729, Salt Lake City)
(2, Aaron, Kimberlake, 8013528191, Salt Lake City)
(3, Danny, Jacob, 2958295582, Salt Lake City)
(4, Angela, Kouth, 2938811911, Salt Lake City)
grunt> Dump student_details2;
(5, Peggy, Karter, 3202289119, Salt Lake City)
(6, King, Salmon, 2398329282, Salt Lake City)
(7, Carolyn, Fisher, 2293322829, Salt Lake City)
(8, John, Hopkins, 2102392020, Salt Lake City)
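SPLIT also accepts a final OTHERWISE branch that catches every tuple
matching none of the earlier conditions, which avoids writing the
complementary condition by hand. A sketch on the same relation:
grunt> SPLIT student_details INTO student_details1 IF age < 23, student_details2 OTHERWISE;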
Filters
The FILTER operator is used to select the required tuples from a relation
based on a condition.
Syntax
Given below is the syntax of the FILTER operator.
grunt> Relation2_name = FILTER Relation1_name BY (condition);
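The condition can combine several comparisons with AND, OR and NOT. A
hedged sketch against the student_details relation loaded later in this
section (adults_slc is an illustrative relation name):
grunt> adults_slc = FILTER student_details BY (age >= 22) AND (city == 'Salt Lake City');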
Example
Assume that we have a file named student_details.txt in the HDFS directory
/pig_data/ as shown below.
student_details.txt
1, Peter, Burke, 4353521729, Salt Lake City
2, Aaron, Kimberlake, 8013528191, Salt Lake City
3, Danny, Jacob, 2958295582, Salt Lake City
4, Angela, Kouth, 2938811911, Salt Lake City
5, Peggy, Karter, 3202289119, Salt Lake City
6, King, Salmon, 2398329282, Salt Lake City
7, Carolyn, Fisher, 2293322829, Salt Lake City
8, John, Hopkins, 2102392020, Salt Lake City
And we have loaded this file into Pig with the relation name student_details
as shown below.
grunt> student_details = LOAD '/pig_data/student_details.txt' USING
PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int,
phone:chararray, city:chararray);
grunt> filter_data = FILTER student_details BY city == 'Chennai';
Verification
Verify the relation filter_data using the DUMP operator as shown below.
grunt> Dump filter_data;
Output
It will produce the following output, displaying the contents of the relation
filter_data.
(6, King, Salmon, 2398329282, Salt Lake City)
(8, John, Hopkins, 2102392020, Salt Lake City)
The DISTINCT operator is used to remove redundant (duplicate) tuples from
a relation.
grunt> distinct_data = DISTINCT student_details;
grunt> Dump distinct_data;
The FOREACH operator is used to generate specified data transformations
based on the column data.
grunt> foreach_data = FOREACH student_details GENERATE id,age,city;
grunt> Dump foreach_data;
Output
It will produce the following output, displaying the contents of the relation
foreach_data.
(1,21,Salt Lake City)
(2,22,Salt Lake City)
(3,22,Salt Lake City)
(4,21,Salt Lake City)
(5,23,Salt Lake City)
(6,23,Salt Lake City)
(7,24,Salt Lake City)
(8,24,Salt Lake City)
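GENERATE is not limited to plain column projection; expressions and
built-in functions can be applied per column. A hedged sketch (derived,
next_age and city_upper are illustrative names):
grunt> derived = FOREACH student_details GENERATE id, age + 1 AS next_age, UPPER(city) AS city_upper;
grunt> Dump derived;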
The ASSERT operator is used for data validation. The script will fail if the
data does not meet the condition specified in the ASSERT statement.
Paste the following data into a file (say exp) on the Desktop:
12,23
23,34
-21,22
grunt> a = load '/home/mishra/Desktop/exp' USING PigStorage(',') AS (id:int,roll:int);
Now apply the ASSERT operator:
grunt> assert a by id > 0, 'a cant be neg';
grunt> Dump a;
An error is generated because one of the values in id is negative.
Check the Pig log file that is generated; at the end of the file you will find:
Assertion violated: a cant be neg
Now consider another example with the same data:
grunt> b = load '/home/mishra/Desktop/exp' USING PigStorage(',') AS
(id:int,roll:int);
grunt> assert b by id > 13, 'value is below 13';
grunt> Dump b;
An error is generated because some of the values in id are less than 13.
Check the Pig log file that is generated; at the end of the file you will find:
Assertion violated: value is below 13
Macros in Pig
We can develop more reusable scripts in Pig Latin using macros. A macro is
a kind of function written in Pig Latin.
The DEFINE keyword is used to create macros.
You can define a function by writing a macro and then reuse that macro.
Paste the data given below, with the fields id, name, fees and rollno
respectively:
10,Peter,10000,1
15,Aaron,20000,25
30,Danny,30000,1
40,Angela,40000,35
Move the data to HDFS with:
hadoop fs -put /home/ands/Desktop/xyz.txt /pigy
Create a file on the Desktop with a .pig extension (say macro.pig) and paste
the lines below:
DEFINE myfilter(relvar,colvar) returns x{
$x = filter $relvar by $colvar==1;
};
stu = load '/pigy' using PigStorage(',') as (id,name,fees,rollno);
studrollno1 = myfilter(stu, rollno);
dump studrollno1;
The above macro takes two values as input: a relation variable (relvar) and
a column variable (colvar).
The macro checks whether colvar equals 1.
Now change directory to the Desktop and run:
pig -f macro.pig
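Macros can also take plain values as parameters, not just relation and
column names. A hedged sketch reusing the same student data (feesabove,
minfees and rich are illustrative names):
DEFINE feesabove(relvar, minfees) returns y {
$y = filter $relvar by fees > $minfees;
};
rich = feesabove(stu, 25000);
dump rich;
Note that fees was loaded without a declared type above; declaring it as
fees:int in the load statement makes the comparison unambiguous.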
Topics to be covered in next session
• Sqoop
• Sqoop Installation
• Exporting the data
• Exporting from Hadoop to SQL
Thank you!
More Related Content

What's hot

Import and Export Big Data using R Studio
Import and Export Big Data using R StudioImport and Export Big Data using R Studio
Import and Export Big Data using R StudioRupak Roy
 
Manipulating Data using DPLYR in R Studio
Manipulating Data using DPLYR in R StudioManipulating Data using DPLYR in R Studio
Manipulating Data using DPLYR in R StudioRupak Roy
 
Merge Multiple CSV in single data frame using R
Merge Multiple CSV in single data frame using RMerge Multiple CSV in single data frame using R
Merge Multiple CSV in single data frame using RYogesh Khandelwal
 
Data preparation, depth function
Data preparation, depth functionData preparation, depth function
Data preparation, depth functionFAO
 
7. Data Import – Data Export
7. Data Import – Data Export7. Data Import – Data Export
7. Data Import – Data ExportFAO
 
Unit 3 writable collections
Unit 3 writable collectionsUnit 3 writable collections
Unit 3 writable collectionsvishal choudhary
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In RRsquared Academy
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in RJeffrey Breen
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...CloudxLab
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packagesAjay Ohri
 
R Programming Language
R Programming LanguageR Programming Language
R Programming LanguageNareshKarela1
 
R tutorial for a windows environment
R tutorial for a windows environmentR tutorial for a windows environment
R tutorial for a windows environmentYogendra Chaubey
 

What's hot (20)

Import and Export Big Data using R Studio
Import and Export Big Data using R StudioImport and Export Big Data using R Studio
Import and Export Big Data using R Studio
 
Big Data Analytics Lab File
Big Data Analytics Lab FileBig Data Analytics Lab File
Big Data Analytics Lab File
 
Apache pig
Apache pigApache pig
Apache pig
 
Manipulating Data using DPLYR in R Studio
Manipulating Data using DPLYR in R StudioManipulating Data using DPLYR in R Studio
Manipulating Data using DPLYR in R Studio
 
Unit 2
Unit 2Unit 2
Unit 2
 
Merge Multiple CSV in single data frame using R
Merge Multiple CSV in single data frame using RMerge Multiple CSV in single data frame using R
Merge Multiple CSV in single data frame using R
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 
Data preparation, depth function
Data preparation, depth functionData preparation, depth function
Data preparation, depth function
 
7. Data Import – Data Export
7. Data Import – Data Export7. Data Import – Data Export
7. Data Import – Data Export
 
Unit 3 writable collections
Unit 3 writable collectionsUnit 3 writable collections
Unit 3 writable collections
 
Unit 3
Unit 3Unit 3
Unit 3
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in R
 
Unit 2 part-2
Unit 2 part-2Unit 2 part-2
Unit 2 part-2
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
 
Map reduce prashant
Map reduce prashantMap reduce prashant
Map reduce prashant
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packages
 
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
 
R tutorial for a windows environment
R tutorial for a windows environmentR tutorial for a windows environment
R tutorial for a windows environment
 

Similar to Session 04 -Pig Continued

IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesKeshav Murthy
 
data frames.pptx
data frames.pptxdata frames.pptx
data frames.pptxRacksaviR
 
Database Management System Review
Database Management System ReviewDatabase Management System Review
Database Management System ReviewKaya Ota
 
Relational Database Design
Relational Database DesignRelational Database Design
Relational Database DesignPrabu U
 
Introduction to javascript.ppt
Introduction to javascript.pptIntroduction to javascript.ppt
Introduction to javascript.pptBArulmozhi
 
Database Connectivity with JDBC
Database Connectivity with JDBCDatabase Connectivity with JDBC
Database Connectivity with JDBCDudy Ali
 
Optimizing the Catalyst Optimizer for Complex Plans
Optimizing the Catalyst Optimizer for Complex PlansOptimizing the Catalyst Optimizer for Complex Plans
Optimizing the Catalyst Optimizer for Complex PlansDatabricks
 
Building an Autonomous Data Layer
Building an Autonomous Data LayerBuilding an Autonomous Data Layer
Building an Autonomous Data LayerMartyPitt1
 
Class 12 computer sample paper with answers
Class 12 computer sample paper with answersClass 12 computer sample paper with answers
Class 12 computer sample paper with answersdebarghyamukherjee60
 
mysql 高级优化之 理解索引使用
mysql 高级优化之 理解索引使用mysql 高级优化之 理解索引使用
mysql 高级优化之 理解索引使用nigel889
 
Inspec one tool to rule them all
Inspec one tool to rule them allInspec one tool to rule them all
Inspec one tool to rule them allKimball Johnson
 
Apache Pig Relational Operators - II
Apache Pig Relational Operators - II Apache Pig Relational Operators - II
Apache Pig Relational Operators - II Rupak Roy
 
Data mining final report
Data mining final reportData mining final report
Data mining final reportKedar Kumar
 
Tutorial - Learn SQL with Live Online Database
Tutorial - Learn SQL with Live Online DatabaseTutorial - Learn SQL with Live Online Database
Tutorial - Learn SQL with Live Online DatabaseDBrow Adm
 
Postgres Performance for Humans
Postgres Performance for HumansPostgres Performance for Humans
Postgres Performance for HumansCitus Data
 
6_SQL.pdf
6_SQL.pdf6_SQL.pdf
6_SQL.pdfLPhct2
 

Similar to Session 04 -Pig Continued (20)

IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql Features
 
data frames.pptx
data frames.pptxdata frames.pptx
data frames.pptx
 
ASP.NET 09 - ADO.NET
ASP.NET 09 - ADO.NETASP.NET 09 - ADO.NET
ASP.NET 09 - ADO.NET
 
Assignment#04
Assignment#04Assignment#04
Assignment#04
 
Database Management System Review
Database Management System ReviewDatabase Management System Review
Database Management System Review
 
Relational Database Design
Relational Database DesignRelational Database Design
Relational Database Design
 
Databases with SQLite3.pdf
Databases with SQLite3.pdfDatabases with SQLite3.pdf
Databases with SQLite3.pdf
 
Introduction to javascript.ppt
Introduction to javascript.pptIntroduction to javascript.ppt
Introduction to javascript.ppt
 
Database Connectivity with JDBC
Database Connectivity with JDBCDatabase Connectivity with JDBC
Database Connectivity with JDBC
 
Optimizing the Catalyst Optimizer for Complex Plans
Optimizing the Catalyst Optimizer for Complex PlansOptimizing the Catalyst Optimizer for Complex Plans
Optimizing the Catalyst Optimizer for Complex Plans
 
Building an Autonomous Data Layer
Building an Autonomous Data LayerBuilding an Autonomous Data Layer
Building an Autonomous Data Layer
 
Iowa_Report_2
Iowa_Report_2Iowa_Report_2
Iowa_Report_2
 
Class 12 computer sample paper with answers
Class 12 computer sample paper with answersClass 12 computer sample paper with answers
Class 12 computer sample paper with answers
 
mysql 高级优化之 理解索引使用
mysql 高级优化之 理解索引使用mysql 高级优化之 理解索引使用
mysql 高级优化之 理解索引使用
 
Inspec one tool to rule them all
Inspec one tool to rule them allInspec one tool to rule them all
Inspec one tool to rule them all
 
Apache Pig Relational Operators - II
Apache Pig Relational Operators - II Apache Pig Relational Operators - II
Apache Pig Relational Operators - II
 
Data mining final report
Data mining final reportData mining final report
Data mining final report
 
Tutorial - Learn SQL with Live Online Database
Tutorial - Learn SQL with Live Online DatabaseTutorial - Learn SQL with Live Online Database
Tutorial - Learn SQL with Live Online Database
 
Postgres Performance for Humans
Postgres Performance for HumansPostgres Performance for Humans
Postgres Performance for Humans
 
6_SQL.pdf
6_SQL.pdf6_SQL.pdf
6_SQL.pdf
 

More from AnandMHadoop

Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - FlumeAnandMHadoop
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperAnandMHadoop
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsAnandMHadoop
 
Session 02 - Yarn Concepts
Session 02 - Yarn ConceptsSession 02 - Yarn Concepts
Session 02 - Yarn ConceptsAnandMHadoop
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to HadoopAnandMHadoop
 

More from AnandMHadoop (7)

Overview of Java
Overview of Java Overview of Java
Overview of Java
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - Flume
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and Zookeeper
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic Commands
 
Session 02 - Yarn Concepts
Session 02 - Yarn ConceptsSession 02 - Yarn Concepts
Session 02 - Yarn Concepts
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
 

Recently uploaded

Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 

Recently uploaded (20)

Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 

Session 04 -Pig Continued

  • 2. Page 2Classification: Restricted Agenda PIG • Loads in Pig Continued • Verification • Filters • Macros in Pig
  • 3. Page 3Classification: Restricted Load in Pig Inner Join is used quite frequently; it is also referred to as equijoin. An inner join returns rows when there is a match in both tables. It creates a new relation by combining column values of two relations (say A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join-predicate. When the join-predicate is satisfied, the column values for each matched pair of rows of A and B are combined into a result row. Syntax Here is the syntax of performing inner join operation using the JOIN operator. grunt> result = JOIN relation1 BY columnname, relation2 BY columnname; Example Let us perform inner join operation on the two relations customers and orders as shown below. grunt> coustomer_orders = JOIN customers BY id, orders BY customer_id
  • 4. Page 4Classification: Restricted Verification Verify the relation coustomer_orders using the DUMP operator as shown below. grunt> Dump coustomer_orders; Output You will get the following output that will the contents of the relation named coustomer_orders. (2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560) (3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500) (3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000) (4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060) Outer Join: Unlike inner join, outer join returns all the rows from at least one of the relations. An outer join operation is carried out in three ways − Left outer join Right outer join Full outer join Left Outer Join
  • 5. Left Outer Join: The left outer join operation returns all rows from the left relation, even if there are no matches in the right relation. Syntax: Given below is the syntax for performing a left outer join using the JOIN operator. grunt> Relation3_name = JOIN Relation1_name BY id LEFT OUTER, Relation2_name BY customer_id; Example: Let us perform a left outer join on the two relations customers and orders as shown below. grunt> outer_left = JOIN customers BY id LEFT OUTER, orders BY customer_id;
  • 6. Verification: Verify the relation outer_left using the DUMP operator as shown below. grunt> Dump outer_left; Output: It will produce the following output, displaying the contents of the relation outer_left. (1,Peter,32,Salt Lake City,2000,,,,) (2,Aaron,25,Salt Lake City,1500,101,2009-11-20 00:00:00,2,1560) (3,Danny,23,Salt Lake City,2000,100,2009-10-08 00:00:00,3,1500) (3,Danny,23,Salt Lake City,2000,102,2009-10-08 00:00:00,3,3000) (4,Angela,25,Salt Lake City,6500,103,2008-05-20 00:00:00,4,2060) (5,Peggy,27,Bhopal,8500,,,,) (6,King,22,MP,4500,,,,) (7,Carolyn,24,Indore,10000,,,,)
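A common follow-up to a left outer join is isolating the unmatched rows. In the output above, the order-side columns are null for customers without orders, so filtering on the null-padded join key finds them. A sketch (the relation name no_orders is an assumption; after a join, Pig disambiguates fields with the relation::field notation):

```pig
-- Customers with no matching orders: the order-side join key is null after LEFT OUTER
grunt> no_orders = FILTER outer_left BY orders::customer_id IS NULL;
grunt> Dump no_orders;
```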
  • 7. Right Outer Join: The right outer join operation returns all rows from the right relation, even if there are no matches in the left relation. Syntax: Given below is the syntax for performing a right outer join using the JOIN operator. grunt> Relation3_name = JOIN Relation1_name BY id RIGHT OUTER, Relation2_name BY customer_id; Example: Let us perform a right outer join on the two relations customers and orders as shown below. grunt> outer_right = JOIN customers BY id RIGHT OUTER, orders BY customer_id; Verification: Verify the relation outer_right using the DUMP operator as shown below. grunt> Dump outer_right; Output
  • 8. It will produce the following output, displaying the contents of the relation outer_right. (2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560) (3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500) (3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000) (4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060) SPLIT Operator: The SPLIT operator is used to split a relation into two or more relations. Example: grunt> SPLIT student_details INTO student_details1 IF age<23, student_details2 IF (age>22 and age<25); grunt> Dump student_details1; grunt> Dump student_details2;
  • 9. Output: It will produce the following output, displaying the contents of the relations student_details1 and student_details2 respectively. grunt> Dump student_details1; (1, Peter, Burke, 4353521729, Salt Lake City) (2, Aaron, Kimberlake, 8013528191, Salt Lake City) (3, Danny, Jacob, 2958295582, Salt Lake City) (4, Angela, Kouth, 2938811911, Salt Lake City) grunt> Dump student_details2; (5, Peggy, Karter, 3202289119, Salt Lake City) (6, King, Salmon, 2398329282, Salt Lake City) (7, Carolyn, Fisher, 2293322829, Salt Lake City) (8, John, Hopkins, 2102392020, Salt Lake City)
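Besides explicit conditions, SPLIT accepts an OTHERWISE branch that catches every row no earlier condition matched; a row can also land in more than one output relation if it satisfies several conditions. A small sketch on the same student_details relation (the relation names under23 and rest are assumptions):

```pig
-- OTHERWISE collects all rows not matched by any preceding condition
grunt> SPLIT student_details INTO under23 IF age < 23, rest OTHERWISE;
grunt> Dump rest;
```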
  • 10. FILTER Operator: The FILTER operator is used to select the required tuples from a relation based on a condition. Syntax: Given below is the syntax of the FILTER operator. grunt> Relation2_name = FILTER Relation1_name BY (condition); Example: Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below. student_details.txt 1, Peter, Burke, 4353521729, Salt Lake City 2, Aaron, Kimberlake, 8013528191, Salt Lake City 3, Danny, Jacob, 2958295582, Salt Lake City 4, Angela, Kouth, 2938811911, Salt Lake City
  • 11. 5, Peggy, Karter, 3202289119, Salt Lake City 6, King, Salmon, 2398329282, Salt Lake City 7, Carolyn, Fisher, 2293322829, Salt Lake City 8, John, Hopkins, 2102392020, Salt Lake City We have loaded this file into Pig with the relation name student_details as shown below. grunt> student_details = LOAD '/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray); grunt> filter_data = FILTER student_details BY city == 'Chennai'; Verification: Verify the relation filter_data using the DUMP operator as shown below. grunt> Dump filter_data;
  • 12. Output: It will produce the following output, displaying the contents of the relation filter_data. (6, King, Salmon, 2398329282, Salt Lake City) (8, John, Hopkins, 2102392020, Salt Lake City) DISTINCT: The DISTINCT operator is used to remove redundant (duplicate) tuples from a relation. grunt> distinct_data = DISTINCT student_details; grunt> Dump distinct_data; FOREACH: The FOREACH operator is used to generate specified data transformations based on the column data. grunt> foreach_data = FOREACH student_details GENERATE id,age,city; grunt> Dump foreach_data; Output: It will produce the following output, displaying the contents of the relation foreach_data.
  • 13. (1,21,Salt Lake City) (2,22,Salt Lake City) (3,22,Salt Lake City) (4,21,Salt Lake City) (5,23,Salt Lake City) (6,23,Salt Lake City) (7,24,Salt Lake City) (8,24,Salt Lake City) ASSERT: The ASSERT operator is used for data validation; the script fails if the data does not meet the condition specified in the assert. Paste the following data into a file on the desktop: 12,23 23,34 -21,22
  • 14. grunt> a = load '/home/mishra/Desktop/exp' USING PigStorage(',') AS (id:int,roll:int); Now apply the ASSERT operator: grunt> assert a by id > 0, 'a cant be neg'; grunt> dump a; An error is generated because one of the values in id is negative. Check the details in the log file generated by Pig; at the end of the file you will find: Assertion violated: a cant be neg. Now consider another example with the same data: grunt> b = load '/home/mishra/Desktop/exp' USING PigStorage(',') AS (id:int,roll:int); grunt> assert b by id > 13, 'value is below 13'; grunt> dump b; An error is generated because some of the values in id are less than 13; at the end of the Pig log file you will find: Assertion violated: value is below 13.
  • 15. Macros in Pig: We can also develop more reusable Pig Latin scripts using macros. A macro is a kind of function written in Pig Latin; the DEFINE keyword is used to create one, and the macro can then be reused wherever needed. Paste the data given below (student records with fields id, name, fees and rollno respectively): 10,Peter,10000,1 15,Aaron,20000,25 30,Danny,30000,1 40,Angela,40000,35 Move the data to HDFS with: hadoop fs -put /home/ands/Desktop/xyz.txt /pigy Make a file on the desktop with a .pig extension (say macro.pig) and paste the lines below: DEFINE myfilter(relvar,colvar) returns x { $x = filter $relvar by $colvar == 1; };
  • 16. Macros in Pig: stu = load '/pigy' using PigStorage(',') as (id,name,fees,rollno); studrollno1 = myfilter(stu, rollno); dump studrollno1; The macro above takes two inputs, a relation variable (relvar) and a column variable (colvar), and keeps only the rows where colvar equals 1. Now change directory to the desktop and run: pig -f macro.pig
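A macro can also live in its own file and be pulled into any script with the IMPORT statement, which makes it reusable across scripts. A sketch, assuming the DEFINE block above was saved on its own as /home/ands/Desktop/myfilter.pig (that path and file name are assumptions):

```pig
-- main.pig: reuse the macro defined in a separate file
IMPORT '/home/ands/Desktop/myfilter.pig';
stu = LOAD '/pigy' USING PigStorage(',') AS (id,name,fees,rollno);
studrollno1 = myfilter(stu, rollno);
DUMP studrollno1;
```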
  • 17. Page 17Classification: Restricted Topics to be covered in next session • Sqoop • Sqoop Installation • Exporting the data • Exporting from Hadoop to SQL