Pig
Casting, Reference
Casting
Casting lets us convert data from one type to another, as long
as the conversion is supported. For example, suppose we have an
integer field (int) that we want to convert to a string. We can
cast this field from int to chararray by prefixing it with
(chararray).
For example:
grunt> select = foreach data generate $0, (chararray)$4, (chararray)$5;
grunt> dump select;
(ryan,67,57)
(Bob,77,75)
(Alica,68,)
(Bryan,81,79)
(Kate,66,69)
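A slightly fuller sketch of the same idea, assuming a hypothetical comma-separated file named students.csv that holds the six columns used in this deck. The file name and the field names score1 and score2 are illustrative and do not come from the original slides. DESCRIBE prints the schema of the result, which is one way to check that the casts took effect:

```pig
-- Assumption: students.csv is a hypothetical comma-separated input file.
data = LOAD 'students.csv' USING PigStorage(',');

-- With no schema declared at load time, every field starts out as bytearray.
-- Cast the two test-score fields and give them names.
casted = FOREACH data GENERATE $0, (chararray)$4 AS score1, (int)$5 AS score2;

-- Print the schema of the new relation.
DESCRIBE casted;
```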
Rupak Roy
Reference field by position
We can refer to the data fields by name as well as
by their positions ($0, $1, ...).
$0     $1   $2               $3          $4            $5
Name   Age  School           Location    Test Score 1  Test Score 2
Ryan   22   St.JohnsSchool   NewAvenue   67            57
Bob    23   St.EdumndSchool  Downtown    77            75
Alica  Na   Don Bosco        ParkAvenue  65            79
Bryan  24   St.JhonsSchool   NewAvenue   81            79
Kate   22   Don Bosco        ParkAvenue  66            69
#filter the data by age >= 22
grunt> age = FILTER data by $1 >= 22;
grunt> dump age;
Here, we are referencing the age column by its position, $1. However,
we can also reference it directly by name (this requires that the
field names were defined when loading the data), such as
grunt> age = FILTER data by age >= 22;
But referencing columns by name can become tedious when we are
dealing with large datasets that have complex column names.
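As a minimal sketch of the two styles side by side, the statements below assume a hypothetical file students.csv and hypothetical field names (name, age, school, location, score1, score2) declared in the LOAD statement. Neither the file name nor the field names appear in the original slides:

```pig
-- Assumption: students.csv is a hypothetical comma-separated input file,
-- and the field names below are chosen only for illustration.
data = LOAD 'students.csv' USING PigStorage(',')
       AS (name:chararray, age:int, school:chararray,
           location:chararray, score1:int, score2:int);

-- Reference the age column by position...
by_position = FILTER data BY $1 >= 22;

-- ...or by the name declared in the schema. Both filters test the same field.
by_name = FILTER data BY age >= 22;

DUMP by_name;
```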
#filter the data by test score1 <= 66
grunt> testscore = FILTER data by $4 <= 66;
grunt> dump testscore;
We will notice that the output shows only one record, that is
(Kate,22,Don Bosco,ParkAvenue,66,69), but our original dataset
has another record with Test Score 1 <= 66, i.e. Alica's.
This is because, when loading the data, we defined the column
values as separated by a comma (,), and Alica's row has no
value for the 2nd column. Pig therefore took the next value
after the comma, Don Bosco (the School value, which belongs in
$2), as the value for column $1 'age'. Every later value in
that row moves one position to the left as well, so $4 no
longer holds Alica's Test Score 1.
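One way to avoid the shift, sketched here under stated assumptions: if the source file keeps the delimiter for the missing value (a hypothetical row such as Alica,,Don Bosco,ParkAvenue,65,79), PigStorage loads the empty field as null and the remaining columns stay in place. The file name, the field names, and the absence of a header row are assumptions made for illustration:

```pig
-- Assumption: students.csv is hypothetical, has no header row, and stores
-- Alica's row with an empty age field:  Alica,,Don Bosco,ParkAvenue,65,79
data = LOAD 'students.csv' USING PigStorage(',')
       AS (name:chararray, age:int, school:chararray,
           location:chararray, score1:int, score2:int);

-- The empty age field is loaded as null, so it can be found explicitly.
missing_age = FILTER data BY age IS NULL;

-- score1 is still in its own column, so this filter now sees Alica's score.
testscore = FILTER data BY score1 <= 66;
DUMP testscore;
```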
Select data based on the position of the column
grunt> select = foreach data generate $0,$4,$5;
grunt> dump select;
(ryan,67,57)
(Bob,77,75)
(Alica,68,)
(Bryan,81,79)
(Kate,66,69)
Select columns using reference
grunt> select_all= foreach data generate *;
grunt> dump select_all;
grunt> select_range = foreach data generate $0..$1;
grunt> dump select_range;
(Name,age)
(Ryan,22)
(bob,23)
(Alica,Don Bosco)
(Bryan,24)
(kate,22)
Alica's row shows Don Bosco instead of an age
because the 2nd value, her age, is missing,
so Pig takes the next value as the 2nd
column 'age' value. It is advisable
to mark missing values as NA/NIL
so that the values of the other columns
do not end up in the wrong position.
Reference range of columns/fields
grunt> leftsidedata = foreach data generate ..$1;
grunt> middle = foreach data generate $0 .. $2;
grunt> from_last= foreach data generate $2.. ;
grunt> random = foreach data generate $0, $4 .. $5;
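Range references also work with field names once a schema is declared. The sketch below assumes a hypothetical file students.csv and hypothetical field names that are not part of the original slides:

```pig
-- Assumption: students.csv and the field names are hypothetical.
data = LOAD 'students.csv' USING PigStorage(',')
       AS (name:chararray, age:int, school:chararray,
           location:chararray, score1:int, score2:int);

-- Same fields as $0 .. $2, written with names.
first_three = FOREACH data GENERATE name .. school;

-- Everything from school to the last field, same as $2 .. ;
from_school = FOREACH data GENERATE school .. ;

-- A single field followed by a range, same as $0, $4 .. $5.
name_and_scores = FOREACH data GENERATE name, score1 .. score2;
```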
If a schema was not defined while loading the dataset, we can still define
it in a query by casting the fields. For example:
grunt> random = foreach data generate (chararray)$0, (chararray)$3;
Alternatively, we can also assign an alias name to each field, like
grunt> random = foreach data generate (chararray)$0 as FC, (chararray)$3 as LC;
grunt> describe random;
grunt> kate = FILTER random by FC == 'Kate';
Next
We will learn Pig relational operators and
how to use them.