Pig Relational Operators - I: Order, Distinct, Limit, GroupBy
Pig Relational Operator: ORDER
• ORDER sorts a relation on one or more fields, in ascending or
descending order.
1. Numeric fields are sorted numerically.
2. chararray fields are sorted lexically.
3. bytearray fields are sorted lexically, using the byte values.
4. Nulls are considered smaller than any other value, so they
come first in ascending order and last in descending order.
Let's work through an example:
grunt> dataTransaction = LOAD '/home/hduser/datasets/store.csv'
USING PigStorage(',') AS
(Product_Name:chararray, CustomerName:chararray,
Transaction_ID:bytearray, TransAmt1:bytearray, TransAmt2:bytearray,
TransAmt3:bytearray, Place:chararray, Department:chararray);
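Because the TransAmt fields are loaded as bytearray, sorting on them directly would be lexical rather than numeric. The following is a minimal sketch of one way to get a numeric sort, assuming TransAmt1 holds values that can be cast to double. The aliases amounts and byAmount and the field name amt1 are illustrative names, not from the slides.

```pig
-- Project the customer name and a numeric copy of TransAmt1.
-- The cast to double is an assumption about the data; values that
-- cannot be cast become null.
amounts = FOREACH dataTransaction GENERATE CustomerName, (double)TransAmt1 AS amt1;

-- Ascending numeric sort; rows with a null amt1 are placed first.
byAmount = ORDER amounts BY amt1 ASC;
DUMP byAmount;
```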
Rupak Roy
Pig Relational Operator: ORDER
grunt> orderbyName = ORDER dataTransaction BY CustomerName;
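Examples 2 and 3 sort on fields (date, symbol, close, open) that are
not in the store.csv schema above; they assume a relation, here
called datatransaction, that has them.
Example 2:
grunt> orderbyNameNsymbol = ORDER datatransaction BY date, symbol;
Example 3 (the alias cannot be desc, which is a reserved keyword):
grunt> orderbyCloseDesc = ORDER datatransaction BY close DESC, open;
Here the close column is sorted in descending order. Since no
order is specified for the open column, it is sorted in
ascending order, which is the default.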
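The same mix of sort directions can be tried on the store.csv relation loaded earlier. This is a small sketch that assumes dataTransaction is already defined as above; the alias byPlaceThenName is an illustrative name.

```pig
-- Place is sorted in descending order; within each Place,
-- CustomerName falls back to the default ascending order.
byPlaceThenName = ORDER dataTransaction BY Place DESC, CustomerName;
DUMP byPlaceThenName;
```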
Pig Relational Operator: LIMIT
• LIMIT restricts the number of records in the output.
• For example, if a relation has 1 million rows, dumping all of
it just to test a script takes a long time. To check that a
script works, it is better to view a few records than the
whole result.
• However, LIMIT only caps the output: Pig may still read more
records than it returns, and there is no guarantee which
records come back. A limit of, say, 20 may return a different
20 records each time, even for the same query. To overcome
this, apply the ORDER operator immediately before LIMIT;
this returns the same 20 records each time the query runs.
Pig Relational Operator: LIMIT
grunt> Trecords = LIMIT dataTransaction 20;
grunt> dump Trecords;
grunt> dump Trecords; -- may display a different 20 records
-- to overcome the limit issue, sort first
-- (the alias cannot be order, which is a reserved keyword)
grunt> ordered = ORDER dataTransaction BY $0;
grunt> Trecords = LIMIT ordered 20;
grunt> dump Trecords;
Running the same statements again gives the same results:
grunt> Trecords = LIMIT ordered 20;
grunt> dump Trecords;
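Sorting on $0 alone leaves the relative order of records that share the same first field undefined, so the 20 records returned may still vary when such ties fall on the cut-off. One hedged way to reduce this is to add further sort keys as tie-breakers. The sketch below assumes that Product_Name, CustomerName and Transaction_ID together distinguish records well enough; the aliases stable and top20 are illustrative names.

```pig
-- Sort on several fields so that ties on Product_Name are broken
-- by CustomerName and then by Transaction_ID.
stable = ORDER dataTransaction BY Product_Name, CustomerName, Transaction_ID;

-- Take the first 20 records of the sorted relation.
top20 = LIMIT stable 20;
DUMP top20;
```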
Pig Relational Operator: DISTINCT
• DISTINCT removes duplicate records from a relation.
grunt> RD = DISTINCT dataTransaction;
Note: DISTINCT makes use of a combiner (a "semi-reducer"
that runs between the map phase and the reduce phase)
to remove duplicates.
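DISTINCT compares whole records, so to get the unique values of a single field the field has to be projected first. A minimal sketch, assuming the dataTransaction relation loaded earlier; the aliases places and uniquePlaces are illustrative names.

```pig
-- Keep only the Place field of every record.
places = FOREACH dataTransaction GENERATE Place;

-- Remove duplicate Place values.
uniquePlaces = DISTINCT places;
DUMP uniquePlaces;
```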
Pig Relational Operator: GROUP
• GROUP is one of the most important operators: it collects the
records that share the same key, here the same Place, into one group.
grunt> grouping = GROUP dataTransaction BY Place;
grunt> describe grouping;
grunt> cnt = FOREACH grouping GENERATE group, COUNT($1);
grunt> dump cnt;
Applied to our dataset store.csv, this counts the transaction records
for each Place. To find from how many places each customer purchased
similar products, group by CustomerName instead.
Note: to verify the result, load the dataset in Excel (it is a
subset of a larger dataset, so it is small enough to view there). Use the
filter function and select only Christy Britain; we will see she has
purchased similar products from 6 different places.
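The per-customer check described in the note can also be expressed in Pig. The following is a sketch under the assumption that "how many places" means the number of distinct Place values per CustomerName; the aliases byCustomer, places and placesPerCustomer and the field name numPlaces are illustrative names, not from the slides.

```pig
-- One group per customer; each group holds that customer's records.
byCustomer = GROUP dataTransaction BY CustomerName;

-- Inside each group, reduce the Place values to the distinct ones
-- and count them.
placesPerCustomer = FOREACH byCustomer {
    places = DISTINCT dataTransaction.Place;
    GENERATE group AS CustomerName, COUNT(places) AS numPlaces;
};
DUMP placesPerCustomer;
```

The row for Christy Britain can then be compared with the Excel filter result.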
Pig Relational Operator: GROUP
• Group on multiple keys:
grunt> grouped = GROUP dataTransaction BY
(Place, Department);
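When grouping on multiple keys, the group field is a tuple of those keys. A minimal sketch of counting the records per (Place, Department) pair and flattening the key tuple back into two columns; the aliases grouped and counts and the field name numRecords are illustrative names.

```pig
-- Group on the pair (Place, Department).
grouped = GROUP dataTransaction BY (Place, Department);

-- FLATTEN turns the key tuple into two plain fields;
-- COUNT counts the records in each group's bag.
counts = FOREACH grouped GENERATE
    FLATTEN(group) AS (Place, Department),
    COUNT(dataTransaction) AS numRecords;
DUMP counts;
```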
Next
• More advanced relational operators such as
FOREACH, FILTER, JOIN and more.
