SlideShare a Scribd company logo
Introduction to Pig Xiafei.qiu@PCA
Nested Data Model Field, Tuple, Bag, Map
Normal Operators Arithmetic Operators X = FOREACH A GENERATE f1, f2, f1 % f2; Boolean Operators X = FILTER A BY (f1 == 8) OR (NOT (f2+f3 > f1)); Cast operators X = FOREACH B GENERATE group, (chararray) COUNT(A) AS total; Comparison Operators X = FILTER A BY (f1 matches '.*apache.*') OR (NOT (f2+f3 > f1)); Flatten Operator Tuple: remove a level of nesting Bag :remove a level of nesting, may cause cross product 
Normal Operators
Relational Operators LOADa bag of tuples A = LOAD 'data' [USING function] [AS schema];  STORE A = STORE alias INTO 'directory' [USING function]; FOREACHtuple in the bag, produce a new tuple A = FOREACH queries GENERATE uid, expandQuery(query); FILTERa bag to produce a subset of it A = FILTER queries BY uidneq ‘bot’ OR notBot(uid);
Relational Operators COGROUP/GROUPone or less than 127 relations alias = GROUP … by …, … by… {group: int, A: {name: chararray,age: int,gpa: float}} (18,{(John,18,4.0F),(Joe,18,3.8F)})
Relational Operators JOIN(inner/outer) Replicated Joins one or more relations are small enough to fit into main memory.  Skewed Joins computes a histogram of the key space and uses this data to allocate reducers for a given key.  Merge Joins Sorted, perform join on map phase
Relational Operators
Relational Operators ORDERalias by filed DESC/ASC Unstable SPLITalias INTO alias IF …, alias IF … CROSS cross product X = CROSS A, B; DISTINCT Removes duplicate tuples in a relation. X = DISTINCT A; LIMIT LIMITE A 3; SAMPLE SAMPLE alias size; IMPORT Import other .pig file DEFINE Define a Pig macro.
Built In Eval Function AVG/MAX/MIN/SUM  on a single column of a bag; group it first COUNT/ COUNT_STAR  number of elements in a bag; COUNT_STAR  counts null CONCAT DIFF IsEmpty SIZE TOKENIZE
Other Built In Function Load/Store Functions Math Functions String Functions
Map-Reduce Plan Compilation Compile each GROUP into distinct Map-Reduce job Push commands between LOAD and GROUP to the Map Side Commands between subsequent GROUP Gi and Gi+1 pushed into the Reduce  Side of Gi
Map-Reduce Plan Compilation ORDER is compiled into two map-reduce jobs. MR1: sample the key space MR2: sort
User Defined Function Simple Eval Function public class UPPER extends EvalFunc<String>{  public String exec(Tuple input) throws IOException {     // .......  }}
User Defined Function Aggregate Functions Algebraic Interface they can be computed incrementally in a distributed fashion. Accumulator Interface designed to decrease memory usage
Accumulator Interface public interface Accumulator <T> {    public void accumulate(Tuple b) throws IOException;      public T getValue();      public void cleanup();}
Aggregate Functions public interface Algebraic{        public String getInitial();        public String getIntermed();        public String getFinal();}

More Related Content

What's hot

STACK ( LIFO STRUCTURE) - Data Structure
STACK ( LIFO STRUCTURE) - Data StructureSTACK ( LIFO STRUCTURE) - Data Structure
STACK ( LIFO STRUCTURE) - Data Structure
Yaksh Jethva
 
Introduction To Stack
Introduction To StackIntroduction To Stack
Introduction To Stack
Education Front
 
Stack project
Stack projectStack project
Stack project
Amr Aboelgood
 
Stack of Data structure
Stack of Data structureStack of Data structure
Stack of Data structure
Sheikh Monirul Hasan
 
Stack
StackStack
Stack - Data Structure
Stack - Data StructureStack - Data Structure
Stack - Data Structure
Bhavesh Sanghvi
 
Mapreduce: Theory and implementation
Mapreduce: Theory and implementationMapreduce: Theory and implementation
Mapreduce: Theory and implementationSri Prasanna
 
Stack push pop
Stack push popStack push pop
Stack push pop
A. S. M. Shafi
 
Csa stack
Csa stackCsa stack
Csa stack
PCTE
 
Presentation topic is stick data structure
Presentation topic is stick data structurePresentation topic is stick data structure
Presentation topic is stick data structure
AizazAli21
 
stack presentation
stack presentationstack presentation
Simple Linear Regression with R
Simple Linear Regression with RSimple Linear Regression with R
Simple Linear Regression with R
Jerome Gomes
 
Stack and queue
Stack and queueStack and queue
Stack and queue
Shakila Mahjabin
 
Stacks Implementation and Examples
Stacks Implementation and ExamplesStacks Implementation and Examples
Stacks Implementation and Examplesgreatqadirgee4u
 
Stack organization
Stack organizationStack organization
Stack organization
chauhankapil
 
Stacks in c++
Stacks in c++Stacks in c++
Stacks in c++
Vineeta Garg
 
Data Visualization With R: Introduction
Data Visualization With R: IntroductionData Visualization With R: Introduction
Data Visualization With R: Introduction
Rsquared Academy
 
Data Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple GraphsData Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple Graphs
Rsquared Academy
 

What's hot (20)

STACK ( LIFO STRUCTURE) - Data Structure
STACK ( LIFO STRUCTURE) - Data StructureSTACK ( LIFO STRUCTURE) - Data Structure
STACK ( LIFO STRUCTURE) - Data Structure
 
Introduction To Stack
Introduction To StackIntroduction To Stack
Introduction To Stack
 
Stack project
Stack projectStack project
Stack project
 
Algo>Stacks
Algo>StacksAlgo>Stacks
Algo>Stacks
 
Stack of Data structure
Stack of Data structureStack of Data structure
Stack of Data structure
 
Stack
StackStack
Stack
 
Stack - Data Structure
Stack - Data StructureStack - Data Structure
Stack - Data Structure
 
Mapreduce: Theory and implementation
Mapreduce: Theory and implementationMapreduce: Theory and implementation
Mapreduce: Theory and implementation
 
Stack push pop
Stack push popStack push pop
Stack push pop
 
Stack a Data Structure
Stack a Data StructureStack a Data Structure
Stack a Data Structure
 
Csa stack
Csa stackCsa stack
Csa stack
 
Presentation topic is stick data structure
Presentation topic is stick data structurePresentation topic is stick data structure
Presentation topic is stick data structure
 
stack presentation
stack presentationstack presentation
stack presentation
 
Simple Linear Regression with R
Simple Linear Regression with RSimple Linear Regression with R
Simple Linear Regression with R
 
Stack and queue
Stack and queueStack and queue
Stack and queue
 
Stacks Implementation and Examples
Stacks Implementation and ExamplesStacks Implementation and Examples
Stacks Implementation and Examples
 
Stack organization
Stack organizationStack organization
Stack organization
 
Stacks in c++
Stacks in c++Stacks in c++
Stacks in c++
 
Data Visualization With R: Introduction
Data Visualization With R: IntroductionData Visualization With R: Introduction
Data Visualization With R: Introduction
 
Data Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple GraphsData Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple Graphs
 

Similar to Introduction to pig

Lec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptxLec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptx
vishal choudhary
 
4.1-Pig.pptx
4.1-Pig.pptx4.1-Pig.pptx
4.1-Pig.pptx
akhileshyadav718837
 
Hadoop Pig
Hadoop PigHadoop Pig
Hadoop Pig
Mathias Herberts
 
ADT STACK and Queues
ADT STACK and QueuesADT STACK and Queues
ADT STACK and Queues
BHARATH KUMAR
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)Sri Prasanna
 
Functional Programming for OO Programmers (part 2)
Functional Programming for OO Programmers (part 2)Functional Programming for OO Programmers (part 2)
Functional Programming for OO Programmers (part 2)
Calvin Cheng
 
Lec2 Mapred
Lec2 MapredLec2 Mapred
Lec2 Mapred
mobius.cn
 
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementationtugrulh
 
Stacks,queues,linked-list
Stacks,queues,linked-listStacks,queues,linked-list
Stacks,queues,linked-list
pinakspatel
 
Mapfilterreducepresentation
MapfilterreducepresentationMapfilterreducepresentation
Mapfilterreducepresentation
ManjuKumara GH
 
Functional Programming Concepts for Imperative Programmers
Functional Programming Concepts for Imperative ProgrammersFunctional Programming Concepts for Imperative Programmers
Functional Programming Concepts for Imperative Programmers
Chris
 
Type class survival guide
Type class survival guideType class survival guide
Type class survival guide
Mark Canlas
 
Data structures stacks
Data structures   stacksData structures   stacks
Data structures stacks
maamir farooq
 
stacks and queues
stacks and queuesstacks and queues
stacks and queues
DurgaDeviCbit
 
Fp in scala part 1
Fp in scala part 1Fp in scala part 1
Fp in scala part 1
Hang Zhao
 
Composition birds-and-recursion
Composition birds-and-recursionComposition birds-and-recursion
Composition birds-and-recursion
David Atchley
 
Pig
PigPig
Functional Pearls 4 (YAPC::EU::2009 remix)
Functional Pearls 4 (YAPC::EU::2009 remix)Functional Pearls 4 (YAPC::EU::2009 remix)
Functional Pearls 4 (YAPC::EU::2009 remix)
osfameron
 
Zenith it-hadoop-training
Zenith it-hadoop-trainingZenith it-hadoop-training
Zenith it-hadoop-training
Zenith It Solutions
 
Pig workshop
Pig workshopPig workshop
Pig workshop
Sudar Muthu
 

Similar to Introduction to pig (20)

Lec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptxLec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptx
 
4.1-Pig.pptx
4.1-Pig.pptx4.1-Pig.pptx
4.1-Pig.pptx
 
Hadoop Pig
Hadoop PigHadoop Pig
Hadoop Pig
 
ADT STACK and Queues
ADT STACK and QueuesADT STACK and Queues
ADT STACK and Queues
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)
 
Functional Programming for OO Programmers (part 2)
Functional Programming for OO Programmers (part 2)Functional Programming for OO Programmers (part 2)
Functional Programming for OO Programmers (part 2)
 
Lec2 Mapred
Lec2 MapredLec2 Mapred
Lec2 Mapred
 
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
 
Stacks,queues,linked-list
Stacks,queues,linked-listStacks,queues,linked-list
Stacks,queues,linked-list
 
Mapfilterreducepresentation
MapfilterreducepresentationMapfilterreducepresentation
Mapfilterreducepresentation
 
Functional Programming Concepts for Imperative Programmers
Functional Programming Concepts for Imperative ProgrammersFunctional Programming Concepts for Imperative Programmers
Functional Programming Concepts for Imperative Programmers
 
Type class survival guide
Type class survival guideType class survival guide
Type class survival guide
 
Data structures stacks
Data structures   stacksData structures   stacks
Data structures stacks
 
stacks and queues
stacks and queuesstacks and queues
stacks and queues
 
Fp in scala part 1
Fp in scala part 1Fp in scala part 1
Fp in scala part 1
 
Composition birds-and-recursion
Composition birds-and-recursionComposition birds-and-recursion
Composition birds-and-recursion
 
Pig
PigPig
Pig
 
Functional Pearls 4 (YAPC::EU::2009 remix)
Functional Pearls 4 (YAPC::EU::2009 remix)Functional Pearls 4 (YAPC::EU::2009 remix)
Functional Pearls 4 (YAPC::EU::2009 remix)
 
Zenith it-hadoop-training
Zenith it-hadoop-trainingZenith it-hadoop-training
Zenith it-hadoop-training
 
Pig workshop
Pig workshopPig workshop
Pig workshop
 

Recently uploaded

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 

Recently uploaded (20)

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 

Introduction to pig