SlideShare a Scribd company logo
1 of 18
Download to read offline
Apache Pig
Extract Transform Load
Introduction
Apache Pig:
● is an abstraction over MapReduce
● has structure which is amenable for substantial parallelization
● is generally used with Hadoop
● does lazy evaluation
● uses Pig Latin, which provides
○ Ease of Programming
○ Optimization opportunities
■ Focus on semantics rather than efficiency
○ Extensibility
■ Can create UDFs
Architecture
Pig Latin Data Model
● Fully nested & complex data model
● Atom
● Tuple
● Bag
● Map
● Relation
Pig Environment
Establishing Pig environment contains following three major steps:
● Installation
○ https://pig.apache.org/docs/r0.7.0/setup.html
● Grunt shell
○ Pig interactive shell which accepts Pig-latin based statements.
● Execution
○ Two modes of execution, Mapreduce mode and Local mode, we’ll see both ahead.
Pig Latin
statements
relations
expressions schemas
Data Types
Basic Complex
int, long, biginteger Tuple : (raju, 5)
float, double, bigdecimal Bag: {(raju,5),(ranu,6)}
chararray Map: [‘name’#’raju’,’age’:40]
bytearray
boolean
datetime
Operators and Constructs
● All standard Arithmetic and Relational operator
● Constructs
○ Tuple: ()
○ Bag: {}
○ Map: []
● Relational Operators
○ Load & Storing: LOAD & STORE
○ Filtering: FILTER, DISTINCT, FOREACH, GENERATE, STREAM
○ Grouping & Joining: JOIN, COGROUP, CROSS, GROUP
○ Sorting: ORDER, LIMIT
○ Combining & Splitting: UNION, SPLIT
○ Diagnostic & Debugging: DUMP, DESCRIBE, EXPLAIN, ILLUSTRATE
Sample Queries
student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt'
USING PigStorage(',')
as ( id:int, firstname:chararray, lastname:chararray, phone:chararray,
city:chararray );
STORE student INTO ' hdfs://localhost:9000/pig_Output/ ' USING PigStorage (',');
Dump student
describe student;
explain Relation_name;
illustrate student
group_data = GROUP student_details by age
cogroup_data = COGROUP student_details by age, employee_details by age;
result = JOIN relation1 BY columnname, relation2 BY columnname;
cross_data = CROSS customers, orders;
student = UNION student1, student2;
SPLIT student_details into student_details1 if age<23, student_details2 if (22<age and age>25);
filter_data = FILTER student_details BY city == 'Chennai';
distinct_data = DISTINCT student_details;
foreach_data = FOREACH student_details GENERATE id,age,city;
order_by_data = ORDER student_details BY age DESC;
limit_data = LIMIT student_details 4;
Sample Queries (contd.)
Eval & Load/Store Functions
Eval:
AVG() BagToString() CONCAT() COUNT() COUNT_STAR()
DIFF() IsEmpty() MAX() MIN() SIZE() SUM()
Load/Store:
PigStorage()
TextLoader()
BinStorage()
UDFs(User Defined Functions)
● UDF ?
● Piggybanks?
● Types of UDFs:
○ Filter Functions
○ Eval Functions
○ Algebric Functions
● Writing UDFs
● Registering UDFs
● Defining Alias
● Using UDFs
documentation
UDF’s (contd.)
What is UDF?
In addition to the built-in functions, Pig provides excellent support for User Defined
Functions(UDF’s), where we can define our own functions and use them along with
different Pig Latin built-in functions. Complete support to write UDF’s is provided in
Java and partially in other languages, viz. Jython, Python, JS, Ruby and Groovy.
Piggybanks?
Apache Pig provides a central repository for Java based UDF’s using which you can
use the UDF’s written by other people and you can also contribute there.
UDF Example
● This example is written for converting given string to Uppercase,
● You can download the java code from link.
● Extract it and create the jar file from the source code or download from here.
● Now assume we have the jar file at /home/hduser/hadoop/exampleJARs/udfEx1.jar and some sample data at
/hduser/sampleData/sampledata.txt
● Now on grunt shell
REGISTER /home/hduser/hadoop/exampleJARs/udfEx1.jar;
DEFINE udf_eval SampleEval();
data = load '/hduser/sampleData/sampledata.txt' using PigStorage(',') as (id:int, name:chararray, age:int, city:chararray);
upper_case_data = foreach data generate udf_eval(name);
store upper_case_data into 'hdfs://localhost:9000/hduser/outputudf15';
Pig Scripts
● Modes of running Pig Scripts
○ Mapreduce mode
○ Local mode (-x parameter)
pig -x <mode> <pig_script_path>
Pig → MapReduce
Author:
Pathan Mudassirkhan
Twitter Linkedin

More Related Content

Viewers also liked

Practical Hadoop using Pig
Practical Hadoop using PigPractical Hadoop using Pig
Practical Hadoop using PigDavid Wellman
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 
Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)Kevin Weil
 
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Cloudera, Inc.
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinPietro Michiardi
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaPig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaEdureka!
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Hadoop et son écosystème
Hadoop et son écosystèmeHadoop et son écosystème
Hadoop et son écosystèmeKhanh Maudoux
 
Big Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache HadoopBig Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache Hadoophajlaoui jaleleddine
 
Pig and Python to Process Big Data
Pig and Python to Process Big DataPig and Python to Process Big Data
Pig and Python to Process Big DataShawn Hermans
 
Apache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalitésApache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalitésRomain Hardouin
 

Viewers also liked (14)

Practical Hadoop using Pig
Practical Hadoop using PigPractical Hadoop using Pig
Practical Hadoop using Pig
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)
 
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaPig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Hadoop et son écosystème
Hadoop et son écosystèmeHadoop et son écosystème
Hadoop et son écosystème
 
Big Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache HadoopBig Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache Hadoop
 
Pig and Python to Process Big Data
Pig and Python to Process Big DataPig and Python to Process Big Data
Pig and Python to Process Big Data
 
Un introduction à Pig
Un introduction à PigUn introduction à Pig
Un introduction à Pig
 
Apache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalitésApache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalités
 

Similar to Apache pig

Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latinknowbigdata
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018Holden Karau
 
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsApache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsViswanath Gangavaram
 
Big Data Hadoop Training
Big Data Hadoop TrainingBig Data Hadoop Training
Big Data Hadoop Trainingstratapps
 
Practical pig
Practical pigPractical pig
Practical pigtrihug
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in ProductionRobert Sanders
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014soujavajug
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...spinningmatt
 
Dart the Better JavaScript
Dart the Better JavaScriptDart the Better JavaScript
Dart the Better JavaScriptJorg Janke
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastHolden Karau
 
A super fast introduction to Spark and glance at BEAM
A super fast introduction to Spark and glance at BEAMA super fast introduction to Spark and glance at BEAM
A super fast introduction to Spark and glance at BEAMHolden Karau
 
A fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFsA fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFsHolden Karau
 
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...Ron Reiter
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark DownscalingDatabricks
 

Similar to Apache pig (20)

Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latin
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
 
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsApache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
 
Big Data Hadoop Training
Big Data Hadoop TrainingBig Data Hadoop Training
Big Data Hadoop Training
 
Introducing Datawave
Introducing DatawaveIntroducing Datawave
Introducing Datawave
 
Practical pig
Practical pigPractical pig
Practical pig
 
Pig latin
Pig latinPig latin
Pig latin
 
Go. Why it goes
Go. Why it goesGo. Why it goes
Go. Why it goes
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in Production
 
Unit V.pdf
Unit V.pdfUnit V.pdf
Unit V.pdf
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
 
Dart the Better JavaScript
Dart the Better JavaScriptDart the Better JavaScript
Dart the Better JavaScript
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
A super fast introduction to Spark and glance at BEAM
A super fast introduction to Spark and glance at BEAMA super fast introduction to Spark and glance at BEAM
A super fast introduction to Spark and glance at BEAM
 
A fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFsA fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFs
 
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
 
Node.js Course 2 of 2 - Advanced techniques
Node.js Course 2 of 2 - Advanced techniquesNode.js Course 2 of 2 - Advanced techniques
Node.js Course 2 of 2 - Advanced techniques
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 

Recently uploaded

General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 

Recently uploaded (20)

Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 

Apache pig

  • 2. Introduction Apache Pig: ● is an abstraction over MapReduce ● has structure which is amenable for substantial parallelization ● is generally used with Hadoop ● does lazy evaluation ● uses Pig Latin, which provides ○ Ease of Programming ○ Optimization opportunities ■ Focus on semantics rather than efficiency ○ Extensibility ■ Can create UDFs
  • 4. Pig Latin Data Model ● Fully nested & complex data model ● Atom ● Tuple ● Bag ● Map ● Relation
  • 5. Pig Environment Establishing Pig environment contains following three major steps: ● Installation ○ https://pig.apache.org/docs/r0.7.0/setup.html ● Grunt shell ○ Pig interactive shell which accepts Pig-latin based statements. ● Execution ○ Two modes of execution, Mapreduce mode and Local mode, we’ll see both ahead.
  • 7. Data Types Basic Complex int, long, biginteger Tuple : (raju, 5) float, double, bigdecimal Bag: {(raju,5),(ranu,6)} chararray Map: [‘name’#’raju’,’age’:40] bytearray boolean datetime
  • 8. Operators and Constructs ● All standard Arithmetic and Relational operator ● Constructs ○ Tuple: () ○ Bag: {} ○ Map: [] ● Relational Operators ○ Load & Storing: LOAD & STORE ○ Filtering: FILTER, DISTINCT, FOREACH, GENERATE, STREAM ○ Grouping & Joining: JOIN, COGROUP, CROSS, GROUP ○ Sorting: ORDER, LIMIT ○ Combining & Splitting: UNION, SPLIT ○ Diagnostic & Debugging: DUMP, DESCRIBE, EXPLAIN, ILLUSTRATE
  • 9. Sample Queries student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',') as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray ); STORE student INTO ' hdfs://localhost:9000/pig_Output/ ' USING PigStorage (','); Dump student describe student; explain Relation_name; illustrate student group_data = GROUP student_details by age cogroup_data = COGROUP student_details by age, employee_details by age; result = JOIN relation1 BY columnname, relation2 BY columnname;
  • 10. cross_data = CROSS customers, orders; student = UNION student1, student2; SPLIT student_details into student_details1 if age<23, student_details2 if (22<age and age>25); filter_data = FILTER student_details BY city == 'Chennai'; distinct_data = DISTINCT student_details; foreach_data = FOREACH student_details GENERATE id,age,city; order_by_data = ORDER student_details BY age DESC; limit_data = LIMIT student_details 4; Sample Queries (contd.)
  • 11. Eval & Load/Store Functions Eval: AVG() BagToString() CONCAT() COUNT() COUNT_STAR() DIFF() IsEmpty() MAX() MIN() SIZE() SUM() Load/Store: PigStorage() TextLoader() BinStorage()
  • 12. UDFs(User Defined Functions) ● UDF ? ● Piggybanks? ● Types of UDFs: ○ Filter Functions ○ Eval Functions ○ Algebric Functions ● Writing UDFs ● Registering UDFs ● Defining Alias ● Using UDFs documentation
  • 13. UDF’s (contd.) What is UDF? In addition to the built-in functions, Pig provides excellent support for User Defined Functions(UDF’s), where we can define our own functions and use them along with different Pig Latin built-in functions. Complete support to write UDF’s is provided in Java and partially in other languages, viz. Jython, Python, JS, Ruby and Groovy. Piggybanks? Apache Pig provides a central repository for Java based UDF’s using which you can use the UDF’s written by other people and you can also contribute there.
  • 14. UDF Example ● This example is written for converting given string to Uppercase, ● You can download the java code from link. ● Extract it and create the jar file from the source code or download from here. ● Now assume we have the jar file at /home/hduser/hadoop/exampleJARs/udfEx1.jar and some sample data at /hduser/sampleData/sampledata.txt ● Now on grunt shell REGISTER /home/hduser/hadoop/exampleJARs/udfEx1.jar; DEFINE udf_eval SampleEval(); data = load '/hduser/sampleData/sampledata.txt' using PigStorage(',') as (id:int, name:chararray, age:int, city:chararray); upper_case_data = foreach data generate udf_eval(name); store upper_case_data into 'hdfs://localhost:9000/hduser/outputudf15';
  • 15. Pig Scripts ● Modes of running Pig Scripts ○ Mapreduce mode ○ Local mode (-x parameter) pig -x <mode> <pig_script_path>
  • 17.