PIG: A Big Data Processor
Tushar B. Kute,
http://tusharkute.com
What is Pig?
• Apache Pig is an abstraction over MapReduce. It is a
tool/platform used to analyze large data sets by
representing them as data flows.
• Pig is generally used with Hadoop; we can perform all
the data manipulation operations in Hadoop using
Apache Pig.
• To write data analysis programs, Pig provides a high-
level language known as Pig Latin.
• This language provides various operators using which
programmers can develop their own functions for
reading, writing, and processing data.
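• For example, a tiny Pig Latin data flow might look like this (a minimal sketch; the file name and field names are illustrative, not from this deck):
raw  = LOAD 'input.txt' USING PigStorage(',') AS (id:int, name:chararray, score:double);
good = FILTER raw BY score > 3.5;
STORE good INTO 'output';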
Apache Pig
• To analyze data using Apache Pig, programmers
need to write scripts using Pig Latin language.
• All these scripts are internally converted to Map
and Reduce tasks.
• Apache Pig has a component known as Pig
Engine that accepts the Pig Latin scripts as
input and converts those scripts into
MapReduce jobs.
Why do we need Apache Pig?
• Using Pig Latin, programmers can perform MapReduce tasks
easily without having to write complex Java code.
• Apache Pig uses a multi-query approach, thereby reducing the
length of code. For example, an operation that would require 200
lines of code (LoC) in Java can often be done in as few as 10 LoC
in Apache Pig. Ultimately, Apache Pig can reduce development
time by almost a factor of 16.
• Pig Latin is a SQL-like language, and it is easy to learn Apache Pig
when you are familiar with SQL.
• Apache Pig provides many built-in operators to support data
operations like joins, filters, ordering, etc. In addition, it also
provides nested data types like tuples, bags, and maps that are
missing from MapReduce.
Features of Pig
• Rich set of operators: It provides many operators to perform
operations like join, sort, filter, etc.
• Ease of programming: Pig Latin is similar to SQL and it is easy to write
a Pig script if you are good at SQL.
• Optimization opportunities: The tasks in Apache Pig optimize their
execution automatically, so the programmers need to focus only on
semantics of the language.
• Extensibility: Using the existing operators, users can develop their
own functions to read, process, and write data.
• UDFs: Pig provides the facility to create User Defined Functions in
other programming languages such as Java and to invoke or embed
them in Pig scripts (see the sketch after this list).
• Handles all kinds of data: Apache Pig analyzes all kinds of data, both
structured as well as unstructured. It stores the results in HDFS.
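• For instance, a Java UDF packaged in a jar can be registered and invoked from a script (a hedged sketch; the jar, class, and file names below are hypothetical):
REGISTER myudfs.jar;                      -- hypothetical jar containing the Java UDF
DEFINE TOUPPER com.example.pig.ToUpper(); -- hypothetical UDF class, given a short alias
people = LOAD 'names.txt' AS (name:chararray);
upper  = FOREACH people GENERATE TOUPPER(name);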
Pig vs. MapReduce
Pig vs. SQL
Pig vs. Hive
Applications of Apache Pig
• To process huge data sources such as web logs.
• To perform data processing for search
platforms.
• To process time sensitive data loads.
Apache Pig – History
• In 2006, Apache Pig was developed as a
research project at Yahoo!, mainly to simplify
creating and executing MapReduce jobs on large datasets.
• In 2007, Apache Pig was open sourced via
Apache incubator.
• In 2008, the first release of Apache Pig came
out. In 2010, Apache Pig graduated as an
Apache top-level project.
Pig Architecture
Apache Pig – Components
• Parser: Initially the Pig Scripts are handled by the Parser. It
checks the syntax of the script, does type checking, and other
miscellaneous checks. The output of the parser will be a DAG
(directed acyclic graph), which represents the Pig Latin
statements and logical operators.
• Optimizer: The logical plan (DAG) is passed to the logical
optimizer, which carries out logical optimizations such as
projection pushdown.
• Compiler: The compiler compiles the optimized logical plan
into a series of MapReduce jobs.
• Execution engine: Finally, the MapReduce jobs are submitted
to Hadoop in sorted order and executed on Hadoop,
producing the desired results.
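• To inspect the plans these components produce, the EXPLAIN operator can be run on a relation in the grunt shell (a small sketch; 'input.txt' is a hypothetical file):
grunt> A = LOAD 'input.txt' AS (line:chararray);
grunt> EXPLAIN A;
EXPLAIN prints the logical plan, the physical plan, and the MapReduce plan for the relation.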
Apache Pig – Data Model
Apache Pig – Elements
• Atom
– Any single value in Pig Latin, irrespective of its
data type, is known as an Atom.
– It is stored as a string and can be used as a string
or a number. int, long, float, double, chararray,
and bytearray are the atomic types of Pig.
– A piece of data or a simple atomic value is known
as a field.
– Example: ‘raja’ or ‘30’
Apache Pig – Elements
• Tuple
– A record that is formed by an ordered set of
fields is known as a tuple; the fields can be of any
type. A tuple is similar to a row in an RDBMS table.
– Example: (Raja, 30)
Apache Pig – Elements
• Bag
– A bag is an unordered set of tuples. In other words, a
collection of tuples (non-unique) is known as a bag. Each
tuple can have any number of fields (flexible schema). A
bag is represented by ‘{}’. It is similar to a table in an RDBMS,
but unlike an RDBMS table, it is not necessary that every
tuple contain the same number of fields or that the fields
in the same position (column) have the same type.
– Example: {(Raja, 30), (Mohammad, 45)}
– A bag can be a field in a relation; in that context, it is
known as an inner bag.
– Example: {Raja, 30, {9848022338, raja@gmail.com}}
Apache Pig – Elements
• Relation
– A relation is a bag of tuples. The relations in Pig
Latin are unordered (there is no guarantee that
tuples are processed in any particular order).
• Map
– A map (or data map) is a set of key-value pairs.
The key needs to be of type chararray and should
be unique. The value might be of any type. It is
represented by ‘[]’
– Example: [name#Raja, age#30]
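• These element types can also appear together in a load schema (a hedged sketch; the file and field names are illustrative):
students = LOAD 'students.txt'
    AS (name:chararray, age:int,
        contacts:bag{t:(phone:chararray)},
        details:map[]);
Here name and age are atoms, contacts is a bag of single-field tuples, and details is a map.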
Installation of PIG
Download
• Download the tar.gz file of Apache Pig from here:
http://mirror.fibergrid.in/apache/pig/pig-0.15.0/pig-0.15.0.tar.gz
Extract and copy
• Extract this file using the right-click -> 'Extract here'
option or with the tar -xzvf command.
• Rename the created folder 'pig-0.15.0' to 'pig'.
• Now, move this folder to /usr/lib using the following
command:
$ sudo mv pig/ /usr/lib
Edit the bashrc file
• Open the bashrc file:
sudo gedit ~/.bashrc
• Go to the end of the file and add the following lines.
export PIG_HOME=/usr/lib/pig
export PATH=$PATH:$PIG_HOME/bin
• Type the following command to apply the changes:
source ~/.bashrc
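• To verify the setup, the Pig version can be checked (assuming the updated PATH is now in effect; if the flag differs in your build, pig -help lists the options):
pig -version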
Start the Pig
• Start Pig in local mode:
pig -x local
• Start Pig in MapReduce mode (requires the Hadoop
daemons to be running):
pig -x mapreduce
Grunt shell
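A few utility commands are available inside the grunt shell (a short sketch; these are standard grunt commands):
grunt> fs -ls .     -- run file system commands (HDFS or local)
grunt> sh date      -- run a shell command
grunt> help         -- list the available commands
grunt> quit         -- leave the shell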
Data Processing with PIG
Example: movies_data.csv
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,De danadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
Load data
• $ pig -x local
• grunt> movies = LOAD 'movies_data.csv' USING PigStorage(',')
         as (id, name, year, rating, duration);
• grunt> dump movies;
It displays the contents of the relation.
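• With the sample movies_data.csv shown earlier, the dump prints one tuple per line, for example:
(1,Dhadakebaz,1986,3.2,7560)
(2,Dhumdhadaka,1985,3.8,6300)
...
(10,Pachhadlela,2004,3.1,6956)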
Filter data
• grunt> movies_greater_than_35 =
FILTER movies BY (float)rating > 3.5;
• grunt> dump movies_greater_than_35;
Store the results data
• grunt> store movies_greater_than_35
into 'my_movies';
• It stores the result in a local file system directory
named 'my_movies'.
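• By default the output is written with tab-separated fields; a delimiter can be passed to PigStorage to keep the comma format (a small variant sketch; the output directory is illustrative and must not already exist):
grunt> store movies_greater_than_35 into 'my_movies_csv' using PigStorage(',');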
Display the result
• Now display the result from the local file system.
cat my_movies/part-m-00000
Load command
• The load command specified only the column
names. We can modify the statement as follows
to include the data type of the columns:
• grunt> movies = LOAD 'movies_data.csv' USING PigStorage(',')
         as (id:int, name:chararray, year:int, rating:double, duration:int);
Check the filters
• List the movies that were released between 1990 and
1995
grunt> movies_between_90_95 = FILTER movies by year > 1990 and year < 1995;
• List the movies that start with the alphabet D
grunt> movies_starting_with_D = FILTER movies by name matches 'D.*';
• List the movies that have a duration greater than 2 hours
grunt> movies_duration_2_hrs = FILTER movies by duration > 7200;
Output
Movies between 1990 and 1995 · Movies starting with 'D' · Movies with duration greater than 2 hours
Describe
• The schema of a relation/alias can be viewed using the
DESCRIBE command:
grunt> DESCRIBE movies;
movies: {id: int, name: chararray, year: int, rating: double, duration: int}
Foreach
• FOREACH gives a simple way to apply
transformations based on columns. Let’s understand
this with an example.
• List the movie names and their duration in minutes
grunt> movie_duration = FOREACH movies GENERATE name, (double)(duration/60);
• The above statement generates a new alias that holds the
list of movie names and their duration in minutes.
• You can check the results using the DUMP command.
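• Note: with the typed load, duration is an int, so duration/60 performs integer division and the fraction is lost before the cast. Casting before dividing keeps the fractional minutes (a small variant sketch):
grunt> movie_duration = FOREACH movies GENERATE name, (double)duration/60.0;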
Output
Group
• The GROUP operator is used to group the data in a
relation by one or more fields.
• List the years and the number of movies released
each year.
grunt> grouped_by_year = group movies by year;
grunt> count_by_year = FOREACH grouped_by_year GENERATE group, COUNT(movies);
Output
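On the sample movies_data.csv, the dump of count_by_year contains tuples like the following (grouping order is not guaranteed):
(1985,1)
(1986,1)
(1987,2)
(1988,1)
(1990,1)
(1991,1)
(1993,1)
(2004,2)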
Order by
• Let us query the data to illustrate the ORDER BY
operation.
• List all the movies in the ascending order of year.
grunt> asc_movies_by_year = ORDER movies BY year ASC;
grunt> DUMP asc_movies_by_year;
• List all the movies in the descending order of year.
grunt> desc_movies_by_year = ORDER movies BY year DESC;
grunt> DUMP desc_movies_by_year;
Output: ascending by year (from 1985 to 2004)
Limit
• Use the LIMIT keyword to get only a limited number
of results from a relation.
grunt> top_5_movies = LIMIT movies 5;
grunt> DUMP top_5_movies;
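• Note that LIMIT by itself does not guarantee which rows are returned; combining it with ORDER gives a real "top 5", for example by rating (a small sketch):
grunt> movies_by_rating = ORDER movies BY rating DESC;
grunt> top_5_by_rating = LIMIT movies_by_rating 5;
grunt> DUMP top_5_by_rating;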
Pig: Modes of Execution
• Pig programs can be run in three ways, all of which
work in both local and MapReduce mode. They are:
– Script Mode
– Grunt Mode
– Embedded Mode
Script mode
• Script Mode or Batch Mode: In script mode, Pig runs
the commands specified in a script file. The following
example shows how to run a Pig program from a
script file:
$ vim scriptfile.pig
   A = LOAD 'script_file';
   DUMP A;
$ pig -x local scriptfile.pig
Grunt mode
• Grunt Mode or Interactive Mode: The grunt mode can also
be called interactive mode. Grunt is Pig's interactive shell.
It is started when no file is specified for Pig to run.
$ pig -x local
grunt> A = LOAD 'grunt_file';
grunt> DUMP A;
• You can also run Pig scripts from grunt using the run and exec
commands.
grunt> run scriptfile.pig
grunt> exec scriptfile.pig
Embedded mode
• You can embed Pig programs in Java, Python, and
Ruby, and run them from those languages.
Example: Wordcount program
• Q) How do we find the number of occurrences of each
word in a file using a Pig script?
• You can find the famous word count example written
as a MapReduce program on the Apache website. Here we
will write a simple Pig script for the word count
problem.
• The Pig script given on the next slide finds the number of
times each word is repeated in a file:
Example: text file- shivneri.txt
Example: Wordcount program
-- load each line of the file as a single chararray field
lines = LOAD 'shivneri.txt' AS (line:chararray);
-- split each line into words and flatten the resulting bag of words
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
-- group identical words together
grouped = GROUP words BY word;
-- count the number of occurrences of each word
w_count = FOREACH grouped GENERATE group, COUNT(words);
DUMP w_count;
Save the script as forts.pig.
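To keep the word counts on disk instead of printing them, the DUMP can be replaced with a STORE (the output directory name below is illustrative and must not already exist):
STORE w_count INTO 'wordcount_output' USING PigStorage(',');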
Output snapshot
$ pig -x local forts.pig
References
• “Programming Pig” by Alan Gates, O'Reilly
Publishers.
• “Pig Design Patterns” by Pradeep Pasupuleti,
PACKT Publishing
• Tutorials Point
• http://github.com/rohitdens
• http://pig.apache.org
tushar@tusharkute.com
Thank you
This presentation was created using LibreOffice Impress 4.2.8.2 and can be used freely as per the GNU General Public License.
Blogs
http://digitallocha.blogspot.in
http://kyamputar.blogspot.in
Web Resources
http://tusharkute.com
