This PPT covers Apache Pig in Big Data: its key features, anatomy, data types, running and execution modes, and when to use and when not to use Pig.
2. What is Pig ?
Scripting language
Alternative to MapReduce Programming
Developed as a research project at Yahoo in 2006
Platform for data analysis
3. Key features of Pig
• It provides an engine for executing data flows.
• It provides a language called "Pig Latin".
• It contains operators for many of the traditional data operations.
• It supports user-defined functions (UDFs).
5. Pig on Hadoop
• HDFS commands
• UNIX shell commands
• Relational operators
• Positional parameters
• Common mathematical functions
• Custom functions
• Complex data structures
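A minimal sketch of how several of these capabilities might look in one Grunt shell session. The paths and the student file layout are assumptions for illustration, and "positional parameters" is read here as Pig's positional field references ($0, $1, ...).

```pig
-- HDFS command, run from the Grunt shell (hypothetical directory)
fs -ls /pigdemo
-- UNIX shell command
sh ls

-- Relational operators: LOAD, FILTER, FOREACH
A = LOAD '/pigdemo/student.tsv' AS (rollno:int, name:chararray, gpa:float);
B = FILTER A BY gpa > 3.0;

-- $0 and $1 refer to the first and second fields by position;
-- ROUND is one of the built-in math functions
C = FOREACH B GENERATE $0, $1, ROUND(gpa);
DUMP C;
```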
8. Pig Latin
Overview
Pig Latin Statements
Ex: A = load 'students' as (rollno, name);
Pig Latin : Keywords
Pig Latin : Identifiers
Pig Latin : Comments
Pig Latin : Case Sensitivity
Operators in Pig Latin
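The points above can be illustrated with a short script. This is only a sketch: the input file 'students' and its two fields are assumed, as in the slide's own example.

```pig
/* Multi-line comment:
   'students' is a hypothetical tab-separated input with two fields. */
A = load 'students' as (rollno, name);   -- keywords may be written in lowercase
B = FOREACH A GENERATE name;              -- or in uppercase

-- Single-line comments start with two hyphens.
-- Relation names (aliases) and field names are case sensitive:
-- 'B' and 'b' would be two different relations.
DUMP B;
```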
9. Data Types in Pig
Simple data types
int
long
float
double
chararray
bytearray
datetime
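As a sketch of how the simple types appear in practice, a schema can declare a type for each field when loading, and a cast can convert between types. The file path and field names are assumptions for illustration.

```pig
-- Declare a simple type for each field in the AS clause
A = LOAD '/pigdemo/student.tsv' AS (rollno:int, name:chararray, gpa:float);

-- DESCRIBE prints the schema of a relation, including field types
DESCRIBE A;

-- Cast a float field to double and give the result a new name
B = FOREACH A GENERATE rollno, (double)gpa AS gpa_d;
DESCRIBE B;
```

Fields loaded without a declared type default to bytearray.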
16. Piggy Bank
Users can share their functions in Piggy Bank
Use the “register” keyword to load the Piggy Bank jar before calling its functions in a Pig script
Ex:
• register '/root/pigdemos/piggybank-0.12.0.jar';
• A = load '/pigdemo/student.tsv' as (rollno:int, name:chararray, gpa:float);
• upper = foreach A generate org.apache.pig.piggybank.evaluation.string.UPPER(name);
• DUMP upper;
17. When to use pig ?
When your data loads are time sensitive.
When you want to process various data sources.
When you want to get analytical insights through sampling.
18. When not to use pig ?
When your data is completely in unstructured form, such as video, text, and audio.
When there is a time constraint, because Pig is slower than MapReduce jobs.