Introduction to pig
Usage Rights

© All Rights Reserved

Introduction to Pig: Presentation Transcript

  • Introduction to Pig
    Xiafei.qiu@PCA
  • Nested Data Model
    Field, Tuple, Bag, Map
  • Normal Operators
    Arithmetic Operators
    X = FOREACH A GENERATE f1, f2, f1 % f2;
    Boolean Operators
    X = FILTER A BY (f1 == 8) OR (NOT (f2+f3 > f1));
    Cast operators
    X = FOREACH B GENERATE group, (chararray) COUNT(A) AS total;
    Comparison Operators
    X = FILTER A BY (f1 matches '.*apache.*') OR (NOT (f2+f3 > f1));
    Flatten Operator
    Tuple: removes a level of nesting
    Bag: removes a level of nesting, emitting one tuple per bag element; may produce a cross product
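The two FLATTEN cases above can be modeled in plain Java. This is only a sketch of the semantics, not Pig code: tuples are assumed to be lists of strings, a bag is a list of tuples, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FlattenSketch {
    // Flattening a tuple: its fields are spliced into the enclosing row.
    static List<String> flattenTuple(String head, List<String> tuple) {
        List<String> row = new ArrayList<>();
        row.add(head);
        row.addAll(tuple);
        return row;
    }

    // Flattening a bag: one output row per bag element, each paired with the other field.
    static List<List<String>> flattenBag(String head, List<List<String>> bag) {
        List<List<String>> rows = new ArrayList<>();
        for (List<String> tuple : bag) {
            rows.add(flattenTuple(head, tuple));
        }
        return rows;
    }

    // Flattening two bags together: every element of one is paired with
    // every element of the other, which is where the cross product comes from.
    static List<List<String>> flattenTwoBags(List<List<String>> left, List<List<String>> right) {
        List<List<String>> rows = new ArrayList<>();
        for (List<String> l : left) {
            for (List<String> r : right) {
                List<String> row = new ArrayList<>(l);
                row.addAll(r);
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> bag = Arrays.asList(
            Arrays.asList("b", "c"), Arrays.asList("d", "e"));
        System.out.println(flattenBag("a", bag));
        System.out.println(flattenTwoBags(bag, bag).size());
    }
}
```

A bag of two tuples yields two rows when flattened alone, and four rows when flattened alongside a second two-element bag.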
  • Relational Operators
    LOAD a bag of tuples
    A = LOAD 'data' [USING function] [AS schema];
    STORE
    STORE alias INTO 'directory' [USING function];
    FOREACH tuple in the bag, produce a new tuple
    A = FOREACH queries GENERATE uid, expandQuery(query);
    FILTER a bag to produce a subset of it
    A = FILTER queries BY uid neq 'bot' OR notBot(uid);
  • Relational Operators
    COGROUP/GROUP one or more relations (at most 127)
    alias = GROUP … BY …, … BY …
    {group: int, A: {name: chararray,age: int,gpa: float}}
    (18,{(John,18,4.0F),(Joe,18,3.8F)})
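The shape of the GROUP result above (a group key plus a bag of the original tuples) can be reproduced in plain Java. This is a sketch only: the Student class and the input rows other than John and Joe are assumptions made for illustration, and a TreeMap stands in for the grouped relation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupSketch {
    // Hypothetical tuple type matching the schema (name: chararray, age: int, gpa: float).
    static final class Student {
        final String name;
        final int age;
        final float gpa;

        Student(String name, int age, float gpa) {
            this.name = name;
            this.age = age;
            this.gpa = gpa;
        }

        @Override
        public String toString() {
            return "(" + name + "," + age + "," + gpa + "F)";
        }
    }

    // Models GROUP A BY age: each key maps to the bag of complete tuples that share it.
    static Map<Integer, List<Student>> groupByAge(List<Student> students) {
        Map<Integer, List<Student>> groups = new TreeMap<>();
        for (Student s : students) {
            groups.computeIfAbsent(s.age, k -> new ArrayList<>()).add(s);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Student> a = Arrays.asList(
            new Student("John", 18, 4.0f),
            new Student("Mary", 19, 3.8f),   // assumed extra row, not from the slide
            new Student("Joe", 18, 3.8f));
        System.out.println(groupByAge(a));
    }
}
```

Note that the grouped tuples keep all of their fields, including the grouping field itself, just as in the slide's example.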
  • Relational Operators
    JOIN (inner/outer)
    Replicated Joins
    Used when one or more relations are small enough to fit into main memory.
    Skewed Joins
    Computes a histogram of the key space and uses it to allocate reducers for a given key.
    Merge Joins
    Inputs are already sorted on the join key, so the join is performed in the map phase.
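The idea behind a replicated join can be sketched in plain Java as a hash join: the small relation is indexed in memory and the large relation is streamed past it. This is a model of the idea under simple assumptions (rows are [key, value] lists, inner join only, hypothetical class and method names), not Pig's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReplicatedJoinSketch {
    // Each row is [key, value]. The small relation is indexed once in memory;
    // the large relation is then scanned row by row, so no shuffle step is modeled.
    static List<List<String>> join(List<List<String>> large, List<List<String>> small) {
        Map<String, List<String>> index = new HashMap<>();
        for (List<String> row : small) {
            index.computeIfAbsent(row.get(0), k -> new ArrayList<>()).add(row.get(1));
        }
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : large) {
            List<String> matches = index.getOrDefault(row.get(0), Collections.<String>emptyList());
            for (String match : matches) {
                out.add(Arrays.asList(row.get(0), row.get(1), match));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> large = Arrays.asList(
            Arrays.asList("1", "a"), Arrays.asList("2", "b"), Arrays.asList("1", "c"));
        List<List<String>> small = Arrays.asList(
            Arrays.asList("1", "x"), Arrays.asList("3", "y"));
        System.out.println(join(large, small));
    }
}
```

Because only the small side is held in memory, the approach stops working once that relation no longer fits, which is why the slide restricts it to small relations.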
  • Relational Operators
    ORDER alias BY field DESC/ASC
    Not stable: the relative order of tuples with equal keys is not guaranteed
    SPLIT alias INTO alias IF …, alias IF …
    CROSS
    cross product
    X = CROSS A, B;
    DISTINCT
    Removes duplicate tuples in a relation.
    X = DISTINCT A;
    LIMIT
    X = LIMIT A 3;
    SAMPLE
    SAMPLE alias size;
    IMPORT
    Imports macros defined in another .pig file
    DEFINE
    Define a Pig macro.
  • Built-In Eval Functions
    AVG/MAX/MIN/SUM
    Operate on a single column of a bag; group the relation first
    COUNT/COUNT_STAR
    Number of elements in a bag; COUNT ignores nulls, COUNT_STAR includes them
    CONCAT
    DIFF
    IsEmpty
    SIZE
    TOKENIZE
  • Other Built-In Functions
    Load/Store Functions
    Math Functions
    String Functions
  • Map-Reduce Plan Compilation
    Each GROUP is compiled into a distinct Map-Reduce job
    Commands between LOAD and the first GROUP are pushed into the map side
    Commands between subsequent GROUPs Gi and Gi+1 are pushed into the reduce side of Gi
  • Map-Reduce Plan Compilation
    ORDER is compiled into two map-reduce jobs.
    MR1: sample the key space
    MR2: sort
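The two-job ORDER plan above can be sketched in plain Java: one pass picks split points from a sample of the keys, and a second pass routes each key to a range partition and sorts within it. Choosing the split points as evenly spaced quantiles of the sample is an assumption made for illustration, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class OrderSketch {
    // Models MR1: derive numPartitions - 1 split points from a non-empty sample of keys.
    static int[] splitPoints(List<Integer> sample, int numPartitions) {
        List<Integer> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return splits;
    }

    // Models MR2: route each key to its range, sort every partition locally,
    // then concatenate the partitions in range order to get a global order.
    static List<Integer> rangeSort(List<Integer> keys, int[] splits) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i <= splits.length; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : keys) {
            int p = 0;
            while (p < splits.length && key >= splits[p]) {
                p++;
            }
            partitions.get(p).add(key);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
            out.addAll(partition);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(5, 3, 9, 1, 7, 2, 8);
        int[] splits = splitPoints(Arrays.asList(5, 1, 8, 3), 3);
        System.out.println(rangeSort(keys, splits));
    }
}
```

The sample only affects how evenly the keys are spread across partitions; the concatenated output is sorted regardless of which split points are chosen.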
  • User Defined Function
    Simple Eval Function
    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class UPPER extends EvalFunc<String> {
        public String exec(Tuple input) throws IOException {
            // .......
        }
    }
  • User Defined Function
    Aggregate Functions
    Algebraic Interface
    For functions that can be computed incrementally in a distributed fashion.
    Accumulator Interface
    Designed to decrease memory usage by passing a bag to the function in chunks.
  • Accumulator Interface
    public interface Accumulator <T> {
        public void accumulate(Tuple b) throws IOException;
        public T getValue();
        public void cleanup();
    }
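How the three methods cooperate can be sketched without any Pig dependency. The ChunkAccumulator interface below is a hypothetical stand-in that mirrors the shape of the interface above but takes a List<Long> chunk instead of a Pig Tuple; the chunk size and the driver loop are likewise assumptions.

```java
import java.util.Arrays;
import java.util.List;

final class AccumulatorSketch {
    // Hypothetical stand-in for the interface above; a chunk is a List<Long>, not a Pig Tuple.
    interface ChunkAccumulator<T> {
        void accumulate(List<Long> chunk);
        T getValue();
        void cleanup();
    }

    // A running sum only needs one long of state, however large the bag is.
    static final class SumAccumulator implements ChunkAccumulator<Long> {
        private long sum = 0;

        @Override
        public void accumulate(List<Long> chunk) {
            for (long value : chunk) {
                sum += value;
            }
        }

        @Override
        public Long getValue() {
            return sum;
        }

        @Override
        public void cleanup() {
            sum = 0;   // reset state before the next key is processed
        }
    }

    // Models the caller: feed the values for one key in chunks, read the result, then reset.
    static long sumInChunks(List<Long> values, int chunkSize) {
        SumAccumulator acc = new SumAccumulator();
        for (int i = 0; i < values.size(); i += chunkSize) {
            acc.accumulate(values.subList(i, Math.min(i + chunkSize, values.size())));
        }
        long result = acc.getValue();
        acc.cleanup();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sumInChunks(Arrays.asList(1L, 2L, 3L, 4L, 5L), 2));
    }
}
```

Only one chunk is visible at a time, so the whole bag never has to be materialized in memory.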
  • Aggregate Functions
    public interface Algebraic {
        public String getInitial();
        public String getIntermed();
        public String getFinal();
    }
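In Pig, the three methods return the names of classes that implement each stage. The plain-Java sketch below skips that indirection and models the stages directly for a COUNT-like function; the method names, the use of strings as tuples, and the way the input is divided into splits are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AlgebraicCountSketch {
    // Initial stage: applied to each input tuple; every tuple contributes a count of 1.
    static long initial(String tuple) {
        return 1L;
    }

    // Intermediate stage: merges partial counts, for example within one input split.
    static long intermed(List<Long> partials) {
        return sum(partials);
    }

    // Final stage: merges the remaining partial counts into the result.
    static long finalStep(List<Long> partials) {
        return sum(partials);
    }

    private static long sum(List<Long> values) {
        long total = 0;
        for (long value : values) {
            total += value;
        }
        return total;
    }

    // Drives the three stages over data that is already divided into splits.
    static long count(List<List<String>> splits) {
        List<Long> perSplit = new ArrayList<>();
        for (List<String> split : splits) {
            List<Long> ones = new ArrayList<>();
            for (String tuple : split) {
                ones.add(initial(tuple));
            }
            perSplit.add(intermed(ones));
        }
        return finalStep(perSplit);
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList(
            Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e", "f"))));
    }
}
```

The function qualifies as algebraic because partial counts can be merged in any grouping and still give the same total.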