Pig Map Reduce Execution


Transcript

  • 1. Pig’s Map Reduce Execution
    xiafei.qiu@PCA
  • 2. Agenda
    Data type
    Data structure
    Pig-Latin to Map-Reduce job compilation
    Physical Plan Execution
    UDF Invocation
  • 3. Data Type
    Tuple
    An ordered list of data items.
    DefaultTuple has List<Object> mFields
    DataBag
    A collection of Tuples.
    The memory manager calls spill() to spill the bag to disk
    Map – Java type
    Integer, Double, etc. – Java types
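The tuple and bag described on this slide can be pictured with a minimal sketch. `SketchTuple`, `SketchBag` and `DataTypeSketch` are hypothetical names invented for illustration, not Pig classes, and `spill()` here only counts tuples instead of writing them to disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tuple: an ordered list of untyped fields,
// mirroring the slide's "List<Object> mFields".
class SketchTuple {
    private final List<Object> fields = new ArrayList<>();

    void append(Object value) { fields.add(value); }
    Object get(int index) { return fields.get(index); }
    int size() { return fields.size(); }
}

// Hypothetical stand-in for a bag: a collection of tuples that can be spilled.
class SketchBag {
    private final List<SketchTuple> inMemory = new ArrayList<>();
    private int spilledCount = 0; // assumption: a counter stands in for tuples on disk

    void add(SketchTuple t) { inMemory.add(t); }

    // Would be called by a memory manager under memory pressure.
    // A real implementation would serialize the tuples to a file here.
    int spill() {
        int n = inMemory.size();
        spilledCount += n;
        inMemory.clear();
        return n;
    }

    int size() { return inMemory.size() + spilledCount; }
    int inMemorySize() { return inMemory.size(); }
}

class DataTypeSketch {
    static SketchBag demoBag() {
        SketchBag bag = new SketchBag();
        for (int i = 0; i < 3; i++) {
            SketchTuple t = new SketchTuple();
            t.append("user" + i); // field 0: a String
            t.append(i * 10);     // field 1: an Integer
            bag.add(t);
        }
        return bag;
    }

    public static void main(String[] args) {
        SketchBag bag = demoBag();
        System.out.println("in memory before spill: " + bag.inMemorySize());
        System.out.println("spilled: " + bag.spill());
        System.out.println("total after spill: " + bag.size());
    }
}
```

The point of the design is that the bag's logical size does not change when it spills; only where the tuples live changes.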
  • 4. Data Structure
  • 5. Map-Reduce Compilation
    Pig-Latin to Logical Plan
    The parser invokes LogicalPlanBuilder
    Logical Plan to Physical Plan
    LogToPhyTranslationVisitor
    group, distinct: LR-GR-Pack (LocalRearrange, GlobalRearrange, Package)
    join: LR-GR-JoinPack (with inner foreach)
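The LR-GR-Pack chain for a group can be simulated in memory. This is a minimal sketch that reads LR as a local rearrange (tag each row with its key), GR as a global rearrange (bring equal keys together) and Pack as packaging (one key and one bag per group). `GroupSketch` and its methods are hypothetical, and a sorted map stands in for the shuffle.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupSketch {
    // Local rearrange: turn each row into a (key, row) pair.
    static List<Map.Entry<String, List<String>>> localRearrange(List<List<String>> rows, int keyIndex) {
        List<Map.Entry<String, List<String>>> out = new ArrayList<>();
        for (List<String> row : rows) {
            out.add(new AbstractMap.SimpleEntry<>(row.get(keyIndex), row));
        }
        return out;
    }

    // Global rearrange: collect all rows that share a key (stand-in for the shuffle).
    static Map<String, List<List<String>>> globalRearrange(List<Map.Entry<String, List<String>>> pairs) {
        Map<String, List<List<String>>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Package: emit one "(key, bag)" record per key.
    static List<String> pack(Map<String, List<List<String>>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> e : grouped.entrySet()) {
            out.add("(" + e.getKey() + ", " + e.getValue() + ")");
        }
        return out;
    }

    static List<String> demo() {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "1"),
                Arrays.asList("b", "2"),
                Arrays.asList("a", "3"));
        return pack(globalRearrange(localRearrange(rows, 0)));
    }

    public static void main(String[] args) {
        for (String record : demo()) {
            System.out.println(record);
        }
    }
}
```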
  • 6. Map-Reduce Compilation
    Physical Plan to Map-Reduce Plan
    An MROperator stands for one MR job
    The physical plan is traversed in topological order
    On a POLoad or GlobalRearrange, a new MR operator/job is started
  • 7. Map-Reduce Compilation
  • 8. Map-Reduce Compilation
  • 9. Map Execution
    protected void map(Text key, Tuple inpTuple, Context context)
            throws IOException, InterruptedException {
        //...........
        for (PhysicalOperator root : roots) {
            if (inIllustrator) {
                if (root != null) {
                    root.attachInput(inpTuple);
                }
            } else {
                root.attachInput(tf.newTupleNoCopy(inpTuple.getAll()));
            }
        }
        runPipeline(leaf);
    }
  • 10. Map Execution
    protected void runPipeline(PhysicalOperator leaf)
            throws IOException, InterruptedException {
        while (true) {
            Result res = leaf.getNext(DUMMYTUPLE);
            if (res.returnStatus == POStatus.STATUS_OK) {
                collect(outputCollector, (Tuple) res.result);
                continue;
            }
            //........... (handling of the other statuses, including the loop exit, is elided)
        }
    }
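The attach-then-drain pattern of map() and runPipeline() can be tried out in isolation. The following is a minimal sketch: `MapPipelineSketch`, `UpperCaseLeaf` and the four-value `Status` enum are hypothetical stand-ins, and the handling of the non-OK statuses is an assumption about the part the slide elides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

class MapPipelineSketch {
    enum Status { OK, NULL, EOP, ERR }

    static final class Result {
        final Status status;
        final Object value;
        Result(Status status, Object value) { this.status = status; this.value = value; }
    }

    // Hypothetical leaf operator: upper-cases attached strings, emits nothing for empty ones.
    static final class UpperCaseLeaf {
        private final Deque<String> attached = new ArrayDeque<>();

        void attachInput(String input) { attached.add(input); }

        Result getNext() {
            if (attached.isEmpty()) return new Result(Status.EOP, null);
            String in = attached.poll();
            if (in.isEmpty()) return new Result(Status.NULL, null);
            return new Result(Status.OK, in.toUpperCase(Locale.ROOT));
        }
    }

    // Pull from the leaf until it reports end of processing.
    static void runPipeline(UpperCaseLeaf leaf, List<Object> collector) {
        while (true) {
            Result res = leaf.getNext();
            switch (res.status) {
                case OK:
                    collector.add(res.value); // emit, then keep pulling
                    break;
                case NULL:
                    break;                    // nothing to emit, keep pulling
                case EOP:
                    return;                   // attached input is drained
                default:
                    throw new IllegalStateException("operator reported an error");
            }
        }
    }

    // One map() call: attach the record, then drain the pipeline.
    static void map(String record, UpperCaseLeaf leaf, List<Object> collector) {
        leaf.attachInput(record);
        runPipeline(leaf, collector);
    }

    static List<Object> demo() {
        UpperCaseLeaf leaf = new UpperCaseLeaf();
        List<Object> collector = new ArrayList<>();
        for (String record : new String[] {"pig", "", "latin"}) {
            map(record, leaf, collector);
        }
        return collector;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```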
  • 11. Reduce Execution
    protected void reduce(PigNullableWritable key, Iterable<NullableTuple> tupIter, Context context)
            throws IOException, InterruptedException {
        //...........
        if (pack instanceof POJoinPackage) {
            pack.attachInput(key, tupIter.iterator());
            while (true) {
                if (processOnePackageOutput(context))
                    break;
            }
        } else {
            pack.attachInput(key, tupIter.iterator());
            processOnePackageOutput(context);
        }
    }
  • 12. Reduce Execution
    public boolean processOnePackageOutput(Context oc)
            throws IOException, InterruptedException {
        Result res = pack.getNext(DUMMYTUPLE);
        if (res.returnStatus == POStatus.STATUS_OK) {
            Tuple packRes = (Tuple) res.result;
            //...........
            for (int i = 0; i < roots.length; i++) {
                roots[i].attachInput(packRes);
            }
            runPipeline(leaf);
        }
        if (res.returnStatus == POStatus.STATUS_NULL) {
            return false;
        }
        //...........
        if (res.returnStatus == POStatus.STATUS_EOP) {
            return true;
        }
        return false;
    }
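The two branches of reduce() differ in how often the package operator is asked for output: the join package is polled until processOnePackageOutput() returns true, while any other package is polled once per key. A minimal sketch of that control flow follows. `ReduceSketch`, `BagPack` and `ChunkedPack` are hypothetical, and the fixed-size chunking is only an illustrative assumption about why a package might need several calls, not a description of POJoinPackage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReduceSketch {
    enum Status { OK, EOP }

    static final class Result {
        final Status status;
        final String value;
        Result(Status status, String value) { this.status = status; this.value = value; }
    }

    abstract static class Pack {
        String key;
        Iterator<Integer> values;

        void attachInput(String key, Iterator<Integer> values) {
            this.key = key;
            this.values = values;
        }

        abstract Result getNext();

        // Take up to max values for the current key, or report EOP when none are left.
        Result take(int max) {
            if (!values.hasNext()) return new Result(Status.EOP, null);
            List<Integer> chunk = new ArrayList<>();
            while (values.hasNext() && chunk.size() < max) chunk.add(values.next());
            return new Result(Status.OK, key + "=" + chunk);
        }
    }

    // Returns the whole bag for a key in one call.
    static final class BagPack extends Pack {
        Result getNext() { return take(Integer.MAX_VALUE); }
    }

    // Returns the values for a key in pieces, then EOP (stand-in for a streaming package).
    static final class ChunkedPack extends Pack {
        private final int chunkSize;
        ChunkedPack(int chunkSize) { this.chunkSize = chunkSize; }
        Result getNext() { return take(chunkSize); }
    }

    // Returns true once the package has nothing more for the current key.
    static boolean processOnePackageOutput(Pack pack, List<String> out) {
        Result res = pack.getNext();
        if (res.status == Status.OK) out.add(res.value);
        return res.status == Status.EOP;
    }

    static List<String> reduce(Pack pack, String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        pack.attachInput(key, values.iterator());
        if (pack instanceof ChunkedPack) {
            while (!processOnePackageOutput(pack, out)) {
                // keep polling until EOP
            }
        } else {
            processOnePackageOutput(pack, out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(reduce(new BagPack(), "k", values));
        System.out.println(reduce(new ChunkedPack(2), "k", values));
    }
}
```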
  • 13. Physical Plan Execution
    PhysicalPlan extends OperatorPlan<PhysicalOperator>
    Operation on Graph
    PhysicalOperator as vertex
    Each vertex has a group of getNext() methods
    processInput() if necessary
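The pull model on this slide, where each vertex answers getNext() by calling processInput() on its predecessor, can be sketched with three toy operators. `PlanSketch`, `Source`, `Filter` and `Limit` are hypothetical names. The sketch uses a single getNext() per operator and a linear chain, whereas the slide describes a group of getNext() methods on a graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class PlanSketch {
    enum Status { OK, NULL, EOP }

    static final class Result {
        final Status status;
        final Integer value;
        Result(Status status, Integer value) { this.status = status; this.value = value; }
    }

    // A vertex of the plan: produces output by pulling from its predecessor.
    abstract static class Op {
        Op input;
        abstract Result getNext();
        Result processInput() { return input.getNext(); }
    }

    // Root vertex: reads from an in-memory list instead of a loader.
    static final class Source extends Op {
        private final Iterator<Integer> it;
        Source(List<Integer> data) { this.it = data.iterator(); }
        Result getNext() {
            return it.hasNext() ? new Result(Status.OK, it.next()) : new Result(Status.EOP, null);
        }
    }

    // Drops rows that fail the predicate by answering NULL for them.
    static final class Filter extends Op {
        private final Predicate<Integer> keep;
        Filter(Op input, Predicate<Integer> keep) { this.input = input; this.keep = keep; }
        Result getNext() {
            Result in = processInput();
            if (in.status == Status.OK && !keep.test(in.value)) return new Result(Status.NULL, null);
            return in;
        }
    }

    // Passes rows through until the limit is reached, then reports EOP.
    static final class Limit extends Op {
        private final long limit;
        private long soFar = 0;
        Limit(Op input, long limit) { this.input = input; this.limit = limit; }
        Result getNext() {
            Result in = processInput();
            if (in.status != Status.OK) return in;
            if (soFar >= limit) return new Result(Status.EOP, null);
            soFar++;
            return in;
        }
    }

    // Pull from the leaf until EOP, skipping NULL results.
    static List<Integer> drain(Op leaf) {
        List<Integer> out = new ArrayList<>();
        while (true) {
            Result r = leaf.getNext();
            if (r.status == Status.EOP) return out;
            if (r.status == Status.OK) out.add(r.value);
        }
    }

    static List<Integer> demo() {
        Op source = new Source(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        Op evens = new Filter(source, n -> n % 2 == 0);
        return drain(new Limit(evens, 2));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [2, 4]
    }
}
```

Only the leaf is driven from outside; every other operator runs solely because its successor asked for a row.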
  • 14. Physical Plan Execution
    public Result getNext(Tuple t) throws ExecException {
        //...........
        Result res = new Result();
        try {
            res.result = loader.getNext();
            if (res.result == null) {
                res.returnStatus = POStatus.STATUS_EOP;
                tearDown();
            } else {
                res.returnStatus = POStatus.STATUS_OK;
            }
            if (res.returnStatus == POStatus.STATUS_OK)
                res.result = illustratorMarkup(res, res.result, 0);
        } catch (IOException e) {
            log.error("Received error from loader function: " + e);
            return res;
        }
        return res;
    }
  • 15. Physical Plan Execution
    public Result getNext(Tuple t) throws ExecException {
        Result res = null;
        Result inp = null;
        while (true) {
            inp = processInput();
            if (inp.returnStatus == POStatus.STATUS_EOP
                    || inp.returnStatus == POStatus.STATUS_ERR)
                break;
            illustratorMarkup(inp.result, null, 0);
            // illustrator ignores LIMIT before the post processing
            if ((illustrator == null || illustrator.getOriginalLimit() != -1) && soFar >= mLimit)
                inp.returnStatus = POStatus.STATUS_EOP;
            soFar++;
            break;
        }
        return inp;
    }
  • 16. UDF/Built-In Invocation
    POUserFunc
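The slide names only the operator, so what follows is a rough sketch of the general idea of a plan operator that wraps a user function: it picks the argument columns out of the incoming row and hands them to the function. `UdfSketch`, `SketchEvalFunc`, `Upper` and `UserFuncOp` are hypothetical names, and nothing here is taken from the POUserFunc source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class UdfSketch {
    // Hypothetical UDF contract: one call per input row, arguments passed as a list.
    interface SketchEvalFunc<T> {
        T exec(List<Object> input);
    }

    // Example user function: upper-cases its first argument, null-safe.
    static final class Upper implements SketchEvalFunc<String> {
        public String exec(List<Object> input) {
            if (input == null || input.isEmpty() || input.get(0) == null) return null;
            return input.get(0).toString().toUpperCase(Locale.ROOT);
        }
    }

    // Hypothetical operator that wraps a user function inside a plan.
    static final class UserFuncOp<T> {
        private final SketchEvalFunc<T> func;
        private final int[] argColumns;

        UserFuncOp(SketchEvalFunc<T> func, int... argColumns) {
            this.func = func;
            this.argColumns = argColumns;
        }

        // Build the argument list from the selected columns, then invoke the function.
        T getNext(List<Object> row) {
            List<Object> args = new ArrayList<>();
            for (int c : argColumns) {
                args.add(row.get(c));
            }
            return func.exec(args);
        }
    }

    static String demo() {
        UserFuncOp<String> op = new UserFuncOp<>(new Upper(), 0);
        List<Object> row = Arrays.<Object>asList("pig", 7);
        return op.getNext(row);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints PIG
    }
}
```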