Pig Map Reduce Execution

  1. Pig’s Map Reduce Execution
     xiafei.qiu@PCA
  2. Agenda
     - Data types
     - Data structure
     - Pig Latin to Map-Reduce job compilation
     - Physical plan execution
     - UDF invocation
  3. Data Type
     - Tuple: an ordered list of data fields. DefaultTuple stores them in a List<Object> mFields.
     - DataBag: a collection of tuples. The memory manager calls spill() to move a bag's contents to disk.
     - Map: a Java type (java.util.Map).
     - Integer, Double, etc.: plain Java types.
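To make this data model concrete, here is a minimal sketch of a tuple (an ordered list of fields) and a bag that writes its in-memory tuples to a temporary file when `spill()` is called. The names `DataModelSketch`, `Tuple`, and `Bag` are hypothetical stand-ins rather than Pig's DefaultTuple or DataBag API, and Java serialization as the spill format is an assumption chosen for brevity.

```java
import java.io.*;
import java.util.*;

/** Hypothetical sketch of a tuple/bag data model (not Pig's DefaultTuple/DataBag). */
final class DataModelSketch {
    /** An ordered list of fields, similar in spirit to DefaultTuple's List<Object> mFields. */
    static final class Tuple implements Serializable {
        private static final long serialVersionUID = 1L;
        private final ArrayList<Object> fields;
        Tuple(Object... values) { fields = new ArrayList<>(Arrays.asList(values)); }
        Object get(int i) { return fields.get(i); }
        int size() { return fields.size(); }
        @Override public String toString() { return fields.toString(); }
    }

    /** A collection of tuples; spill() moves the in-memory part to a temporary file. */
    static final class Bag implements Iterable<Tuple> {
        private final List<Tuple> inMemory = new ArrayList<>();
        private final List<File> spillFiles = new ArrayList<>();
        private long size = 0;

        void add(Tuple t) { inMemory.add(t); size++; }
        long size() { return size; }

        /** Called by a memory manager under pressure; returns how many tuples were written. */
        long spill() {
            if (inMemory.isEmpty()) return 0;
            try {
                File f = File.createTempFile("bag-spill", ".bin");
                f.deleteOnExit();
                try (ObjectOutputStream out = new ObjectOutputStream(
                        new BufferedOutputStream(new FileOutputStream(f)))) {
                    out.writeInt(inMemory.size());          // record count header
                    for (Tuple t : inMemory) out.writeObject(t);
                }
                spillFiles.add(f);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            long written = inMemory.size();
            inMemory.clear();                               // free the heap copies
            return written;
        }

        /** Spilled tuples come back first (in spill order), then those still in memory. */
        @Override public Iterator<Tuple> iterator() {
            // For brevity this reads spilled tuples back into a list; a real bag would stream them.
            List<Tuple> all = new ArrayList<>();
            for (File f : spillFiles) {
                try (ObjectInputStream in = new ObjectInputStream(
                        new BufferedInputStream(new FileInputStream(f)))) {
                    int n = in.readInt();
                    for (int i = 0; i < n; i++) all.add((Tuple) in.readObject());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } catch (ClassNotFoundException e) {
                    throw new IllegalStateException(e);
                }
            }
            all.addAll(inMemory);
            return all.iterator();
        }
    }
}
```

A memory manager would call `spill()` on large bags when the heap runs low; iteration still sees every tuple, with the spilled ones first.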
  4. Data Structure
  5. Map-Reduce Compilation
     - Pig Latin to logical plan: the parser invokes logicalPlanBuilder.
     - Logical plan to physical plan: LogToPhyTranslationVisitor.
       - group, distinct: LR-GR-Pack (POLocalRearrange, POGlobalRearrange, POPackage)
       - join: LR-GR-JoinPack (POJoinPackage, with an inner foreach)
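The LR-GR-Pack shape can be imitated on in-memory lists: local rearrange tags each tuple with its key and input index, global rearrange (the shuffle) gathers tuples by key, and package builds one (key, bag per input) tuple. Flattening the cross product of the two bags then gives an inner join, which is roughly the role of the inner foreach in the join translation. All names here (`LrGrPackSketch`, `localRearrange`, `flattenJoin`, and so on) are hypothetical; this is a sketch assuming string keys and a two-input join, not Pig's operator code.

```java
import java.util.*;

/** Hypothetical in-memory sketch of LR -> GR -> Pack, plus the join flatten step. */
final class LrGrPackSketch {
    /** What local rearrange emits: (key, index of the input relation, original tuple). */
    static final class Tagged {
        final String key; final int input; final List<Object> tuple;
        Tagged(String key, int input, List<Object> tuple) { this.key = key; this.input = input; this.tuple = tuple; }
    }

    /** LR (map side): extract the key column and tag each tuple with its input index. */
    static List<Tagged> localRearrange(List<List<Object>> relation, int keyColumn, int input) {
        List<Tagged> out = new ArrayList<>();
        for (List<Object> t : relation) out.add(new Tagged(String.valueOf(t.get(keyColumn)), input, t));
        return out;
    }

    /** GR (the shuffle): bring together all tagged tuples with the same key, sorted by key. */
    static SortedMap<String, List<Tagged>> globalRearrange(List<Tagged> tagged) {
        SortedMap<String, List<Tagged>> groups = new TreeMap<>();
        for (Tagged t : tagged) groups.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);
        return groups;
    }

    /** Pack (reduce side): one output tuple per key, laid out as (key, bag_0, ..., bag_{n-1}). */
    static List<List<Object>> pack(SortedMap<String, List<Tagged>> groups, int numInputs) {
        List<List<Object>> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : groups.entrySet()) {
            List<List<List<Object>>> bags = new ArrayList<>();
            for (int i = 0; i < numInputs; i++) bags.add(new ArrayList<>());
            for (Tagged t : e.getValue()) bags.get(t.input).add(t.tuple);
            List<Object> row = new ArrayList<>();
            row.add(e.getKey());
            row.addAll(bags);
            out.add(row);
        }
        return out;
    }

    /** Join: flatten the cross product of bag_0 and bag_1 of each packed tuple (inner join). */
    static List<List<Object>> flattenJoin(List<List<Object>> packed) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : packed) {
            @SuppressWarnings("unchecked") List<List<Object>> left = (List<List<Object>>) row.get(1);
            @SuppressWarnings("unchecked") List<List<Object>> right = (List<List<Object>>) row.get(2);
            for (List<Object> l : left)
                for (List<Object> r : right) {
                    List<Object> joined = new ArrayList<>(l);
                    joined.addAll(r);
                    out.add(joined);
                }
        }
        return out;
    }
}
```

Keys with an empty bag on one side (here `u2` in the test data) survive the package step but vanish in the inner join, which is the usual cogroup-versus-join difference.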
  6. Map-Reduce Compilation
     - Physical plan to Map-Reduce plan.
     - An MROperator stands for one MR job.
     - The compiler traverses the physical plan in topological order.
     - A POLoad or a POGlobalRearrange starts a new MR operator (job).
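A toy version of this job-cutting step follows. It orders a plan with Kahn's topological sort and then splits a linear chain of operators into jobs. As an assumption of this sketch (a refinement of the one-line rule on the slide), a `Load` opens a job, the first `GR` ends that job's map phase, and a `GR` met in the reduce phase closes the job and opens the next one, taking the preceding `LR` into the new map phase. Operators are plain strings, the temporary store/load a real compiler would insert between jobs is not modeled, and `MrCompileSketch` with its methods is hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: order a physical plan and cut a linear chain into MR jobs. */
final class MrCompileSketch {
    /** Kahn's algorithm over edges op -> successors; returns the operators in topological order. */
    static List<String> topoOrder(List<String> ops, Map<String, List<String>> succ) {
        Map<String, Integer> inDegree = new HashMap<>();
        for (String op : ops) inDegree.put(op, 0);
        for (List<String> targets : succ.values())
            for (String t : targets) inDegree.merge(t, 1, Integer::sum);
        Deque<String> ready = new ArrayDeque<>();
        for (String op : ops) if (inDegree.get(op) == 0) ready.add(op);
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.poll();
            order.add(op);
            for (String t : succ.getOrDefault(op, Collections.emptyList()))
                if (inDegree.merge(t, -1, Integer::sum) == 0) ready.add(t);
        }
        if (order.size() != ops.size()) throw new IllegalArgumentException("plan has a cycle");
        return order;
    }

    /** One MR job (an "MROperator"): the operators of its map phase and of its reduce phase. */
    static final class Job {
        final List<String> map = new ArrayList<>();
        final List<String> reduce = new ArrayList<>();
        @Override public String toString() { return "map=" + map + " reduce=" + reduce; }
    }

    /** Assumed rule described above; expects a linear plan that starts with a Load. */
    static List<Job> split(List<String> order) {
        List<Job> jobs = new ArrayList<>();
        Job cur = null;
        boolean inReduce = false;
        for (String op : order) {
            if (op.startsWith("Load")) {
                cur = new Job();                  // a load opens a new job
                jobs.add(cur);
                inReduce = false;
            } else if (op.startsWith("GR")) {
                if (inReduce) {                   // second shuffle: close this job, open the next
                    Job next = new Job();
                    int last = cur.reduce.size() - 1;
                    if (last >= 0 && cur.reduce.get(last).startsWith("LR"))
                        next.map.add(cur.reduce.remove(last));
                    jobs.add(next);
                    cur = next;
                }
                inReduce = true;                  // the shuffle itself is the map/reduce boundary
                continue;
            }
            (inReduce ? cur.reduce : cur.map).add(op);
        }
        return jobs;
    }
}
```

For the chain `Load, Filter, LR1, GR1, Pack1, ForEach, LR2, GR2, Pack2, Store`, the sketch yields two jobs: map `[Load, Filter, LR1]` with reduce `[Pack1, ForEach]`, then map `[LR2]` with reduce `[Pack2, Store]`.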
  7. Map-Reduce Compilation
  8. Map-Reduce Compilation
  9. Map Execution

```java
protected void map(Text key, Tuple inpTuple, Context context)
        throws IOException, InterruptedException {
    //...........
    // Attach the incoming record to every root of the map-side physical plan.
    for (PhysicalOperator root : roots) {
        if (inIllustrator) {
            if (root != null) {
                root.attachInput(inpTuple);
            }
        } else {
            // Wrap the record's fields in a new tuple without copying them.
            root.attachInput(tf.newTupleNoCopy(inpTuple.getAll()));
        }
    }
    // Pull results out of the leaf operator and emit them.
    runPipeline(leaf);
}
```
  10. Map Execution

```java
protected void runPipeline(PhysicalOperator leaf)
        throws IOException, InterruptedException {
    while (true) {
        // Each call pulls one result through the whole operator chain.
        Result res = leaf.getNext(DUMMYTUPLE);
        if (res.returnStatus == POStatus.STATUS_OK) {
            collect(outputCollector, (Tuple) res.result);
            continue;
        }
        //........... (non-OK statuses are handled here; the loop ends on STATUS_EOP)
    }
}
```
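The control flow of `map()` and `runPipeline()` reduces to a few lines: attach one record to the root, then pull from the leaf until it signals end of input, emitting OK results and pulling again on NULL ones. This sketch uses a one-operator plan whose root is also the leaf; `MapPipelineSketch`, `FilterRoot`, and the three-valued `Status` are hypothetical simplifications of PhysicalOperator, Result, and POStatus.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical, simplified model of the map-side pull loop (not Pig's real classes). */
final class MapPipelineSketch {
    enum Status { OK, NULL, EOP }   // stand-ins for POStatus.STATUS_OK / _NULL / _EOP

    static final class Result {
        final Status status;
        final List<Object> tuple;
        Result(Status status, List<Object> tuple) { this.status = status; this.tuple = tuple; }
    }

    /** Root operator: returns the attached tuple once (NULL if filtered out), then EOP. */
    static final class FilterRoot {
        private final Predicate<List<Object>> keep;
        private List<Object> attached;
        FilterRoot(Predicate<List<Object>> keep) { this.keep = keep; }

        void attachInput(List<Object> t) { attached = t; }

        Result getNext() {
            if (attached == null) return new Result(Status.EOP, null);   // input consumed
            List<Object> t = attached;
            attached = null;                                               // detach after one use
            return keep.test(t) ? new Result(Status.OK, t) : new Result(Status.NULL, null);
        }
    }

    /** Mirrors map(): attach the record to the root, then drain the plan. */
    static void map(List<Object> record, FilterRoot root, List<List<Object>> collector) {
        root.attachInput(record);
        runPipeline(root, collector);   // one-operator plan: the root is also the leaf
    }

    /** Mirrors runPipeline(): keep pulling from the leaf until it reports EOP. */
    static void runPipeline(FilterRoot leaf, List<List<Object>> collector) {
        while (true) {
            Result res = leaf.getNext();
            if (res.status == Status.OK) {
                collector.add(res.tuple);   // stands in for collect(outputCollector, ...)
            } else if (res.status == Status.EOP) {
                return;                     // this record is fully processed
            }
            // Status.NULL: the tuple was filtered out, so just pull again
        }
    }
}
```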
  11. Reduce Execution

```java
protected void reduce(PigNullableWritable key, Iterable<NullableTuple> tupIter,
        Context context) throws IOException, InterruptedException {
    //...........
    if (pack instanceof POJoinPackage) {
        // A join package may produce several tuples per key, so keep calling
        // processOnePackageOutput() until it reports EOP (returns true).
        pack.attachInput(key, tupIter.iterator());
        while (true) {
            if (processOnePackageOutput(context))
                break;
        }
    } else {
        // Other packages are called once per key.
        pack.attachInput(key, tupIter.iterator());
        processOnePackageOutput(context);
    }
}
```
  12. Reduce Execution

```java
public boolean processOnePackageOutput(Context oc)
        throws IOException, InterruptedException {
    Result res = pack.getNext(DUMMYTUPLE);
    if (res.returnStatus == POStatus.STATUS_OK) {
        Tuple packRes = (Tuple) res.result;
        //...........
        // Feed the packaged tuple to every root of the reduce-side plan and drain it.
        for (int i = 0; i < roots.length; i++) {
            roots[i].attachInput(packRes);
        }
        runPipeline(leaf);
    }
    if (res.returnStatus == POStatus.STATUS_NULL) {
        return false;
    }
    //...........
    if (res.returnStatus == POStatus.STATUS_EOP) {
        return true;   // the package has no more output for this key
    }
    return false;
}
```
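The reduce side follows the same pull model. In the sketch below a group package returns one `(key, bag)` tuple per key, while a join package is assumed to return one joined tuple per `getNext()` call, so `reduce()` loops until `processOnePackageOutput()` reports EOP. Unlike the real method, this version only collects the package output instead of running it through a reduce plan; `ReduceSketch`, `PackageOp`, `GroupPackage`, and `JoinPackage` are hypothetical names.

```java
import java.util.*;

/** Hypothetical sketch of the reduce side: a package operator turns (key, values) into tuples. */
final class ReduceSketch {
    enum Status { OK, EOP }

    static final class Result {
        final Status status;
        final List<Object> tuple;
        Result(Status status, List<Object> tuple) { this.status = status; this.tuple = tuple; }
    }

    /** A shuffled value, tagged with the index (0 or 1) of the input it came from. */
    static final class Tagged {
        final int input;
        final Object value;
        Tagged(int input, Object value) { this.input = input; this.value = value; }
    }

    interface PackageOp {
        void attachInput(Object key, Iterator<Tagged> values);
        Result getNext();
    }

    /** Single-input group: one (key, bag) tuple per key; input tags are ignored. */
    static final class GroupPackage implements PackageOp {
        private Object key;
        private Iterator<Tagged> values;
        public void attachInput(Object key, Iterator<Tagged> values) { this.key = key; this.values = values; }
        public Result getNext() {
            if (values == null) return new Result(Status.EOP, null);
            List<Object> bag = new ArrayList<>();
            while (values.hasNext()) bag.add(values.next().value);
            values = null;   // this key has been packaged
            return new Result(Status.OK, Arrays.<Object>asList(key, bag));
        }
    }

    /** Two-input inner join: buffers both sides, then returns one joined tuple per call. */
    static final class JoinPackage implements PackageOp {
        private final Deque<List<Object>> pending = new ArrayDeque<>();
        public void attachInput(Object key, Iterator<Tagged> values) {
            List<Object> left = new ArrayList<>(), right = new ArrayList<>();
            while (values.hasNext()) {
                Tagged t = values.next();
                (t.input == 0 ? left : right).add(t.value);
            }
            pending.clear();
            for (Object l : left)
                for (Object r : right) pending.add(Arrays.<Object>asList(key, l, r));
        }
        public Result getNext() {
            return pending.isEmpty() ? new Result(Status.EOP, null) : new Result(Status.OK, pending.poll());
        }
    }

    /** Mirrors processOnePackageOutput(): returns true once the package reports EOP. */
    static boolean processOnePackageOutput(PackageOp pack, List<List<Object>> out) {
        Result res = pack.getNext();
        if (res.status == Status.OK) out.add(res.tuple);   // the real method runs the reduce plan here
        return res.status == Status.EOP;
    }

    /** Mirrors reduce(): a join package is drained in a loop, other packages are called once. */
    static void reduce(PackageOp pack, Object key, Iterable<Tagged> values, List<List<Object>> out) {
        pack.attachInput(key, values.iterator());
        if (pack instanceof JoinPackage) {
            while (!processOnePackageOutput(pack, out)) {
                // keep pulling joined tuples until EOP
            }
        } else {
            processOnePackageOutput(pack, out);
        }
    }
}
```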
  13. Physical Plan Execution
      - PhysicalPlan extends OperatorPlan<PhysicalOperator>: the plan is a graph of operations.
      - Each PhysicalOperator is a vertex.
      - Each vertex has a group of getNext() methods, one per result type.
      - An operator calls processInput() to fetch its input when necessary.
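Here is a minimal sketch of operators as graph vertices that share one `processInput()` helper, assuming it behaves roughly like this: an attached tuple is returned once and then detached; otherwise the operator pulls from its first predecessor, and an operator with neither reports EOP. It collapses the per-type `getNext()` overloads into a single method, and every name in it (`ProcessInputSketch`, `Op`, `Identity`, `Filter`, `Project`) is hypothetical.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical sketch of plan vertices sharing a processInput() helper. */
final class ProcessInputSketch {
    enum Status { OK, NULL, EOP }

    static final class Result {
        final Status status;
        final List<Object> tuple;
        Result(Status status, List<Object> tuple) { this.status = status; this.tuple = tuple; }
    }

    abstract static class Op {
        final List<Op> inputs = new ArrayList<>();   // predecessors in the plan graph
        private List<Object> attached;                // set on root operators only

        void attachInput(List<Object> t) { attached = t; }

        /** Return the attached tuple once, otherwise pull from the first predecessor. */
        Result processInput() {
            if (attached != null) {
                Result r = new Result(Status.OK, attached);
                attached = null;                      // detach: the next call sees no input
                return r;
            }
            if (inputs.isEmpty()) return new Result(Status.EOP, null);
            return inputs.get(0).getNext();
        }

        abstract Result getNext();
    }

    /** Passes every input tuple through unchanged (a stand-in for a root operator). */
    static final class Identity extends Op {
        Result getNext() { return processInput(); }
    }

    static final class Filter extends Op {
        private final Predicate<List<Object>> keep;
        Filter(Op in, Predicate<List<Object>> keep) { inputs.add(in); this.keep = keep; }
        Result getNext() {
            Result in = processInput();
            if (in.status != Status.OK) return in;    // propagate NULL / EOP unchanged
            return keep.test(in.tuple) ? in : new Result(Status.NULL, null);
        }
    }

    static final class Project extends Op {
        private final int column;
        Project(Op in, int column) { inputs.add(in); this.column = column; }
        Result getNext() {
            Result in = processInput();
            if (in.status != Status.OK) return in;
            return new Result(Status.OK, Collections.singletonList(in.tuple.get(column)));
        }
    }
}
```

Each call on the leaf recurses up the graph through `processInput()` until it reaches the root that holds the attached tuple, which is why the drivers on the earlier slides only ever call `getNext()` on the leaf.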
  14. Physical Plan Execution: POLoad.getNext()

```java
public Result getNext(Tuple t) throws ExecException {
    //...........
    Result res = new Result();
    try {
        // Ask the load function for the next record; null means end of input.
        res.result = loader.getNext();
        if (res.result == null) {
            res.returnStatus = POStatus.STATUS_EOP;
            tearDown();
        } else {
            res.returnStatus = POStatus.STATUS_OK;
        }
        if (res.returnStatus == POStatus.STATUS_OK)
            res.result = illustratorMarkup(res, res.result, 0);
    } catch (IOException e) {
        log.error("Received error from loader function: " + e);
        return res;
    }
    return res;
}
```
  15. Physical Plan Execution: POLimit.getNext()

```java
public Result getNext(Tuple t) throws ExecException {
    Result res = null;
    Result inp = null;
    while (true) {
        inp = processInput();
        // Pass end-of-input and errors straight through.
        if (inp.returnStatus == POStatus.STATUS_EOP
                || inp.returnStatus == POStatus.STATUS_ERR)
            break;
        illustratorMarkup(inp.result, null, 0);
        // illustrator ignore LIMIT before the post processing
        // Once soFar reaches mLimit, report EOP instead of the tuple.
        if ((illustrator == null || illustrator.getOriginalLimit() != -1) && soFar >= mLimit)
            inp.returnStatus = POStatus.STATUS_EOP;
        soFar++;
        break;
    }
    return inp;
}
```
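Chaining a load and a limit shows how a limit stops a pipeline early. In this sketch `Load` wraps an iterator as a stand-in for a load function and reports EOP once no record comes back, and `Limit` counts `soFar` against the limit in the spirit of the POLimit excerpt. `LoadLimitSketch`, its operators, and the two-valued `Status` are hypothetical simplifications.

```java
import java.util.*;

/** Hypothetical sketch of a load -> limit chain driven by getNext() pulls. */
final class LoadLimitSketch {
    enum Status { OK, EOP }

    static final class Result {
        final Status status;
        final List<Object> tuple;
        Result(Status status, List<Object> tuple) { this.status = status; this.tuple = tuple; }
    }

    interface Op { Result getNext(); }

    /** Like a load operator: ask the loader for the next record; no record means EOP. */
    static final class Load implements Op {
        private final Iterator<List<Object>> loader;   // stand-in for a load function
        Load(Iterator<List<Object>> loader) { this.loader = loader; }
        public Result getNext() {
            List<Object> t = loader.hasNext() ? loader.next() : null;
            return t == null ? new Result(Status.EOP, null) : new Result(Status.OK, t);
        }
    }

    /** Like a limit operator: pass tuples through until soFar reaches the limit, then EOP. */
    static final class Limit implements Op {
        private final Op input;
        private final long limit;
        private long soFar = 0;
        Limit(Op input, long limit) { this.input = input; this.limit = limit; }
        public Result getNext() {
            Result inp = input.getNext();
            if (inp.status == Status.EOP) return inp;               // upstream is exhausted
            if (soFar >= limit) return new Result(Status.EOP, null); // limit reached
            soFar++;
            return inp;
        }
    }

    /** Pull from the leaf until it stops returning OK and collect the tuples. */
    static List<List<Object>> drain(Op leaf) {
        List<List<Object>> out = new ArrayList<>();
        for (Result r = leaf.getNext(); r.status == Status.OK; r = leaf.getNext()) out.add(r.tuple);
        return out;
    }
}
```

With five input records and a limit of 2, the driver stops after the third pull: the limit reports EOP and the remaining records are never read.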
  16. UDF/Built-In Invocation
      - POUserFunc
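POUserFunc is the physical operator behind UDF and built-in function calls. In Pig, a scalar UDF extends `EvalFunc<T>` and implements `T exec(Tuple input)`; the sketch below mimics that shape with a hypothetical `EvalFunction` interface over `List<Object>`, plus a wrapper that gathers the argument columns of each input tuple and calls `exec()` once per tuple. `UdfSketch`, `Upper`, and `UserFuncOp` are illustrative names, not Pig classes.

```java
import java.util.*;

/** Hypothetical sketch of per-tuple UDF invocation (not Pig's EvalFunc/POUserFunc). */
final class UdfSketch {
    /** Shape of a scalar UDF: one call per input tuple, one value returned. */
    interface EvalFunction<T> {
        T exec(List<Object> input);
    }

    /** Example function: upper-cases its single argument; null in, null out. */
    static final class Upper implements EvalFunction<String> {
        public String exec(List<Object> input) {
            if (input == null || input.isEmpty() || input.get(0) == null) return null;
            return input.get(0).toString().toUpperCase(Locale.ROOT);
        }
    }

    /** Operator wrapper: projects the argument columns into a tuple and calls exec(). */
    static final class UserFuncOp<T> {
        private final EvalFunction<T> func;
        private final int[] argColumns;
        UserFuncOp(EvalFunction<T> func, int... argColumns) { this.func = func; this.argColumns = argColumns; }

        T evaluate(List<Object> inputTuple) {
            List<Object> args = new ArrayList<>(argColumns.length);
            for (int c : argColumns) args.add(inputTuple.get(c));   // build the argument tuple
            return func.exec(args);
        }
    }
}
```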
