Scope vs YSmart
Slides for a course.

  1. YSmart Revisited
     • What is YSmart? Yet Another SQL-to-MR (MapReduce) Translator.
     • Why “yet another”? Because sentence-by-sentence translation fails!
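  2. Example
     <exp1> and <exp2> are expensive data loading! <J1>, <J2>, and <J3> are expensive computation!

     Wrong View         Correct View
     a = 1;             a = <exp1>;
     b = 2;             b = <exp2>;
     x = a j1 b;        x = a J1 b;
     c = 1;             c = <exp1>;
     y = j2 c;          y = J2 c;
     d = y;             d = y;
     z = j3 d;          z = J3 d;

     The wrong view treats every load as a cheap constant. In the correct view, a and c load the same expensive <exp1>, so translating each sentence into its own job repeats that load.
     (Figure: data flow over a, b, c, d and the jobs J1, J2, J3.)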
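A minimal Python sketch of why the correct view matters, assuming a toy cost model in which each executed load statement counts as one expensive scan. The function `count_loads` and its `share_inputs` flag are hypothetical illustrations, not part of YSmart: with sharing off, every sentence runs on its own, as in sentence-by-sentence translation; with it on, a load already performed for an identical expression is reused.

```python
from collections import Counter

# Load statements from the "correct view": a and c both load <exp1>.
PROGRAM = [("a", "<exp1>"), ("b", "<exp2>"), ("c", "<exp1>")]


def count_loads(statements, share_inputs):
    """Count how often each expensive load expression is executed.

    share_inputs=False mimics sentence-by-sentence translation: every
    statement runs as its own job and rescans its input.
    share_inputs=True reuses data already loaded for the same expression.
    """
    loads = Counter()
    loaded = {}  # load expression -> placeholder for the loaded data
    for _var, expr in statements:
        if share_inputs and expr in loaded:
            continue  # reuse the earlier scan instead of loading again
        loads[expr] += 1  # simulate one expensive scan
        loaded[expr] = f"data for {expr}"
    return loads


print(count_loads(PROGRAM, share_inputs=False))  # Counter({'<exp1>': 2, '<exp2>': 1})
print(count_loads(PROGRAM, share_inputs=True))   # Counter({'<exp1>': 1, '<exp2>': 1})
```

The saving here is a single scan, but on big data each avoided scan is a full pass over a large relation.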
  3. Big Data! We cannot afford redundancies anymore!
     Let’s eliminate redundancies: YSmart!
  4. Correlation-Aware SQL-to-MR Translator
     SQL-like queries → primitive MR jobs → identify correlations → merge correlated MR jobs → MR jobs for best performance
  5. Input Correlation (IC)
     Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint.
     (Figure: jobs J1 and J2 over the relations lineitem and orders; both jobs read lineitem.)
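The IC definition reduces to a set-disjointness test. A minimal Python sketch that checks one pair of jobs at a time; the `MRJob` class and job J3 are hypothetical, and the inputs given to J1 and J2 are an assumed reading of the slide's figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MRJob:
    """Hypothetical, minimal description of a MapReduce job."""
    name: str
    inputs: frozenset  # relations the job scans


def has_input_correlation(a: MRJob, b: MRJob) -> bool:
    """IC: the two jobs' input relation sets are not disjoint."""
    return not a.inputs.isdisjoint(b.inputs)


# Assumed reading of the slide's figure: J1 scans lineitem and orders,
# J2 scans lineitem. J3 is a made-up job with an unrelated input.
J1 = MRJob("J1", frozenset({"lineitem", "orders"}))
J2 = MRJob("J2", frozenset({"lineitem"}))
J3 = MRJob("J3", frozenset({"customer"}))

print(has_input_correlation(J1, J2))  # True: both scan lineitem
print(has_input_correlation(J1, J3))  # False: no relation in common
```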
  6. Transit Correlation (TC)
     Multiple MR jobs have transit correlation (TC) if
     • they have input correlation (IC), and
     • they have the same partition key.
     (Figure: J1 and J2 over lineitem and orders, both with partition key l_orderkey.)
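Under TC, two jobs read the same relation and partition on the same key, so in this sketch one map pass can emit records for both jobs and one shuffle can deliver each key's records to a single reducer. The Python code below simulates that merge in memory. The `MRJob` class, the tag-based dispatch, and jobs A, B, and C are hypothetical illustrations rather than YSmart's generated code, and the lineitem rows are made-up sample data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MRJob:
    """Hypothetical, minimal description of a MapReduce job."""
    name: str
    inputs: frozenset   # base relations the job scans
    partition_key: str  # key used to partition map output


def has_transit_correlation(a: MRJob, b: MRJob) -> bool:
    """TC: input correlation (shared relation) plus the same partition key."""
    input_correlated = not a.inputs.isdisjoint(b.inputs)
    return input_correlated and a.partition_key == b.partition_key


# Hypothetical lineitem rows: (l_orderkey, l_suppkey, l_quantity).
LINEITEM = [(1, 10, 5), (1, 11, 3), (2, 10, 7), (2, 10, 1)]


def merged_job(rows):
    """Run two TC jobs as one: a single scan and a single shuffle on l_orderkey.

    Job A (hypothetical): total quantity per order.
    Job B (hypothetical): number of distinct suppliers per order.
    """
    groups = defaultdict(list)
    for orderkey, suppkey, qty in rows:          # map: one pass over lineitem
        groups[orderkey].append(("A", qty))      # value tagged for job A
        groups[orderkey].append(("B", suppkey))  # value tagged for job B
    total_qty, distinct_supp = {}, {}
    for key, tagged in groups.items():           # reduce: dispatch on the tag
        total_qty[key] = sum(v for tag, v in tagged if tag == "A")
        distinct_supp[key] = len({v for tag, v in tagged if tag == "B"})
    return total_qty, distinct_supp


JA = MRJob("A", frozenset({"lineitem"}), "l_orderkey")
JB = MRJob("B", frozenset({"lineitem"}), "l_orderkey")
JC = MRJob("C", frozenset({"lineitem"}), "l_partkey")  # same input, different key

print(has_transit_correlation(JA, JB))  # True
print(has_transit_correlation(JA, JC))  # False: IC holds, but the keys differ
print(merged_job(LINEITEM))             # ({1: 8, 2: 8}, {1: 2, 2: 1})
```

Job C shows why IC alone is not enough here: its records would have to be partitioned on a different key, so they could not share the same shuffle.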
  7. Job Flow Correlation (JFC)
     An MR job has job flow correlation (JFC) with one of its child MR jobs if it has the same partition key as that child job.
     (Figure: parent job J1 and child job J2 share a partition key. The map and reduce functions of MR Job 2 process lineitem and orders; the map function of MR Job 1 reads the output of MR Job 2 together with other data, followed by the reduce function of MR Job 1.)
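When parent and child partition on the same key, the child's output for a given key is exactly the part of that output the parent needs for that key. In this sketch, the parent's reduce can therefore run on the child's reduce output inside the same job, and the second shuffle disappears. This is a minimal in-memory Python sketch. `MRJob`, both reduce functions, the threshold of 6, and the sample rows are hypothetical. The sketch also omits the parent's "other data" input shown in the figure.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class MRJob:
    """Hypothetical job description: name, partition key, child jobs."""
    name: str
    partition_key: str
    children: list = field(default_factory=list)  # jobs whose output it reads


def has_job_flow_correlation(parent: MRJob, child: MRJob) -> bool:
    """JFC: the child feeds the parent and both use the same partition key."""
    is_child = any(c is child for c in parent.children)
    return is_child and parent.partition_key == child.partition_key


def shuffle(pairs, stats):
    """Group (key, value) pairs by key, counting each shuffle performed."""
    stats["shuffles"] += 1
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def child_reduce(key, quantities):
    """Hypothetical child job J2: total quantity per order key."""
    return sum(quantities)


def parent_reduce(key, totals):
    """Hypothetical parent job J1: keep per-order totals above 6."""
    return [t for t in totals if t > 6]


def run_two_jobs(rows):
    """Child and parent as separate MR jobs: the parent re-shuffles J2's output."""
    stats = Counter()
    child_out = [(k, child_reduce(k, vs)) for k, vs in shuffle(rows, stats).items()]
    parent_out = {k: parent_reduce(k, vs) for k, vs in shuffle(child_out, stats).items()}
    return {k: v for k, v in parent_out.items() if v}, stats["shuffles"]


def run_merged_job(rows):
    """JFC merge: per key, feed the child's reduce output straight into the parent's."""
    stats = Counter()
    result = {}
    for k, vs in shuffle(rows, stats).items():
        out = parent_reduce(k, [child_reduce(k, vs)])
        if out:
            result[k] = out
    return result, stats["shuffles"]


# Hypothetical (l_orderkey, l_quantity) rows and job graph.
ROWS = [(1, 5), (1, 3), (2, 2), (2, 1), (3, 9)]
J2 = MRJob("J2", "l_orderkey")
J1 = MRJob("J1", "l_orderkey", children=[J2])

print(has_job_flow_correlation(J1, J2))  # True
print(run_two_jobs(ROWS))                # ({1: [8], 3: [9]}, 2)
print(run_merged_job(ROWS))              # ({1: [8], 3: [9]}, 1)
```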
  8. Put it all together
     1: Sentence-by-sentence translation: 5 MR jobs
     2: Input Correlation + Transit Correlation: 3 MR jobs
     3: Input Correlation + Transit Correlation + Job Flow Correlation: 1 MR job
     4: Hand-coding (similar to case 3): in the reduce function, the code is optimized according to the query semantics
     (Figure: job graphs built from Left-outer-Join, Join1, Join2, AGG1, and AGG2 over lineitem and orders.)
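Cases 1 to 3 apply the correlations cumulatively. The following Python sketch illustrates that idea with a greedy union-find merge. This merge is much simpler than YSmart's real merging rules: for example, it does not check that a merged job's inputs are available or that merging keeps the job graph acyclic. The plan is hypothetical: the job names, inputs, and parent/child wiring are illustrative, and every job is assumed to partition on l_orderkey.

```python
from dataclasses import dataclass, field


@dataclass
class MRJob:
    """Hypothetical job description used only for this sketch."""
    name: str
    partition_key: str
    inputs: frozenset = frozenset()               # base relations scanned directly
    children: list = field(default_factory=list)  # jobs whose output it reads


def count_jobs(jobs, use_tc=True, use_jfc=True):
    """Greedily merge correlated jobs with union-find; return the job count."""
    leader = {j.name: j.name for j in jobs}

    def find(name):
        while leader[name] != name:
            leader[name] = leader[leader[name]]  # path halving
            name = leader[name]
        return name

    def union(a, b):
        leader[find(a.name)] = find(b.name)

    for i, a in enumerate(jobs):
        if use_tc:
            for b in jobs[i + 1:]:
                shares_input = not a.inputs.isdisjoint(b.inputs)
                if shares_input and a.partition_key == b.partition_key:
                    union(a, b)  # transit correlation
        if use_jfc:
            for child in a.children:
                if child.partition_key == a.partition_key:
                    union(a, child)  # job flow correlation
    return len({find(j.name) for j in jobs})


# Hypothetical five-job plan; every job is assumed to partition on l_orderkey.
KEY = "l_orderkey"
agg_a = MRJob("agg_a", KEY, frozenset({"lineitem"}))
agg_b = MRJob("agg_b", KEY, frozenset({"lineitem"}))
join_a = MRJob("join_a", KEY, frozenset({"lineitem", "orders"}))
join_b = MRJob("join_b", KEY, children=[agg_a, join_a])
final = MRJob("final_join", KEY, children=[join_b, agg_b])
PLAN = [agg_a, agg_b, join_a, join_b, final]

print(count_jobs(PLAN, use_tc=False, use_jfc=False))  # 5
print(count_jobs(PLAN, use_tc=True, use_jfc=False))   # 3
print(count_jobs(PLAN))                               # 1
```

Under these assumptions, the counts come out as 5, 3, and 1. That result reflects the assumed plan and is not a measurement of the slide's query.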
  9. YSmart vs SCOPE
     (Diagram: Big Data; (Big) Data Processing Language; Analytic Jobs → Naïve Translation → Optimization.)
     • YSmart: look at data dependence and control dependence
       • Identify three correlations
       • Merge jobs to eliminate redundancy (straightforward)
     • SCOPE: look at the actual structure of the input data
       • Identify structural property correlations
       • Partition, group, merge (complicated)
  10. Big Picture
      (Diagram: Big Data; (Big) Data Processing Language; Analytic Jobs → Naïve Translation → Input-Independent Optimization → Input-Dependent Optimization, with naïve translation, YSmart, and SCOPE placed at these three stages respectively.)
  11. YSmart vs SCOPE
      (Diagram: Big Data; (Big) Data Processing Language; Analytic Jobs → Naïve Translation → Optimization.)
      • The diagram is actually an over-abstraction.
      • In reality,
        • YSmart: source-to-source transformation
        • SCOPE: run-time optimizing compiler tightly coupled with the underlying execution environment
  12. YSmart Alone
      (Diagram: Big Data; (Big) Data Processing Language; Analytic Jobs → Naïve Translation → YSmart.)
      • 3x speedup, but 17% slower than the hand-coded version
      • It is supposed to be smarter than a human!
      • What went wrong:
        • Bad input code
        • Not enough optimization
  13. SCOPE Alone
      (Diagram: Big Data; (Big) Data Processing Language; Analytic Jobs → Naïve Translation → SCOPE.)
      • No thorough evaluation; 2x speedup on a specific case
      • Problem:
        • They are looking at structures, but at the wrong level.
        • Very likely, they are optimizing computations that are not strictly necessary!
  14. Discussions
      1. Is SQL good enough as a big data analytics processing language?
         • Bad language design can be detrimental.
         • Redundancies could be introduced unnecessarily, simply due to the poor expressiveness of the language.
      2. How can traditional program analysis and compiler optimization be migrated to the big data era?
         • Correlation detection in YSmart is inherently similar to dependency analysis.
         • In compiler optimization, we focus on def-use chains over statements and expressions; in big data, we should focus on big data transfers and big data tables.
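To make the dependency-analysis analogy concrete, here is a small Python sketch that runs a classic def-use analysis over tables instead of scalar variables. It reuses the statements from the example slide's correct view. The representation, a list of `(table, operation, used tables)` triples, and both helper functions are hypothetical. The sketch only shows two things: def-use edges expose which job consumes which job's output, and two tables defined by the same load expose a shared input of the kind IC is meant to catch.

```python
# Statements from the example slide's "correct view":
# (defined table, operation, tables it uses)
PROGRAM = [
    ("a", "load <exp1>", []),
    ("b", "load <exp2>", []),
    ("x", "J1", ["a", "b"]),
    ("c", "load <exp1>", []),
    ("y", "J2", ["c"]),
    ("d", "copy", ["y"]),
    ("z", "J3", ["d"]),
]


def def_use_edges(program):
    """Link each use of a table to the statement that last defined it."""
    last_def = {}
    edges = []
    for i, (target, _op, uses) in enumerate(program):
        for table in uses:
            edges.append((last_def[table], i))  # (defining stmt, using stmt)
        last_def[target] = i
    return edges


def repeated_loads(program):
    """Group tables loaded by the same expression: candidate shared scans."""
    by_expr = {}
    for target, op, _uses in program:
        if op.startswith("load"):
            by_expr.setdefault(op, []).append(target)
    return {op: ts for op, ts in by_expr.items() if len(ts) > 1}


print(def_use_edges(PROGRAM))   # [(0, 2), (1, 2), (3, 4), (4, 5), (5, 6)]
print(repeated_loads(PROGRAM))  # {'load <exp1>': ['a', 'c']}
```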
  15. Fundamentally, we need good programming languages & program analyses for big data analytics!
