# Scope vs YSmart


Slides for a course.


1. **YSmart Revisited**
   - What is YSmart? Yet Another SQL-to-MR (MapReduce) Translator.
   - Why "yet another"? Sentence-by-sentence translation fails!
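2. **Example**

   | Wrong view | Correct view |
   |---|---|
   | `a = 1;` | `a = <exp1>;` |
   | `b = 2;` | `b = <exp2>;` |
   | `x = a j1 b;` | `x = a J1 b;` |
   | `c = 1;` | `c = <exp1>;` |
   | `y = j2 c;` | `y = J2 c;` |
   | `d = y;` | `d = y;` |
   | `z = j3 d;` | `z = J3 d;` |

   - `<exp1>` and `<exp2>` are expensive data loads!
   - `J1`, `J2`, and `J3` are expensive computations!
   - In the correct view, `c = <exp1>;` reloads the same data as `a`, so translating each statement separately repeats an expensive load.
   - (Diagram: `a` and `b` feed `J1`; `c` feeds `J2`, whose result `d` feeds `J3`.)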
3. **Big Data!** We cannot afford redundancies anymore! Let's eliminate redundancies → YSmart!
4. **Correlation-Aware SQL-to-MR Translator**
   - (Diagram: SQL-like queries → primitive MR jobs → identify correlations → merge correlated MR jobs → MR jobs for best performance.)
5. **Input Correlation (IC)**
   - Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint.
   - (Diagram: J1 reads `lineitem` and `orders`; J2 reads `lineitem`.)
6. **Transit Correlation (TC)**
   - Multiple MR jobs have transit correlation (TC) if
     - they have input correlation (IC), and
     - they have the same partition key.
   - (Diagram: J1 reads `lineitem` and `orders`, J2 reads `lineitem`; both use the partition key `l_orderkey`.)
7. **Job Flow Correlation (JFC)**
   - An MR job has job flow correlation (JFC) with one of its child MR jobs if it has the same partition key as that child job.
   - (Diagram: the map and reduce functions of MR Job 2 (J2) read `lineitem` and `orders`; J2's output, together with other data, feeds the map and reduce functions of MR Job 1 (J1); J1 and J2 share the same partition key.)
8. **Put it all together**
   1. Sentence-to-sentence translation: 5 MR jobs.
   2. Input correlation + transit correlation: 3 MR jobs.
   3. Input correlation + transit correlation + job flow correlation: 1 MR job.
   4. Hand-coding (similar to case 3): 1 MR job; in the reduce function, the code is optimized according to the query semantics.

   (Diagrams: for each case, the query plan over `lineitem` and `orders`, with operators Join1, Join2, AGG1, AGG2, and a left-outer join, grouped into MR jobs.)
9. **YSmart vs SCOPE**
   - (Diagram: a program in a (big) data processing language goes through naïve translation and then optimization, producing analytic jobs that run over big data.)
   - YSmart looks at data dependence and control dependence:
     - identifies three correlations;
     - merges jobs to eliminate redundancy (straightforward).
   - SCOPE looks at the actual structure of the input data:
     - identifies structural-property correlations;
     - partitions, groups, and merges (complicated).
10. **Big Picture**
    - (Diagram: (big) data processing language → naïve translation → input-independent optimization (YSmart) → input-dependent optimization (SCOPE) → analytic jobs over big data.)
11. **YSmart vs SCOPE**
    - (Diagram: naïve translation followed by optimization, as on the earlier YSmart vs SCOPE slide.)
    - The diagram is actually an over-abstraction.
    - In reality:
      - YSmart is a source-to-source transformation;
      - SCOPE is a run-time optimizing compiler tightly coupled with the underlying execution environment.
12. **YSmart Alone**
    - (Diagram: naïve translation followed by YSmart.)
    - 3x speedup, but 17% slower than hand-coded jobs.
    - It is supposed to be smarter than a human!
    - What went wrong:
      - bad input code;
      - not enough optimization.
13. **SCOPE Alone**
    - (Diagram: naïve translation followed by SCOPE.)
    - No thorough evaluation; 2x speedup on one specific case.
    - Problem:
      - They look at structures, but at the wrong level.
      - Very likely, they are optimizing computations that are not strictly necessary!
14. **Discussions**
    1. Is SQL good enough as a language for big data analytics?
       - Bad language design can be detrimental.
       - Redundancies can be introduced unnecessarily, simply because the language is not expressive enough.
    2. How can traditional program analysis and compiler optimization be carried over to the big data era?
       - Correlation detection in YSmart is inherently similar to dependency analysis.
       - In compiler optimization, we focus on def-use relations among statements and expressions; in big data, we should focus on big data transfers and big data tables.
15. Fundamentally, we need good programming languages & program analyses for big data analytics!
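
The three correlation definitions (IC, TC, JFC) and the job counts on the "Put it all together" slide can be illustrated with a small sketch. This is a minimal Python model under stated assumptions, not YSmart's implementation: the `MRJob` class, the `merge_jobs` helper, and the five-job plan are hypothetical, the plan only reuses the slide's operator names (its inputs and partition keys are assumed), and the union-find merge skips the extra legality checks (for example, avoiding dependency cycles) that a real translator would need.

```python
from dataclasses import dataclass, field


# Hypothetical model of a primitive MR job; not YSmart's actual data structure.
@dataclass(eq=False)  # identity-based equality, so jobs stay hashable
class MRJob:
    name: str
    inputs: frozenset            # relations (tables or job outputs) the job reads
    partition_key: str           # key used to shuffle map output to reducers
    children: list = field(default_factory=list)  # jobs whose output this job consumes


def input_correlated(a: MRJob, b: MRJob) -> bool:
    # IC: the input relation sets are not disjoint.
    return not a.inputs.isdisjoint(b.inputs)


def transit_correlated(a: MRJob, b: MRJob) -> bool:
    # TC: input correlation plus the same partition key.
    return input_correlated(a, b) and a.partition_key == b.partition_key


def job_flow_correlated(parent: MRJob, child: MRJob) -> bool:
    # JFC: child is one of parent's child jobs and both share a partition key.
    is_child = any(c is child for c in parent.children)
    return is_child and parent.partition_key == child.partition_key


def merge_jobs(jobs: list, use_jfc: bool) -> list:
    """Group jobs with union-find; each group would become one merged MR job."""
    root = {j.name: j.name for j in jobs}

    def find(x: str) -> str:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    def union(a: str, b: str) -> None:
        root[find(a)] = find(b)

    # Merge every pair of transit-correlated jobs (TC implies IC).
    for i, a in enumerate(jobs):
        for b in jobs[i + 1:]:
            if transit_correlated(a, b):
                union(a.name, b.name)

    # Optionally fold each parent into its child's group when JFC holds.
    if use_jfc:
        for parent in jobs:
            for child in parent.children:
                if job_flow_correlated(parent, child):
                    union(parent.name, child.name)

    groups = {}
    for j in jobs:
        groups.setdefault(find(j.name), []).append(j.name)
    return list(groups.values())


# Hypothetical plan using the slide's operator names; inputs and keys are assumed.
agg1 = MRJob("AGG1", frozenset({"lineitem"}), "l_orderkey")
agg2 = MRJob("AGG2", frozenset({"lineitem"}), "l_orderkey")
join1 = MRJob("Join1", frozenset({"lineitem", "orders"}), "l_orderkey")
join2 = MRJob("Join2", frozenset({"AGG1.out", "AGG2.out"}), "l_orderkey", [agg1, agg2])
loj = MRJob("LeftOuterJoin", frozenset({"Join1.out", "Join2.out"}), "l_orderkey", [join1, join2])
plan = [join1, agg1, agg2, join2, loj]

print(len(plan))                             # 5 jobs, one per operator
print(len(merge_jobs(plan, use_jfc=False)))  # 3 jobs with IC + TC
print(len(merge_jobs(plan, use_jfc=True)))   # 1 job with IC + TC + JFC
```

In this assumed plan, IC + TC groups Join1, AGG1, and AGG2 (they all read `lineitem` and share `l_orderkey`), while Join2 and the left-outer join read only intermediate outputs and stay separate; JFC then folds each of those parents into its children's group.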
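
The Discussions slide compares YSmart's correlation detection to dependency analysis. The compiler-side analogue can be sketched on the Example slide's "Correct View" program with hash-based common-subexpression elimination plus copy propagation. This is an illustrative sketch, not the method from the slides: the tuple encoding, the `load` and `copy` operator names, and the `eliminate_redundancy` function are hypothetical, and it assumes each variable is assigned once and that two `<exp1>` loads read identical data.

```python
# Straight-line program from the "Correct View" column, one assignment per
# variable. Each right-hand side is (operator, operand, ...). The "load" and
# "copy" operator names are hypothetical encodings, not YSmart syntax.
correct_view = [
    ("a", ("load", "exp1")),
    ("b", ("load", "exp2")),
    ("x", ("J1", "a", "b")),
    ("c", ("load", "exp1")),
    ("y", ("J2", "c")),
    ("d", ("copy", "y")),
    ("z", ("J3", "d")),
]


def eliminate_redundancy(program):
    """Hash-based common-subexpression elimination with copy propagation.

    Assumes every variable is assigned exactly once (as in the slide)."""
    first_def = {}  # (op, canonical operands) -> variable that first computed it
    alias = {}      # variable -> equivalent variable that is kept
    kept = []
    for target, (op, *args) in program:
        args = tuple(alias.get(a, a) for a in args)  # rename operands to kept variables
        if op == "copy":
            alias[target] = args[0]          # d = y: d is another name for y
            continue
        key = (op, args)
        if key in first_def:
            alias[target] = first_def[key]   # recomputation: reuse the earlier result
        else:
            first_def[key] = target
            kept.append((target, op, args))
    return kept


for stmt in eliminate_redundancy(correct_view):
    print(stmt)
# ('a', 'load', ('exp1',))
# ('b', 'load', ('exp2',))
# ('x', 'J1', ('a', 'b'))
# ('y', 'J2', ('a',))
# ('z', 'J3', ('y',))
```

The second `<exp1>` load disappears (so `J2` reads `a`) and the copy `d = y` is folded away, leaving two loads instead of three. The sketch does not merge the `J2` → `J3` chain into one job; at the MR level, that is the kind of merge job flow correlation targets when the partition keys match.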