Mixing Sour ce and Bytecode
    A Case for Compilation by Normalization




OOPSLA'08, Nashville, Tennessee

Lennart Kats, TU Delft
Martin Bravenboer, UMass
Eelco Visser, TU Delft
October 21, 2008



Software Engineering Research Group
Language
Extensions
DSLs and Language Extensions

Domain-specific         void browse() {
                          List<Book> books =
• Database queries
                           <| SELECT *
• Regular expressions          FROM books
• XML processing               WHERE price < 100.00
                             |>;
• Matrices
• Complex numbers           …
• ...
                            for (int i = 0; i < books.size(); i++)
                               books.get(i).title =~ s/^The //;
                        }
DSLs and Language Extensions

Domain-specific         General-purpose
• Database queries      • AOP
• Regular expressions   • Traits
• XML processing        • Enums
• Matrices              • Iterators
• Complex numbers       • Case classes
• ...                   • Properties/accessors
                        • ...
Example: The Foreach Statement
                                            Program B: Foo.java
Program A: Foo.mylang                    public class Foo {
                                           void bar(BarList list) {
public class Foo {                           Iterator it = list.iterator();
  void bar(BarList list) {                   while (it.hasNext()) {
    foreach (Bar b in list) {                   Bar b = it.next();
       b.print();                               b.print();
    }                                        }
                            Code generator
  }                                        }
}                                        }




                                                          javac
                                                Program C: Foo.class

                                                (bytecode)
The Preprocessor
   Extended
   language
                             + loosely coupled
                             + small, modular extension
 Preprocessor
                             + no need to know compiler
                               internals
 Host language               – no parser
                             – semantic analysis, errors?
                 INTERFACE
  Compiler


  Bytecode
The Front-end Extensible Compiler
   Extended                  + front-end reuse
   language
                             – tied to the front-end
   Front-end     Extension     internals


 Host language


                 INTERFACE
  Compiler
                                Polyglot           MetaBorg
  Bytecode
                                           ableJ
                                                       OJ
Traits: Composable Units of Behavior
                                      [Schärli, Ducasse, Nierstrasz, Black 2003]



public trait TDrawing {
  void draw() {
    …
  }

    require Vertices getVertices();                    public class Shape {
}                                                        void draw() { … }
                                            Traits
                                                           Vertices getVertices() { … }
public class Shape with TDrawing {                     }
  …

    Vertices getVertices() { … }
}
Foo.java            TBar.java
Traits
 •   Simple idea




                                          Traits
 •   Most libraries are
     distributed as compiled
     code

 → Why not do this for traits?    FooBar.java




                                          javac
                Not extensible


                                  FooBar.class
The
Fully
Extensible
Compiler?
Extended
language


 Parsing


 Name
Parsing
Frontend
                       The
analysis
                       Fully
  Type
 Parsing               Extensible
 analysis
                       Compiler?
  Code
 Parsing
generation
             + reuse everything
Backend
Optimize     + access to back-end
             –– tied to all compiler internals
 Parsing
  Emit       – complexity
             – performance trade-offs
Bytecode
The
          White
          Box
          Model?



–– tied to all compiler internals
– complexity
– performance trade-offs
The Normalizing Compiler
     Extended
     language
                                + simple interface
     Extension
     processor                  + access to back-end
                                ++ not tied to
  Source+bytecode                  compiler internals

                    INTERFACE
Normalizing compiler


     Bytecode
   core language
Foo.java            TBar.class




                               Traits
                        FooBar.java




Traits, Revisited      FooBar.class
Java, and Bytecode, and Backticks,
Oh, My!
class FooBar {
                        Java (Foo.java)
    void hello1() {
      System.out.println(“Hello Java world”);
    }

                                          Bytecode (TBar.class)
    `hello2 (void) [
      getstatic java.lang.System.out : java.io.PrintStream;
      ldc “Hello bytecode world”;
      invokevirtual java.io.PrintStream.println(java.lang.String : void);
    ]

}
Java, and Bytecode, and Backticks:
Fine-grained Mixing
class FooBar {

    void hello3() {
      System.out.println(`ldc "Hello mixed world");
    }

                                                                Typechecker
    `hello4 (void) [
      getstatic System.out : java.io.PrintStream;                             P
      push `"Hello ” + “Java".concat("bytecode world");                      SP
      invokevirtual java.io.PrintStream.println (java.lang.String : void);
    ]

}
Typical Applications

              Class-method / Statement / Expression level

Traits                  X

Partial/open classes    X

Iterator generators              X

Finite State Automata            X

Expression-based                          X
extensions
Example: Finite State Automata (DFAs)

Grammar




          States and transitions
                                    Code
Finite State Automata: Generated Java code
    while (true) {
     switch (state) {
         case INSIDE_QUOTE:
           switch (nextToken()) {
               case '"':
                 consume();
                 state = OUTSIDE_QUOTE; // end quote found!
                 break;
               …
               case EOF:
                  return;
               default:
                  consume();
             }
             break;
         case OUTSIDE_QUOTE:
           switch (c) {            + simple Java code
               …                   – performance trade-off
           }
           break;
       }
Finite State Automata: Inline Bytecode
    InsideQuote:
       switch (nextToken()) {
         case '"':
           consume();
           `goto OutsideQuote;
         case EOF:
           return;
         default:
           consume();
           `goto InsideQuote;
         }
       }
    OutsideQuote:
       switch (c) {           + performance
         …                    + low-level when
                                           necessary
       }
                           + minimal changes
                           + maintainable
Method-body Extensions

• Operators: x as T, x ?? y, ...

• Embedded (external) DSLs:
  Database queries, XML, ...

• Closures             No
                          r   m
                               ali
                                     ze

                                          Lower-level
                                             code
Stratego Normalization Rules

desugar-foreach:
  |[ foreach (t x in e) stm ]|
   
  |[ Iterator x_iterator = e.iterator();
     while (x_iterator.hasNext()) {
        t x = x_iterator.next();
        stm;
     }
  ]|
  with x_iterator := <newname> “iterator”
Normalizing ??

String property = readProperty(“x”) ?? “default”;
System.out.println(property);


// In basic Java:

String temp = readProperty(“x”);
if (temp == null) temp = “default”;
String property = temp;
System.out.println(property);
                              Context-sensitive
                               transformation
Normalizing ??

“But placing statements in front of the
 assimilated expressions is easy, isn’t it?”


Requires syntactic knowledge:
   • for, while, return, throw, …
   • other expressions
   • other extensions
Normalizing ??

“But placing statements in front of the
 assimilated expressions is easy, isn’t it?
 … Right?”

Also requires semantic knowledge:

Iterator<String> it = ...;



for (String e = e0; it.hasNext(); e = it.next() ?? “default”) {
  ...
}
Normalizing ??

“But placing statements in front of the
 assimilated expressions is easy, isn’t it?
 … Right?”

Also requires semantic knowledge:

Iterator<String> it = ...;

Integer temp = ...;
for (String e = e0; it.hasNext(); e = temp) {
   ...
}
Normalizing '??' !!

String property = readProperty(“x”) ?? “default”;
System.out.println(property);
Normalizing '??' !! - with Inline Bytecode

String property = `[
   push `readProperty(“x”);
   dup;
   store temp;
   ifnull else;
      push `e2;
      goto end;
   else:
      load temp;
   end:
                       S
];
System.out.println(property);
Normalizing '??' !! - with Inline Bytecode

String property = `[
   `String temp = readProperty(“x”);
   `if (temp == null) temp = “default”;
   push `temp                             S
];
System.out.println(property);
Normalizing '??' !! - with an EBlock

String property = {|
   String temp = readProperty(“x”);
   if (temp == null) temp = “default”;
| temp
|};
System.out.println(property);
desugar-coalescing:
 |[ e1 ?? e2 ]| 
 |[{|
      String x_temp = e1;
      if (x_temp == null) x_temp = e2;
   | x_temp
   |}
 ]|
 with x_temp := <newname> “temp”


                      '??' in a Normalization Rule
desugar-coalescing:
 |[ e1 ?? e2 ]| 
 |[{|
      String x_temp = e1;
      if (x_temp == null) x_temp = e2;
   | x_temp
   |}                Run-time:
                      Exception in thread "main"
 ]|                     java.lang.NullPointerException
                        …
 with x_temp :=   <newname> “temp”
                        at Generated.getVertices(Generated.java:10)


                              Position Information In
                                     Generated Code
desugar-coalescing:
 |[ e1 ?? e2 ]| 
 |[ trace (e1 ?? e2 @ <File>:<Line>:<Column>) {{|
      String x_temp = e1;
      if (x_temp == null) x_temp = e2;
   | x_temp
   |}}
 ]|
 with x_temp := <newname> “temp”

 Run-time:
        Maintaining Position Information With
 Exception in thread "main"
   java.lang.NullPointerException
   …                           Source Tracing
   at Shape.getVertices(Shape.java:10)
Reflection
             Class-method / Statement / Expression level

Traits                  X

Partial/open classes    X

Iterator generators              X

Finite State Automata            X

Expression-based                          X
extensions

Aspect weaving          X        X        X
Concluding Remarks
 •   Extension effort must be proportional to the benefits
 •   Black Boxes vs. White Boxes
 •   Don't throw away your primitives
 •   Normalization rules



Mixing Source and Bytecode. A Case for Compilation by Normalization.
Lennart C. L. Kats, Martin Bravenboer, and Eelco Visser. In OOPSLA'08.
http://www.program­transformation.org/Stratego/TheDryadCompiler
http://www.strategoxt.org

Mixing Source and Bytecode: A Case for Compilation By Normalization (OOPSLA 2008)

  • 1.
    Mixing Sour ceand Bytecode A Case for Compilation by Normalization OOPSLA'08, Nashville, Tennessee Lennart Kats, TU Delft Martin Bravenboer, UMass Eelco Visser, TU Delft October 21, 2008 Software Engineering Research Group
  • 2.
  • 3.
    DSLs and LanguageExtensions Domain-specific void browse() { List<Book> books = • Database queries <| SELECT * • Regular expressions FROM books • XML processing WHERE price < 100.00 |>; • Matrices • Complex numbers … • ... for (int i = 0; i < books.size(); i++) books.get(i).title =~ s/^The //; }
  • 4.
    DSLs and LanguageExtensions Domain-specific General-purpose • Database queries • AOP • Regular expressions • Traits • XML processing • Enums • Matrices • Iterators • Complex numbers • Case classes • ... • Properties/accessors • ...
  • 5.
    Example: The ForeachStatement Program B: Foo.java Program A: Foo.mylang public class Foo { void bar(BarList list) { public class Foo { Iterator it = list.iterator(); void bar(BarList list) { while (it.hasNext()) { foreach (Bar b in list) { Bar b = it.next(); b.print(); b.print(); } } Code generator } } } } javac Program C: Foo.class (bytecode)
  • 6.
    The Preprocessor Extended language + loosely coupled + small, modular extension Preprocessor + no need to know compiler internals Host language – no parser – semantic analysis, errors? INTERFACE Compiler Bytecode
  • 7.
    The Front-end ExtensibleCompiler Extended + front-end reuse language – tied to the front-end Front-end Extension internals Host language INTERFACE Compiler Polyglot MetaBorg Bytecode ableJ OJ
  • 8.
    Traits: Composable Unitsof Behavior [Schärli, Ducasse, Nierstrasz, Black 2003] public trait TDrawing { void draw() { … } require Vertices getVertices(); public class Shape { } void draw() { … } Traits Vertices getVertices() { … } public class Shape with TDrawing { } … Vertices getVertices() { … } }
  • 9.
    Foo.java TBar.java Traits • Simple idea Traits • Most libraries are distributed as compiled code → Why not do this for traits? FooBar.java javac Not extensible FooBar.class
  • 10.
  • 11.
    Extended language Parsing Name Parsing Frontend The analysis Fully Type Parsing Extensible analysis Compiler? Code Parsing generation + reuse everything Backend Optimize + access to back-end –– tied to all compiler internals Parsing Emit – complexity – performance trade-offs Bytecode
  • 12.
    The White Box Model? –– tied to all compiler internals – complexity – performance trade-offs
  • 13.
    The Normalizing Compiler Extended language + simple interface Extension processor + access to back-end ++ not tied to Source+bytecode compiler internals INTERFACE Normalizing compiler Bytecode core language
  • 14.
    Foo.java TBar.class Traits FooBar.java Traits, Revisited FooBar.class
  • 15.
    Java, and Bytecode,and Backticks, Oh, My! class FooBar { Java (Foo.java) void hello1() { System.out.println(“Hello Java world”); } Bytecode (TBar.class) `hello2 (void) [ getstatic java.lang.System.out : java.io.PrintStream; ldc “Hello bytecode world”; invokevirtual java.io.PrintStream.println(java.lang.String : void); ] }
  • 16.
    Java, and Bytecode,and Backticks: Fine-grained Mixing class FooBar { void hello3() { System.out.println(`ldc "Hello mixed world"); } Typechecker `hello4 (void) [ getstatic System.out : java.io.PrintStream; P push `"Hello ” + “Java".concat("bytecode world"); SP invokevirtual java.io.PrintStream.println (java.lang.String : void); ] }
  • 17.
    Typical Applications Class-method / Statement / Expression level Traits X Partial/open classes X Iterator generators X Finite State Automata X Expression-based X extensions
  • 18.
    Example: Finite StateAutomata (DFAs) Grammar States and transitions Code
  • 19.
    Finite State Automata:Generated Java code while (true) { switch (state) { case INSIDE_QUOTE: switch (nextToken()) { case '"': consume(); state = OUTSIDE_QUOTE; // end quote found! break; … case EOF: return; default: consume(); } break; case OUTSIDE_QUOTE: switch (c) { + simple Java code … – performance trade-off } break; }
  • 20.
    Finite State Automata:Inline Bytecode InsideQuote: switch (nextToken()) { case '"': consume(); `goto OutsideQuote; case EOF: return; default: consume(); `goto InsideQuote; } } OutsideQuote: switch (c) { + performance … + low-level when necessary } + minimal changes + maintainable
  • 21.
    Method-body Extensions • Operators:x as T, x ?? y, ... • Embedded (external) DSLs: Database queries, XML, ... • Closures No r m ali ze Lower-level code
  • 22.
    Stratego Normalization Rules desugar-foreach: |[ foreach (t x in e) stm ]|  |[ Iterator x_iterator = e.iterator(); while (x_iterator.hasNext()) { t x = x_iterator.next(); stm; } ]| with x_iterator := <newname> “iterator”
  • 23.
    Normalizing ?? String property= readProperty(“x”) ?? “default”; System.out.println(property); // In basic Java: String temp = readProperty(“x”); if (temp == null) temp = “default”; String property = temp; System.out.println(property); Context-sensitive transformation
  • 24.
    Normalizing ?? “But placingstatements in front of the assimilated expressions is easy, isn’t it?” Requires syntactic knowledge: • for, while, return, throw, … • other expressions • other extensions
  • 25.
    Normalizing ?? “But placingstatements in front of the assimilated expressions is easy, isn’t it? … Right?” Also requires semantic knowledge: Iterator<String> it = ...; for (String e = e0; it.hasNext(); e = it.next() ?? “default”) { ... }
  • 26.
    Normalizing ?? “But placingstatements in front of the assimilated expressions is easy, isn’t it? … Right?” Also requires semantic knowledge: Iterator<String> it = ...; Integer temp = ...; for (String e = e0; it.hasNext(); e = temp) { ... }
  • 27.
    Normalizing '??' !! Stringproperty = readProperty(“x”) ?? “default”; System.out.println(property);
  • 28.
    Normalizing '??' !!- with Inline Bytecode String property = `[ push `readProperty(“x”); dup; store temp; ifnull else; push `e2; goto end; else: load temp; end: S ]; System.out.println(property);
  • 29.
    Normalizing '??' !!- with Inline Bytecode String property = `[ `String temp = readProperty(“x”); `if (temp == null) temp = “default”; push `temp S ]; System.out.println(property);
  • 30.
    Normalizing '??' !!- with an EBlock String property = {| String temp = readProperty(“x”); if (temp == null) temp = “default”; | temp |}; System.out.println(property);
  • 31.
    desugar-coalescing: |[ e1?? e2 ]|  |[{| String x_temp = e1; if (x_temp == null) x_temp = e2; | x_temp |} ]| with x_temp := <newname> “temp” '??' in a Normalization Rule
  • 32.
    desugar-coalescing: |[ e1?? e2 ]|  |[{| String x_temp = e1; if (x_temp == null) x_temp = e2; | x_temp |} Run-time: Exception in thread "main" ]| java.lang.NullPointerException … with x_temp := <newname> “temp” at Generated.getVertices(Generated.java:10) Position Information In Generated Code
  • 33.
    desugar-coalescing: |[ e1?? e2 ]|  |[ trace (e1 ?? e2 @ <File>:<Line>:<Column>) {{| String x_temp = e1; if (x_temp == null) x_temp = e2; | x_temp |}} ]| with x_temp := <newname> “temp” Run-time: Maintaining Position Information With Exception in thread "main" java.lang.NullPointerException … Source Tracing at Shape.getVertices(Shape.java:10)
  • 34.
    Reflection Class-method / Statement / Expression level Traits X Partial/open classes X Iterator generators X Finite State Automata X Expression-based X extensions Aspect weaving X X X
  • 35.
    Concluding Remarks • Extension effort must be proportional to the benefits • Black Boxes vs. White Boxes • Don't throw away your primitives • Normalization rules Mixing Source and Bytecode. A Case for Compilation by Normalization. Lennart C. L. Kats, Martin Bravenboer, and Eelco Visser. In OOPSLA'08. http://www.program­transformation.org/Stratego/TheDryadCompiler http://www.strategoxt.org