Generative
Programming
Prof. Dr. Ralf Lämmel
Universität Koblenz-Landau
Software Languages Team
Mainly with applications to
language processing
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Program generation
Specification Program
Program
generator
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Parser generation
Grammar
with some
code
Parser
Program
generator
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Generative programming
Generative programming is a style of computer
programming that uses automated source code
creation through generic frames, classes, prototypes,
templates, aspects, and code generators to improve
programmer productivity ... It is often related to code-
reuse topics such as component-based software
engineering and product family engineering.
Retrieved from Wikipedia
on 3 May, 2011.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Language processing patterns
manual
patterns
generative
patterns
1. The Chopper Pattern
2. The Lexer Pattern
3. The Copy/Replace Pattern
4. The Acceptor Pattern
5. The Parser Pattern
6. The Lexer Generation Pattern
7. The Acceptor Generation Pattern
8. The Parser Generation Pattern
9. The Text-to-object Pattern
10.The Object-to-text Pattern
11.The Text-to-tree Pattern
12.The Tree-walk Pattern
13.The Parser Generation2 Pattern
The Lexer Generation Pattern
See
the Lexer Pattern
for comparison
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
The Lexer Generation Pattern
Intent:
Analyze text at the lexical level.
Use token-level grammar as input.
Operational steps (compile time):
1. Generate code from the lexer grammar.
2. Write boilerplate code to drive the lexer.
3. Write code to process token/lexeme stream.
4. Build generated and hand-written code.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Token-level grammar for 101
COMPANY : 'company';
DEPARTMENT : 'department';
EMPLOYEE : 'employee';
MANAGER : 'manager';
ADDRESS : 'address';
SALARY : 'salary';
OPEN : '{';
CLOSE : '}';
STRING : '"' (~'"')* '"';
FLOAT : ('0'..'9')+ ('.' ('0'..'9')+)?;
WS : (' '|'r'? 'n'|'t')+;
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Lexer generation
....g ....javaANTLR
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
ANTLR
“ANTLR, ANother Tool for Language Recognition, is a language
tool that provides a framework for constructing recognizers,
interpreters, compilers, and translators from grammatical
descriptions containing actions in a variety of target languages.
ANTLR provides excellent support for tree construction, tree
walking, translation, error recovery, and error reporting.”
http://www.antlr.org/
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Boilerplate code for
lexer execution
	 	 FileInputStream stream = new FileInputStream(file);
	 	 ANTLRInputStream antlr = new ANTLRInputStream(stream);
	 	 MyLexer lexer = new MyLexer(antlr);
	 	 Token token;
	 	 while ((token = lexer.nextToken())
	 	 	 	 !=Token.EOF_TOKEN) {
...
}
Grammar-
derived class
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Demo
http://101companies.org/wiki/
Contribution:antlrLexer
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Define regular expressions for all tokens.
Generate code for lexer.
Set up input and token/lexeme stream.
Implement operations by iteration over pairs.
Build generated and hand-written code.
Summary of implementation aspects
The Acceptor Generation Pattern
See
the Acceptor Pattern
for comparison
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
The Acceptor Generation Pattern
Intent:
Verify syntactical correctness of input.
Use context-free grammar as input.
Operational steps (compile time):
1. Generate code from the context-free grammar.
2. Write boilerplate code to drive the acceptor (parser).
3. Build generated and hand-written code.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
company :
'company' STRING '{' department* '}' EOF;
department :
'department' STRING '{'
('manager' employee)
('employee' employee)*
department*
'}';
employee :
STRING '{'
'address' STRING
'salary' FLOAT
'}';
Wanted: a
generated
acceptor
Context-free grammar for 101
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Parser generation
.g
...Lexer.java
...Parser.java
ANTLR
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Driver for acceptor
import org.antlr.runtime.*;
public class Test {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
SetyLexer lexer = new ...Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
...Parser parser = new ...Parser(tokens);
parser....();
}
}
This is basically boilerplate
code. Copy and Paste!
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Driver for acceptor
import org.antlr.runtime.*;
public class Test {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
SetyLexer lexer = new ...Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
...Parser parser = new ...Parser(tokens);
parser....();
}
}
Prepare System.in
as ANTLR input
stream
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Driver for acceptor
import org.antlr.runtime.*;
public class Test {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
SetyLexer lexer = new ...Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
...Parser parser = new ...Parser(tokens);
parser....();
}
}
Set up lexer
for language
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Driver for acceptor
import org.antlr.runtime.*;
public class Test {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
SetyLexer lexer = new ...Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
...Parser parser = new ...Parser(tokens);
parser....();
}
}
Obtain token
stream from lexer
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Driver for acceptor
import org.antlr.runtime.*;
public class Test {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
SetyLexer lexer = new ...Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
...Parser parser = new ...Parser(tokens);
parser....();
}
} Set up parser and
feed with token
stream of lexer
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Driver for acceptor
import org.antlr.runtime.*;
public class Test {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
SetyLexer lexer = new ...Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
...Parser parser = new ...Parser(tokens);
parser....();
}
}
Invoke start
symbol of parser
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Demo
http://101companies.org/wiki/
Contribution:antlrAcceptor
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Define regular expressions for all tokens.
Define context-free productions for all nonterminals.
Generate code for acceptor (parser).
Set up input and token/lexeme stream.
Feed parser with token stream.
Acceptor (parser) execution = call start symbol.
Build generated and hand-written code.
Summary of implementation aspects
The Parser Generation Pattern
See
the Parser Pattern
for comparison
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
The Parser Generation Pattern
Intent:
Enable structure-aware functionality.
Use context-free grammar as the primary input.
Operational steps (compile time):
1. Include semantic actions into context-free grammar.
2. Generate code from enhanced grammar.
3. Write boilerplate code to drive the parser.
4. Build generated and hand-written code.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
A grammar rule with
a semantic action
employee :
STRING '{'
'address' STRING
'salary' s=FLOAT
{ total += Double.parseDouble($s.text); }
'}';
Semantic actions
are enclosed
between braces.
Variable local to
parser execution
Retrieve and parse
lexeme for the number
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Demo
http://101companies.org/wiki/
Contribution:antlrParser
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Parser descriptions can also use
“synthesized” attributes.
Add synthesized attributes to nonterminals.
Add semantic actions to productions.
expr returns [int value]
: e=multExpr {$value = $e.value;}
( '+' e=multExpr {$value += $e.value;}
| '-' e=multExpr {$value -= $e.value;}
)*;
(Context-free part in bold face.)
Compute an int
value
Perform addition
& Co.
The Text-to-object Pattern
Use parsing
for AST construction
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
The Text-to-object Pattern
Intent:
Enable structure-aware OO programs.
Build AST objects along parsing.
Operational steps (compile time):
1. Define object model for AST.
2. Instantiate Parser Generation pattern as follows:
Use semantic actions for AST construction.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Acceptor---for comparison
company :
	 'company' STRING '{'
	 	 department*
	 '}'
	 EOF;
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
With semantic actions for
object construction included
company returns [Company c]:
{ $c = new Company(); }
'company' STRING
{ $c.setName($STRING.text); }
'{'
( topdept= department
{ $c.getDepts().add($topdept.d); }
)*
'}'
;
Construct company object.
Set name of company.
Add all departments.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Demo
http://101companies.org/wiki/
Contribution:antlrObjects
The Object-to-text Pattern
See
the Copy/Replace Pattern
for comparison
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
The Text-to-object Pattern
Intent:
Map abstract syntax (in objects) to concrete syntax.
Operational steps (run/compile time):
This is ordinary OO programming.
Walk the object model for the AST.
Generate output via output stream.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Methods for pretty printing
	 public void ppCompany(Company c, String s) throws IOException {
	 	 writer = new OutputStreamWriter(new FileOutputStream(s));
	 	 write("company");
	 	 space();
	 	 write(c.getName());
	 	 space();
	 	 write("{");
	 	 right();
	 	 nl();
	 	 for (Department d : c.getDepts())
	 	 	 ppDept(d);
	 	 left();
	 	 indent();
	 	 write("}");
	 	 writer.close();
	 }
Undo indentation
Indent
Write into OutputStream
Pretty print substructures
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Demo
http://101companies.org/wiki/
Contribution:antlrObjects
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Discussion
We should assume Visitor pattern on object model.
Challenges:
How to ensure that correct syntax is generated?
How can we maintain layout of input?
Alternative technology / technique options:
Template processors
Pretty printing libraries.
The Text-to-tree Pattern
Use trees rather than POJOs
for AST construction
along parsing
Mentioned in passing
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
The Text-to-tree Pattern
Intent:
Enable structure-aware grammar-based programs.
Build AST trees along parsing.
Use grammar with tree construction actions as input.
Operational steps (compile time):
1. Author grammar with tree construction actions.
2. Generate parser.
Advanced stuff!
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Tree construction with ANTLR
department :
  'department' name=STRING '{'
    manager
    ('employee' employee)*
    department*
  '}'
  -> ^(DEPT $name manager employee* department*)
  ;
ANTLR’s special semantic
actions for tree construction
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Demo
http://101companies.org/wiki/
Contribution:antlrTrees
The Tree-walk Pattern
Mentioned in passing
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
The Tree-walk Pattern
Intent:
Operate on trees in a grammar-based manner.
Productions describe tree structure.
Semantic actions describe actions along tree walking.
Operational steps (compile time):
1. Author tree grammar (as opposed to CFG).
2. Add semantic actions.
3. Generate and build tree walker.
Advanced stuff!
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Total: Tree walking with ANTLR
company :
  ^(COMPANY STRING dept*)
  ;
  
dept :
  ^(DEPT STRING manager employee* dept*)
  ;
    
manager :
  ^(MANAGER employee)
  ;
    
employee :
  ^(EMPLOYEE STRING STRING FLOAT)
  { total += Double.parseDouble($FLOAT.text); }
  ;
Matching trees
rather than
parsing text.
Semantic actions
as before.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Cut: Tree walking with ANTLR
Match trees
Rebuild tree
topdown : employee;
        
employee :
  ^(EMPLOYEE STRING STRING s=FLOAT)
  -> ^(EMPLOYEE
STRING
STRING
FLOAT[
Double.toString(
Double.parseDouble(
$s.text) / 2.0d)])
  ;
Tree walking
strategy
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Demo
http://101companies.org/wiki/
Contribution:antlrTrees
The Parser Generation2 Pattern
(That is, the Parser Generation Generation Pattern.)
See
the Parser Pattern
and
the Parser Generation Pattern
for comparison
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
The Parser Generation2 Pattern
Intent:
Enable structure-aware OO programs.
Build AST objects along parsing.
Use idomatic context-free grammar as the only input.
Operational steps (compile time):
1. Author idiomatic context-free grammar.
2. Generate object model from context-free grammar.
3. Generate enhanced grammar for AST construction.
4. Generate parser and continue as before.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
More declarative
parser generation
idiomatic
grammar
Program
generator
....g
....java
object
model
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Idiomatic grammar for 101
Company : ...;
Department :
'department' dname=QqString '{'
" 'manager' manager=Employee"
" subdepartments=Department*
" " employees=NonManager*
" '}';
NonManager : ...;
Employee : ...;
Label all nonterminal
occurrences
Leverage predefined
token-level grammar
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Demo
http://101companies.org/wiki/
Contribution:yapg
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Bootstrapping
1. Develop ANTLR-based parser for idiomatic grammars.
2. Develop object model for ASTs for idiomatic grammars.
3. Develop ANTLR generator.
4. Develop object model generator.
5. Define idiomatic grammar of idiomatic grammars.
6. Generate grammar parser from said grammar.
7. Generate object model from said grammar.
8. Remove (1.) and (2.).
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
The grammar of grammars
Grammar " " : prods=Production*;
Production " : lhs=Id ':' rhs=Expression ';';
Expression ": Sequence | Choice;
Sequence "" : list=Atom*;
Choice "" " : first=Id rest=Option*;
Atom" " " " : Terminal | Nonterminal | Many;
Terminal "" : symbol=QString;
Nonterminal ": label=Id '=' symbol=Id;
Option "" " : '|' symbol=Id;
Many " " " : elem=Nonterminal '*';
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
Summary
Language processing is a programming domain.
Grammars are program specifications in that domain.
Those specifications may be enhanced semantically.
Those specifications may also be constrained idiomatically.
Manual implementation of such specifications is not productive.
Parser generators automatically implement such specifications.
Parser generation is an instance of generative programming.
(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)
A development cycle
for language-based tools
1. Author language samples.
2. Complete samples into test cases.
3. Author the EBNF for the language.
4. Refine EBNF into acceptor (using generation or not).
5. Build and test the acceptor.
6. Extend the acceptor with semantic actions.
7. Build and test the language processor.
Some bits of
software engineering

Generative programming (mostly parser generation)

  • 1.
    Generative Programming Prof. Dr. RalfLämmel Universität Koblenz-Landau Software Languages Team Mainly with applications to language processing
  • 2.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Program generation Specification Program Program generator
  • 3.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Parser generation Grammar with some code Parser Program generator
  • 4.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Generative programming Generative programming is a style of computer programming that uses automated source code creation through generic frames, classes, prototypes, templates, aspects, and code generators to improve programmer productivity ... It is often related to code- reuse topics such as component-based software engineering and product family engineering. Retrieved from Wikipedia on 3 May, 2011.
  • 5.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Language processing patterns manual patterns generative patterns 1. The Chopper Pattern 2. The Lexer Pattern 3. The Copy/Replace Pattern 4. The Acceptor Pattern 5. The Parser Pattern 6. The Lexer Generation Pattern 7. The Acceptor Generation Pattern 8. The Parser Generation Pattern 9. The Text-to-object Pattern 10.The Object-to-text Pattern 11.The Text-to-tree Pattern 12.The Tree-walk Pattern 13.The Parser Generation2 Pattern
  • 6.
    The Lexer GenerationPattern See the Lexer Pattern for comparison
  • 7.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) The Lexer Generation Pattern Intent: Analyze text at the lexical level. Use token-level grammar as input. Operational steps (compile time): 1. Generate code from the lexer grammar. 2. Write boilerplate code to drive the lexer. 3. Write code to process token/lexeme stream. 4. Build generated and hand-written code.
  • 8.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Token-level grammar for 101 COMPANY : 'company'; DEPARTMENT : 'department'; EMPLOYEE : 'employee'; MANAGER : 'manager'; ADDRESS : 'address'; SALARY : 'salary'; OPEN : '{'; CLOSE : '}'; STRING : '"' (~'"')* '"'; FLOAT : ('0'..'9')+ ('.' ('0'..'9')+)?; WS : (' '|'r'? 'n'|'t')+;
  • 9.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Lexer generation ....g ....javaANTLR
  • 10.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) ANTLR “ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages. ANTLR provides excellent support for tree construction, tree walking, translation, error recovery, and error reporting.” http://www.antlr.org/
  • 11.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Boilerplate code for lexer execution FileInputStream stream = new FileInputStream(file); ANTLRInputStream antlr = new ANTLRInputStream(stream); MyLexer lexer = new MyLexer(antlr); Token token; while ((token = lexer.nextToken()) !=Token.EOF_TOKEN) { ... } Grammar- derived class
  • 12.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Demo http://101companies.org/wiki/ Contribution:antlrLexer
  • 13.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Define regular expressions for all tokens. Generate code for lexer. Set up input and token/lexeme stream. Implement operations by iteration over pairs. Build generated and hand-written code. Summary of implementation aspects
  • 14.
    The Acceptor GenerationPattern See the Acceptor Pattern for comparison
  • 15.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) The Acceptor Generation Pattern Intent: Verify syntactical correctness of input. Use context-free grammar as input. Operational steps (compile time): 1. Generate code from the context-free grammar. 2. Write boilerplate code to drive the acceptor (parser). 3. Build generated and hand-written code.
  • 16.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) company : 'company' STRING '{' department* '}' EOF; department : 'department' STRING '{' ('manager' employee) ('employee' employee)* department* '}'; employee : STRING '{' 'address' STRING 'salary' FLOAT '}'; Wanted: a generated acceptor Context-free grammar for 101
  • 17.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Parser generation .g ...Lexer.java ...Parser.java ANTLR
  • 18.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Driver for acceptor import org.antlr.runtime.*; public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); } } This is basically boilerplate code. Copy and Paste!
  • 19.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Driver for acceptor import org.antlr.runtime.*; public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); } } Prepare System.in as ANTLR input stream
  • 20.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Driver for acceptor import org.antlr.runtime.*; public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); } } Set up lexer for language
  • 21.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Driver for acceptor import org.antlr.runtime.*; public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); } } Obtain token stream from lexer
  • 22.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Driver for acceptor import org.antlr.runtime.*; public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); } } Set up parser and feed with token stream of lexer
  • 23.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Driver for acceptor import org.antlr.runtime.*; public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); } } Invoke start symbol of parser
  • 24.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Demo http://101companies.org/wiki/ Contribution:antlrAcceptor
  • 25.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Define regular expressions for all tokens. Define context-free productions for all nonterminals. Generate code for acceptor (parser). Set up input and token/lexeme stream. Feed parser with token stream. Acceptor (parser) execution = call start symbol. Build generated and hand-written code. Summary of implementation aspects
  • 26.
    The Parser GenerationPattern See the Parser Pattern for comparison
  • 27.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) The Parser Generation Pattern Intent: Enable structure-aware functionality. Use context-free grammar as the primary input. Operational steps (compile time): 1. Include semantic actions into context-free grammar. 2. Generate code from enhanced grammar. 3. Write boilerplate code to drive the parser. 4. Build generated and hand-written code.
  • 28.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) A grammar rule with a semantic action employee : STRING '{' 'address' STRING 'salary' s=FLOAT { total += Double.parseDouble($s.text); } '}'; Semantic actions are enclosed between braces. Variable local to parser execution Retrieve and parse lexeme for the number
  • 29.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Demo http://101companies.org/wiki/ Contribution:antlrParser
  • 30.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Parser descriptions can also use “synthesized” attributes. Add synthesized attributes to nonterminals. Add semantic actions to productions. expr returns [int value] : e=multExpr {$value = $e.value;} ( '+' e=multExpr {$value += $e.value;} | '-' e=multExpr {$value -= $e.value;} )*; (Context-free part in bold face.) Compute an int value Perform addition & Co.
  • 31.
    The Text-to-object Pattern Useparsing for AST construction
  • 32.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) The Text-to-object Pattern Intent: Enable structure-aware OO programs. Build AST objects along parsing. Operational steps (compile time): 1. Define object model for AST. 2. Instantiate Parser Generation pattern as follows: Use semantic actions for AST construction.
  • 33.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Acceptor---for comparison company : 'company' STRING '{' department* '}' EOF;
  • 34.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) With semantic actions for object construction included company returns [Company c]: { $c = new Company(); } 'company' STRING { $c.setName($STRING.text); } '{' ( topdept= department { $c.getDepts().add($topdept.d); } )* '}' ; Construct company object. Set name of company. Add all departments.
  • 35.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Demo http://101companies.org/wiki/ Contribution:antlrObjects
  • 36.
    The Object-to-text Pattern See theCopy/Replace Pattern for comparison
  • 37.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) The Text-to-object Pattern Intent: Map abstract syntax (in objects) to concrete syntax. Operational steps (run/compile time): This is ordinary OO programming. Walk the object model for the AST. Generate output via output stream.
  • 38.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Methods for pretty printing public void ppCompany(Company c, String s) throws IOException { writer = new OutputStreamWriter(new FileOutputStream(s)); write("company"); space(); write(c.getName()); space(); write("{"); right(); nl(); for (Department d : c.getDepts()) ppDept(d); left(); indent(); write("}"); writer.close(); } Undo indentation Indent Write into OutputStream Pretty print substructures
  • 39.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Demo http://101companies.org/wiki/ Contribution:antlrObjects
  • 40.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Discussion We should assume Visitor pattern on object model. Challenges: How to ensure that correct syntax is generated? How can we maintain layout of input? Alternative technology / technique options: Template processors Pretty printing libraries.
  • 41.
    The Text-to-tree Pattern Usetrees rather than POJOs for AST construction along parsing Mentioned in passing
  • 42.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) The Text-to-tree Pattern Intent: Enable structure-aware grammar-based programs. Build AST trees along parsing. Use grammar with tree construction actions as input. Operational steps (compile time): 1. Author grammar with tree construction actions. 2. Generate parser. Advanced stuff!
  • 43.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Tree construction with ANTLR department :   'department' name=STRING '{'     manager     ('employee' employee)*     department*   '}'   -> ^(DEPT $name manager employee* department*)   ; ANTLR’s special semantic actions for tree construction
  • 44.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Demo http://101companies.org/wiki/ Contribution:antlrTrees
  • 45.
  • 46.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) The Tree-walk Pattern Intent: Operate on trees in a grammar-based manner. Productions describe tree structure. Semantic actions describe actions along tree walking. Operational steps (compile time): 1. Author tree grammar (as opposed to CFG). 2. Add semantic actions. 3. Generate and build tree walker. Advanced stuff!
  • 47.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Total: Tree walking with ANTLR company :   ^(COMPANY STRING dept*)   ;    dept :   ^(DEPT STRING manager employee* dept*)   ;      manager :   ^(MANAGER employee)   ;      employee :   ^(EMPLOYEE STRING STRING FLOAT)   { total += Double.parseDouble($FLOAT.text); }   ; Matching trees rather than parsing text. Semantic actions as before.
  • 48.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Cut: Tree walking with ANTLR Match trees Rebuild tree topdown : employee;          employee :   ^(EMPLOYEE STRING STRING s=FLOAT)   -> ^(EMPLOYEE STRING STRING FLOAT[ Double.toString( Double.parseDouble( $s.text) / 2.0d)])   ; Tree walking strategy
  • 49.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Demo http://101companies.org/wiki/ Contribution:antlrTrees
  • 50.
    The Parser Generation2Pattern (That is, the Parser Generation Generation Pattern.) See the Parser Pattern and the Parser Generation Pattern for comparison
  • 51.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) The Parser Generation2 Pattern Intent: Enable structure-aware OO programs. Build AST objects along parsing. Use idomatic context-free grammar as the only input. Operational steps (compile time): 1. Author idiomatic context-free grammar. 2. Generate object model from context-free grammar. 3. Generate enhanced grammar for AST construction. 4. Generate parser and continue as before.
  • 52.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) More declarative parser generation idiomatic grammar Program generator ....g ....java object model
  • 53.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Idiomatic grammar for 101 Company : ...; Department : 'department' dname=QqString '{' " 'manager' manager=Employee" " subdepartments=Department* " " employees=NonManager* " '}'; NonManager : ...; Employee : ...; Label all nonterminal occurrences Leverage predefined token-level grammar
  • 54.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Demo http://101companies.org/wiki/ Contribution:yapg
  • 55.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Bootstrapping 1. Develop ANTLR-based parser for idiomatic grammars. 2. Develop object model for ASTs for idiomatic grammars. 3. Develop ANTLR generator. 4. Develop object model generator. 5. Define idiomatic grammar of idiomatic grammars. 6. Generate grammar parser from said grammar. 7. Generate object model from said grammar. 8. Remove (1.) and (2.).
  • 56.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) The grammar of grammars Grammar " " : prods=Production*; Production " : lhs=Id ':' rhs=Expression ';'; Expression ": Sequence | Choice; Sequence "" : list=Atom*; Choice "" " : first=Id rest=Option*; Atom" " " " : Terminal | Nonterminal | Many; Terminal "" : symbol=QString; Nonterminal ": label=Id '=' symbol=Id; Option "" " : '|' symbol=Id; Many " " " : elem=Nonterminal '*';
  • 57.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) Summary Language processing is a programming domain. Grammars are program specifications in that domain. Those specifications may be enhanced semantically. Those specifications may also be constrained idiomatically. Manual implementation of such specifications is not productive. Parser generators automatically implement such specifications. Parser generation is an instance of generative programming.
  • 58.
    (C) 2010-2013 Prof.Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable) A development cycle for language-based tools 1. Author language samples. 2. Complete samples into test cases. 3. Author the EBNF for the language. 4. Refine EBNF into acceptor (using generation or not). 5. Build and test the acceptor. 6. Extend the acceptor with semantic actions. 7. Build and test the language processor. Some bits of software engineering