1. AustenX
AustenX: a parser generator
with some novel features
Presented by Matthew Goode
scratchy.org.nz
2. Overview
● AustenX is a parser generator built using Java
● But target languages will extend beyond Java
● It is based on Parsing Expression Grammars (PEGs),
and uses Packrat memoisation
● Provides extensions to PEGs, and also handles
left-recursion well (not so easy with PEG
parsers), with an interesting solution to an
interesting theoretical problem that has
practical uses.
● AustenX is built using a code-generator code
generator tool, called SkeletonX, which is
interesting in its own right.
3. Parser Generators
● Given a grammar, generate code to facilitate
reading text files
● A Parsing Expression Grammar is a form of
grammar that, unlike a context-free grammar, is
unambiguous (based on recognition, not generation).
● Eg:
● 'Hello' 'A'+
● 'A'* 'B'? 'A' / 'B'
● E = { E '+' E } / NUMBER
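The first two examples can be tried with a tiny matcher. This is a minimal sketch, not AustenX code: `PegSketch` and its combinators (`lit`, `seq`, `choice`, `star`, `opt`, `plus`) are hypothetical names, and each expression simply maps a start position to an end position, or -1 on failure.

```java
// Minimal PEG combinators (illustrative names, not part of AustenX).
class PegSketch {
    // An expression maps (input, start position) to an end position, or -1.
    interface Peg { int match(String s, int pos); }

    static Peg lit(String t) {
        return (s, p) -> s.startsWith(t, p) ? p + t.length() : -1;
    }

    static Peg seq(Peg... parts) {
        return (s, p) -> {
            for (Peg part : parts) {
                p = part.match(s, p);
                if (p < 0) return -1;        // any failing part fails the sequence
            }
            return p;
        };
    }

    static Peg choice(Peg... alts) {
        return (s, p) -> {
            for (Peg alt : alts) {
                int end = alt.match(s, p);
                if (end >= 0) return end;    // first success wins, so no ambiguity
            }
            return -1;
        };
    }

    static Peg star(Peg inner) {
        return (s, p) -> {
            while (true) {
                int end = inner.match(s, p);
                if (end <= p) return p;      // stop on failure or an empty match
                p = end;                     // greedy: never gives characters back
            }
        };
    }

    static Peg opt(Peg inner) {
        return (s, p) -> {
            int end = inner.match(s, p);
            return end >= 0 ? end : p;
        };
    }

    static Peg plus(Peg inner) { return seq(inner, star(inner)); }
}
```

With these, `'A'* 'B'? 'A' / 'B'` becomes `choice(seq(star(lit("A")), opt(lit("B")), lit("A")), lit("B"))`. It matches "AABA" and "B" but not "AAA", because the greedy `'A'*` consumes every A and leaves none for the final `'A'`.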
5. Recursive Descent
● PEGs are easy to translate to code
● Eg FunctionCall = ID '(' Arguments ')' 'A'*
function readFunctionCall() {
  // on any failure, rewind to where this rule started
  if (!readID()) { reset(); return fail }
  if (!consume('(')) { reset(); return fail }
  if (!readArguments()) { reset(); return fail }
  if (!consume(')')) { reset(); return fail }
  while (consume('A')) {}
  return success
}
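A runnable Java version of the pseudocode above, as a sketch only. It works on characters rather than tokens, and the definitions of `readID` (one or more lower-case letters) and `readArguments` (an optional comma-separated list of IDs) are assumptions made for illustration; the slide does not define them.

```java
// Hand-written recursive descent for: FunctionCall = ID '(' Arguments ')' 'A'*
// Assumption: ID is one or more lower-case letters, and Arguments is
// an optional comma-separated list of IDs. Both are illustrative only.
class FunctionCallParser {
    private final String input;
    private int pos = 0;

    FunctionCallParser(String input) { this.input = input; }

    // Consume one expected character, or leave the position untouched.
    private boolean consume(char c) {
        if (pos < input.length() && input.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private boolean readID() {
        int start = pos;
        while (pos < input.length() && Character.isLowerCase(input.charAt(pos))) pos++;
        return pos > start;
    }

    // Arguments = (ID (',' ID)*)?
    private boolean readArguments() {
        if (!readID()) return true;                      // empty list is allowed
        while (true) {
            int mark = pos;
            if (!consume(',')) return true;
            if (!readID()) { pos = mark; return true; }  // back out of a dangling comma
        }
    }

    boolean readFunctionCall() {
        int start = pos;                                 // remembered so failure can reset
        if (!readID())        { pos = start; return false; }
        if (!consume('('))    { pos = start; return false; }
        if (!readArguments()) { pos = start; return false; }
        if (!consume(')'))    { pos = start; return false; }
        while (consume('A')) { }                         // 'A'* never fails
        return true;
    }

    int position() { return pos; }
}
```

Each sequence element becomes one guarded call, and `reset()` becomes restoring the saved start position.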
6. Redundant Calls
● Like doing a Fibonacci calculation
● Dynamic programming solution
● Create a table of Rules x Positions
● Starting at the end, calculate each Rule at each position
● Entries at earlier positions only need entries already calculated for later positions
● Packrat parsing
● Start at beginning, only do resolution when
requested, store result
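The lazy, store-on-request approach can be sketched as below. The grammar (`Add`, `Mult`, `Num`) and the class name `PackratSketch` are assumptions chosen to make the redundant calls visible; this is not how AustenX itself stores results. The `evaluations` counter records how often a rule body actually runs.

```java
import java.util.HashMap;
import java.util.Map;

// Packrat sketch for the (hypothetical) grammar
//   Add  = Mult '+' Add / Mult
//   Mult = Num '*' Mult / Num
//   Num  = [0-9]
// Each rule returns the end position of its match, or -1 on failure.
class PackratSketch {
    private final String input;
    private final boolean memoise;
    private final Map<String, Integer> memo = new HashMap<>();
    int evaluations = 0;   // how many times a rule body actually ran

    PackratSketch(String input, boolean memoise) {
        this.input = input;
        this.memoise = memoise;
    }

    private interface Rule { int apply(int pos); }

    // Look up (rule, position); run the rule body only on a miss.
    private int resolve(String name, int pos, Rule body) {
        String key = name + "@" + pos;
        if (memoise && memo.containsKey(key)) return memo.get(key);
        evaluations++;
        int end = body.apply(pos);
        if (memoise) memo.put(key, end);
        return end;
    }

    // Match one literal character; a failed position (-1) stays failed.
    private int lit(char c, int pos) {
        return (pos >= 0 && pos < input.length() && input.charAt(pos) == c) ? pos + 1 : -1;
    }

    int add(int pos) {
        return resolve("Add", pos, p -> {
            int m = mult(p);
            int plus = lit('+', m);
            int rest = plus < 0 ? -1 : add(plus);
            return rest >= 0 ? rest : mult(p);   // second alternative re-reads Mult
        });
    }

    int mult(int pos) {
        return resolve("Mult", pos, p -> {
            int n = num(p);
            int star = lit('*', n);
            int rest = star < 0 ? -1 : mult(star);
            return rest >= 0 ? rest : num(p);    // second alternative re-reads Num
        });
    }

    int num(int pos) {
        return resolve("Num", pos, p ->
            (p < input.length() && Character.isDigit(input.charAt(p))) ? p + 1 : -1);
    }
}
```

Parsing the same input with `memoise` set to true and to false gives the same end position, but the memoised run evaluates fewer rule bodies because repeated (rule, position) requests are answered from the table.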
7. Example
[Figure: the input 4 + 3 * 2 A A A set against the rules Add, Mult, As, Exp, Hello and Stuff]
8. Left Recursion
● Recursive Descent and Packrat parsing have
problems with left recursion
● Eg
● E = { E '+' E } / Number
● Creates infinite loops
function readE() {
readE()
...
}
9. Solution
● 'Bubble up' resolutions
● Eg 1 + 2
● First pass, no resolution, so E '+' E fails, but
Num consumes the 1
● Current best => E = Num ( '1' )
● Retry until no more gains
● E = ('1') + ('2')
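The retry loop above can be sketched in Java for E = { E '+' E } / Num, with Num as a single digit. This is an illustration under assumptions, not AustenX's implementation: the class `BubbleUpSketch`, the per-position `best` table and the bracketed string output are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "retry until no more gains" for E = { E '+' E } / Num,
// where Num is a single digit. Not AustenX's actual implementation.
class BubbleUpSketch {
    // A resolution: where the match ended and a bracketed rendering of it.
    static final class Result {
        final int end;
        final String tree;
        Result(int end, String tree) { this.end = end; this.tree = tree; }
    }

    private final String input;
    // position -> current best resolution of E (null means "none yet")
    private final Map<Integer, Result> best = new HashMap<>();

    BubbleUpSketch(String input) { this.input = input; }

    Result readE(int pos) {
        if (best.containsKey(pos)) return best.get(pos);  // recursive call sees current best
        best.put(pos, null);                              // first pass: no resolution
        while (true) {
            Result attempt = body(pos);
            Result current = best.get(pos);
            boolean gain = attempt != null && (current == null || attempt.end > current.end);
            if (!gain) return current;                    // no more gains: stop retrying
            best.put(pos, attempt);                       // bubble up and try again
        }
    }

    // One pass over the alternatives, in order.
    private Result body(int pos) {
        Result left = readE(pos);                         // left-recursive call, no infinite loop
        if (left != null && left.end < input.length() && input.charAt(left.end) == '+') {
            Result right = readE(left.end + 1);
            if (right != null) {
                return new Result(right.end, "(" + left.tree + "+" + right.tree + ")");
            }
        }
        if (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            return new Result(pos + 1, String.valueOf(input.charAt(pos)));
        }
        return null;
    }

    // Returns the bracketed tree if the whole text is an E, otherwise null.
    static String parse(String text) {
        Result r = new BubbleUpSketch(text).readE(0);
        return (r != null && r.end == text.length()) ? r.tree : null;
    }
}
```

`BubbleUpSketch.parse("1+2")` follows the steps on the slide: the first pass resolves only the 1, the second pass uses that as the left operand and resolves `(1+2)`, and the third pass makes no gain.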
10. Problem
● Always right associative
● Eg E = { E '+' E } / { E '*' E } / Num
● Always resolves 1 * 2 + 3 to ((1) * ((2) + (3)))
● Eg E = { E '*' E } / { E '+' E } / Num
● Also resolves 1 * 2 + 3 to ((1) * ((2) + (3)))!
● Problem only occurs when a rule with left
recursion is also right recursive
11. Solution
● A process of reinterpreting/rewriting
● Recall 1 * 2 + 3, with E = { E '+' E } / { E '*' E } /
Num
● Resolution at '2' is ((2) + (3))
● Having resolved (1) and read '*', note that '+' is
higher priority than '*', so search down the recursion
to find something lower, eg (2)
● Now resolved to ((1) * (2))
● Bubble up to get (((1) * (2)) + (3))
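One way to picture the rewrite step is as a function that combines a left operand, an operator and an already-resolved right-hand tree. The sketch below is a guess at the idea, not the AustenX algorithm: `RewriteSketch`, `Node`, `priority` and `combine` are hypothetical, an earlier alternative is assumed to have higher priority ('+' is 0, '*' is 1), and operators of equal priority are left untouched.

```java
// Sketch of the rewrite step for E = { E '+' E } / { E '*' E } / Num.
// Assumption: an alternative listed earlier has higher priority,
// so '+' (0) outranks '*' (1).
class RewriteSketch {
    static final class Node {
        final char op;            // '+', '*', or 'n' for a number leaf
        final Node left, right;   // null for leaves
        final String text;        // digit text for leaves, null otherwise

        Node(String text) {
            this.op = 'n'; this.left = null; this.right = null; this.text = text;
        }

        Node(char op, Node left, Node right) {
            this.op = op; this.left = left; this.right = right; this.text = null;
        }

        boolean isLeaf() { return op == 'n'; }

        @Override public String toString() {
            return isLeaf() ? "(" + text + ")" : "(" + left + " " + op + " " + right + ")";
        }
    }

    static int priority(char op) { return op == '+' ? 0 : 1; }

    // Combine "left op right", reinterpreting the right-hand resolution
    // when it is headed by a higher-priority operator.
    static Node combine(Node left, char op, Node right) {
        if (!right.isLeaf() && priority(right.op) < priority(op)) {
            // Search down the recursion for a lower-priority operand,
            // attach there, and let the higher-priority node bubble up.
            return new Node(right.op, combine(left, op, right.left), right.right);
        }
        return new Node(op, left, right);
    }
}
```

Combining (1), '*' and the resolution ((2) + (3)) yields (((1) * (2)) + (3)), while combining (1), '+' and ((2) * (3)) is left as ((1) + ((2) * (3))).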
12. Back to AustenX
● Currently uses separate tokenisation
● A DFA (Lex-like) tokeniser included
● Allows the left-recursion discussed, with
indirect and direct recursion
● Allows selective memoisation
● Provides statistics on use
● Turns out memoisation (at least with
tokenisation) is mostly not needed
● Has extensions to PEG
13. Example grammar
pattern Example2 {
Add ( Example2:left PLUS Example2:right )
Number ( NUMBER:value )
ID ( ID:value )
}
pattern Example1(
ID:name STRING Example2:first
ID Example2
)
16. Future directions
● Better error handling
● Improved tokenisation
● With scannerless and binary modes
● New language targets
● Support for indentation-sensitive languages
● Formal (or at least informal) write-up of
ordered left recursion
17. SkeletonX
● AustenX is a code generator
● SkeletonX is a tool for making code
generators. It is a code generator code
generator.
● Not just a template engine!
● It understands (a subset of) Java
● Currently going through many iterations to get
a version that uses itself (not quite there yet)
● Many headaches over how scope should work
18. SkeletonX example 1
● Define a design (a hierarchical data structure)
● define Example {
A (String name) {
B (int value, C cRef)
}
C (String cName)
}
19. SkeletonX example 2
public class Main {
@link A {
public AClass doStuff[doStuff,$name]() {
return new AClass(@link B $value);
}
}
}
@link A {
public class AClass[$name, Class] {
@constructor ( @link B int value) { }
}
}
20. SkeletonX example 3
● In user code:
DesignRoot r;
CBlock c = r.addCBlock("Simple");
r.addABlock("First").addBBlock(42, c);
ABlock a2 = r.addABlock("Second");
a2.addBBlock(64, c);
a2.addBBlock(35, c);
21. SkeletonX example 4
public class Main {
public FirstClass doStuffFirst() {
return new FirstClass(42);
}
public SecondClass doStuffSecond() {
return new SecondClass(64,35);
}
}
public class FirstClass {
public FirstClass(int value) { }
}
public class SecondClass {
public SecondClass(int value, int value2) { }
}
22. Other projects
● Munky
● A unified language for making web apps that
compiles to HTML/JavaScript/PHP/SQL
● Also, a related Java-based framework
● Very cool.
● At some point, a game-making tool designed
for young children in a classroom setting
(eg, centralised storage/sharing)
23. HELP! (Conclusion)
● Lots of things to do
● Lots of projects
● Write paper on AustenX
● Not much money
● Looking for part time work, or funding
● Also, love to have access to journals again...
scratchy.org.nz