Extensible domain-specific programming
for the sciences
Eric Van Wyk
University of Minnesota

VBI, December 5, 2013

slides available at http://www.cs.umn.edu/~evw

1 / 45
Current trends / topics in PL
Formal verification
CompCert - http://compcert.inria.fr/
Astrée - http://www.astree.ens.fr/
Hoare logic (1960’s)
{P} code {Q}
Proof assistants: Coq, Abella, Isabelle, ...
use required in some PL publishing venues

2 / 45
3 / 45
4 / 45
Current trends / topics in PL
Parallel programming - multiple cores, everywhere.

“no more free lunch”
need
new abstractions: e.g. Cilk, MapReduce, FP
new semantics: e.g. deterministic parallel Java
5 / 45
Current trends / topics in PL
Expressive and safe static typing
extending richer static types, e.g.
append :: ( [a], [a] ) -> [a]

to dependent types
append :: ( [a|n], [a|m] ) -> [a|n+m]

turns array out-of-bounds and null-pointer bugs into
static type errors

6 / 45
Extensible languages
Allow programmers to select the features to be used in their
programming languages.
new syntax / notations
new semantic analyses / error-checking
Why would anyone want to do that?

7 / 45
Programming language features
General purpose features
assignment statements, loops, if-then-else statements
functions (perhaps higher-order) and procedures
I/O facilities
modules
data: integer, strings, arrays, records
Domain-specific features
matrix operations (MATLAB)
regular expression matching (Perl, Python)
statistics functions (R)
computational geometry operations (LN)
parallel computing (SISAL, X10, NESL, etc.)
Many similarities, needless differences.
Working with multiple (domain-specific) languages is a
headache.
8 / 45
Extensible languages
Allow programmers to select the features to be used in their
programming languages.
new syntax / notations
new semantic analyses / error-checking
Pick a general purpose host language (e.g. ANSI C),
extend with domain-specific features.
myProgram.xc ⇒ myProgram.c

9 / 45
Regular expressions
#include "stdio.h"
#include "regex.h"
int main (int argc, char *argv[]) {
  char *text = readFileContents("X.data");
  // eukaryotic messenger RNA sequences
  regex foo = /^ATG[ATGC]{3,10}A{5,10}$/ ;
  if (text =~ foo)
    printf("Matches...\n");
  else
    printf("Doesn't match...\n");
}
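For readers who want to try the pattern outside the extended C translator, here is a minimal Python sketch of the same match using the standard `re` module. The names `MRNA_PATTERN` and `looks_like_mrna` are hypothetical and not part of the talk; only the regular expression itself is taken from the slide.

```python
import re

# Pattern copied from the slide: the start codon ATG, then 3 to 10 bases,
# then a run of 5 to 10 A's, anchored at both ends.
MRNA_PATTERN = re.compile(r"^ATG[ATGC]{3,10}A{5,10}$")


def looks_like_mrna(text: str) -> bool:
    """Return True when the string matches the slide's pattern (hypothetical helper)."""
    return MRNA_PATTERN.match(text) is not None


if __name__ == "__main__":
    for sample in ("ATGCCCAAAAA", "ATGCCC"):
        verdict = "Matches..." if looks_like_mrna(sample) else "Doesn't match..."
        print(sample, verdict)
```

The extension on the slide gives C the same convenience: a regex literal and a match operator, checked at compile time and translated down to plain C.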
10 / 45
Mining Climate Data - Ocean Eddies

Spinning pools of water
Transport heat, salt, and
nutrients
Learning about their
behavior is difficult

11 / 45
A time slice for a point in the ocean

12 / 45
main (int argc, char **argv) {
  Matrix float <3> data
    = readMatrix("ssh.data");
  Matrix float <3> scores
    = matrixMap(scoreTS, data, [2]);
  writeMatrix("temporalScores.data",
              scores);
}

13 / 45
Matrix float <1> scoreTS (Matrix float <1> ts)
{
  int i = 0, beginning, n = dimSize(ts, 0);
  Matrix float <1> scores
    = init(Matrix float <1>, dimSize(ts, 0));
  while (ts[i] < ts[i+1]) { i = i+1 ; }
  Matrix float [0] trough ;
  while (i < n-1) {
    (trough, beginning, i)
      = getTrough(ts, i);
    scores[beginning :: i]
      = computeArea(trough);
  }
  return scores ;
}

14 / 45
Matrix float <1> computeArea
  (Matrix float <1> areaOfInterest)
{
  float y1 = areaOfInterest[0];
  float y2 = areaOfInterest[end];
  int x1 = 0;
  int x2 = dimSize(areaOfInterest, 0) - 1;
  float m = (y1 - y2) / ((float) (x1 - x2));
  float b = y1 - m * x1 ;
  Matrix float <1> line = (x1 :: x2) * m + b ;
  float area
    = with (x1 <= i < x2)
        fold(+, 0.0, line - areaOfInterest);
  return
    with (0 <= i < dimSize(line, 0))
      genarray([dimSize(line, 0)], area);
}
15 / 45
(Matrix float <1>, int, int) getTrough
  (Matrix float <1> ts, int i)
{
  int beginning = i ;
  int n = dimSize(ts, 0);
  while (i+1 < n && ts[i] >= ts[i+1])
    i = i+1;
  while (i+1 < n && ts[i] < ts[i+1])
    i = i+1;
  return (ts[beginning :: i], beginning, i);
}
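The three functions above can be followed more easily with a plain-list version. The Python sketch below mirrors scoreTS, computeArea and getTrough under a few assumptions that the slides do not state: `beginning :: i` is read as an inclusive range, the fold sums `line[i] - areaOfInterest[i]` over the `with` index range, and a bounds check is added to the first loop. The snake_case names are hypothetical.

```python
def get_trough(ts, i):
    """Walk down to a local minimum, then up to the next local maximum."""
    beginning = i
    n = len(ts)
    while i + 1 < n and ts[i] >= ts[i + 1]:
        i += 1
    while i + 1 < n and ts[i] < ts[i + 1]:
        i += 1
    # Assumption: the slide's `beginning :: i` range includes both ends.
    return ts[beginning:i + 1], beginning, i


def compute_area(region):
    """Area between the trough and the straight line joining its end points."""
    y1, y2 = region[0], region[-1]
    x1, x2 = 0, len(region) - 1
    m = (y1 - y2) / float(x1 - x2)
    b = y1 - m * x1
    line = [m * x + b for x in range(x1, x2 + 1)]
    # Assumption: the fold sums the pointwise difference for x1 <= i < x2.
    area = sum(line[i] - region[i] for i in range(x1, x2))
    return [area] * len(line)


def score_ts(ts):
    """Give every point of a time series the area of the trough it lies in."""
    n = len(ts)
    scores = [0.0] * n
    i = 0
    # Skip an initial rising edge; the bounds check is added for safety.
    while i + 1 < n and ts[i] < ts[i + 1]:
        i += 1
    while i < n - 1:
        trough, beginning, i = get_trough(ts, i)
        scores[beginning:i + 1] = compute_area(trough)
    return scores
```

For example, `score_ts([3, 2, 1, 2, 3])` treats the whole series as one trough of area 4.0 and assigns that score to all five points.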

16 / 45
Matrix extensions
several features from MATLAB
with, fold, and genarray from Single Assignment C
all translated down to expected C code
straightforward parallel implementations of matrixMap,
with, fold, and genarray.

17 / 45
Dimension analysis

pound-seconds = newton-seconds
18 / 45
#include "stdio.h"
int main (int argc, char *argv[]) {
  int meter x = 3.4 ;
  int meter y = 5.6 ;
  int meter^2 area = x * y ;

  printf("%d\n", x + y);   // OK
  printf("%d\n", x + z);   // Error
}

19 / 45
#include "stdio.h"
int main (int argc, char *argv[]) {
  int meter x = 3.4 ;
  int meter y = 5.6 ;
  int meter^2 area = x * y ;

  printf("%d\n", x + y);      // OK
  // printf("%d\n", x + z);   // Error
}

20 / 45
#include "stdio.h"
int main (int argc, char *argv[]) {
  int x = 3.4 ;
  int y = 5.6 ;
  int area = x * y ;

  printf("%d\n", x + y);   // OK
}

Extensions of this form find errors, but otherwise are “erased”
during translation.
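A small Python sketch may make the "check, then erase" idea concrete. It is not the extension's implementation: the tuple-based expression encoding, the function name `dimension_of`, and the use of `area` as the mismatched operand are all assumptions of this sketch. Dimensions are reduced to a single integer, the exponent of meter.

```python
def dimension_of(expr, env):
    """Return the meter exponent of expr, or raise TypeError on a mismatch.

    expr is ("var", name), ("+", left, right) or ("*", left, right);
    env maps variable names to meter exponents (1 = meter, 2 = meter^2).
    """
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    left = dimension_of(expr[1], env)
    right = dimension_of(expr[2], env)
    if kind == "*":
        # Multiplication adds exponents: meter * meter = meter^2.
        return left + right
    if kind == "+":
        # Addition requires both operands to have the same dimension.
        if left != right:
            raise TypeError(f"cannot add meter^{left} and meter^{right}")
        return left
    raise ValueError(f"unknown operator {kind!r}")


# Declarations as on the slides: x and y are meters, area is meter^2.
ENV = {"x": 1, "y": 1, "area": 2}
X, Y, AREA = ("var", "x"), ("var", "y"), ("var", "area")
```

Here `dimension_of(("+", X, Y), ENV)` succeeds, while `dimension_of(("+", X, AREA), ENV)` raises a `TypeError`. Once the check has passed, nothing about units needs to survive into the generated code, which is why the translated program uses plain `int`.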

21 / 45
Extension composition
Programmers can select the extensions that they want.
May want to use multiple extensions in the same program.
Distinguish between
1. extension user
has no knowledge of language design or implementations

2. extension developer
must know about language design and implementation

Tools build a custom .xc ⇒ .c translator for them
How can that be done?

22 / 45
Building translators from composable extensible
languages
Two primary challenges:
1. composable syntax — enables building a scanner, parser
context-aware scanning [GPCE’07]
modular determinism analysis [PLDI’09]
Copper

2. composable semantics — analysis and translations
attribute grammars with forwarding, collections and
higher-order attributes
set union of specification components
sets of productions, non-terminals, attributes
sets of attribute defining equations, on a production
sets of equations contributing values to a single attribute

modular well-definedness analysis [SLE’12a]
modular termination analysis [SLE’12b, Krishnan-PhD]
Silver
23 / 45
Generating parsers and scanners from grammars
and regular expressions
nonterminals: Stmt, Expr
terminals: Id   /[a-zA-Z][a-zA-Z0-9]*/
           Num  /[0-9]+/
           Eq   '='
           Semi ';'
           Plus '+'
           Mult '*'
Stmt ::= Stmt Semi Stmt
Stmt ::= Id Eq Expr
Expr ::= Expr Plus Expr
Expr ::= Expr Mult Expr
Expr ::= Id
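As a rough illustration of how terminal regular expressions drive a scanner, the Python sketch below tokenizes a small input using the terminals declared on this slide. The function name `tokenize` and the whitespace skipping are assumptions of the sketch; it does not show how Copper generates scanners.

```python
import re

# Terminal names and regular expressions taken from the slide.
TERMINALS = [
    ("Id", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num", r"[0-9]+"),
    ("Eq", r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]

# One alternation with a named group per terminal. These terminals never
# compete for the same first character, so first-match order is harmless here.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TERMINALS))


def tokenize(text):
    """Return a list of (terminal, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("x = y + 3 * z ; a = b")` yields eleven tokens, starting with `("Id", "x")` and `("Eq", "=")`; the parser then builds a tree from that stream using the productions.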
24 / 45
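[Figure: parse tree for the input below. The root is Stmt ::= Stmt Semi Stmt. The left Stmt is Id(x) Eq Expr, where the Expr is Id(y) Plus (Num(3) Mult Id(z)); the right Stmt is Id(a) Eq Expr, where the Expr is Id(b).]
Token stream: Id(x), Eq, Id(y), Plus, Num(3), Mult, Id(z), Semi, Id(a), Eq, Id(b)
Input: “x = y + 3 * z ; a = b”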
25 / 45
Attribute Grammars
add semantics — meaning — to context free grammars
nodes (non-terminals) have attributes
that is, semantic values

Expr may be attributed with
type - the type of the expression
errors - list of error messages
env - mapping variable names to their types

Stmt may be attributed with errors and env

26 / 45
[Figure: a parse tree decorated with attribute values. The environment env = [x→int, y→int, z→string] is passed down to each Stmt and Expr node. Each Expr carries a type and an errors attribute, for example type = int; errors = [ ]. One Stmt assigns from an Expr of type string over Id(z) and has errors = [ERROR]; the node above it also has errors = [ERROR].]
27 / 45
Attribute grammar specifications
Equations associated with productions define attribute values.
abstract production addition
e::Expr ::= l::Expr '+' r::Expr
{
  e.errors := l.errors ++ r.errors ++
              ... check that l and r are integers ...
  e.type = int ;
  l.env = e.env ;
  r.env = e.env ;
}
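The equations of the addition production can be read as a recursive function: `env` flows down, `type` and `errors` flow up. The Python sketch below is only an analogy under that reading; the class names, the `decorate` function, and the error message wording are hypothetical, and Silver evaluates attributes on demand rather than in a single traversal like this one.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


def decorate(expr, env):
    """Return (type, errors) for expr given the inherited environment env."""
    if isinstance(expr, Num):
        return "int", []
    if isinstance(expr, Var):
        if expr.name not in env:
            return "error", [f"undeclared variable {expr.name}"]
        return env[expr.name], []
    if isinstance(expr, Add):
        # l.env = e.env ; r.env = e.env
        l_type, l_errors = decorate(expr.left, env)
        r_type, r_errors = decorate(expr.right, env)
        # e.errors := l.errors ++ r.errors ++ (check that l and r are integers)
        local = []
        for side, operand_type in (("left", l_type), ("right", r_type)):
            if operand_type not in ("int", "error"):
                local.append(f"{side} operand of + has type {operand_type}, expected int")
        # e.type = int
        return "int", l_errors + r_errors + local
    raise TypeError(f"unknown node {expr!r}")


ENV = {"x": "int", "y": "int", "z": "string"}
```

With this environment, `decorate(Add(Var("y"), Num(3)), ENV)` reports no errors, while `decorate(Add(Var("x"), Var("z")), ENV)` reports one error for the right operand.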

28 / 45
Modern attribute grammars
higher-order attributes
reference attributes
collection attributes
forwarding
module systems
separate compilation
etc.

29 / 45
for-loop as an extension
abstract production for
s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
{
  s.errors := lower.errors ++ upper.errors ++ body.errors ++
              ... check that i is an integer ...
  forwards to
    // i=lower ; while (i <= upper) { body ; i=i+1; }
    seq( assignment( varRef(i), lower ),
         while(
           lte( varRef(i), upper ),
           block( seq( body,
                       assignment( varRef(i), add( varRef(i),
                                                   intLit("1") ) ) ) ) ) ) ;
}
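Forwarding can be pictured as delegation: the extension construct defines what it wants to define itself (here, errors) and hands every other question to an equivalent host-language tree. The Python sketch below shows only the translation side, assuming a hypothetical `to_c` method on each node; the class names follow the constructors on the slide but are otherwise invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class VarRef:
    name: str

    def to_c(self) -> str:
        return self.name


@dataclass
class IntLit:
    value: int

    def to_c(self) -> str:
        return str(self.value)


@dataclass
class Add:
    left: object
    right: object

    def to_c(self) -> str:
        return f"{self.left.to_c()} + {self.right.to_c()}"


@dataclass
class Lte:
    left: object
    right: object

    def to_c(self) -> str:
        return f"{self.left.to_c()} <= {self.right.to_c()}"


@dataclass
class Assignment:
    target: VarRef
    value: object

    def to_c(self) -> str:
        return f"{self.target.to_c()} = {self.value.to_c()};"


@dataclass
class Seq:
    first: object
    second: object

    def to_c(self) -> str:
        return f"{self.first.to_c()} {self.second.to_c()}"


@dataclass
class Block:
    body: object

    def to_c(self) -> str:
        return "{ " + self.body.to_c() + " }"


@dataclass
class While:
    cond: object
    body: object

    def to_c(self) -> str:
        return f"while ({self.cond.to_c()}) {self.body.to_c()}"


@dataclass
class For:
    """Extension construct: it has no translation of its own."""
    var: str
    lower: object
    upper: object
    body: object

    def forward(self):
        # Same tree as on the slide: i = lower; while (i <= upper) { body; i = i + 1; }
        i = VarRef(self.var)
        step = Assignment(i, Add(i, IntLit(1)))
        return Seq(Assignment(i, self.lower),
                   While(Lte(i, self.upper), Block(Seq(self.body, step))))

    def to_c(self) -> str:
        # Anything the extension does not define is answered by the forward tree.
        return self.forward().to_c()
```

For a loop that sums into `s`, `For("i", IntLit(1), VarRef("n"), Assignment(VarRef("s"), Add(VarRef("s"), VarRef("i")))).to_c()` returns `i = 1; while (i <= n) { s = s + i; i = i + 1; }`. The host language never needs to know that a for-loop existed.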
30 / 45
Building an attribute grammar evaluator from composed
specifications.

\[ AG^{H} \cup^{*} \{ AG^{E_1}, \ldots, AG^{E_n} \} \]
\[ \forall i \in [1, n].\ \mathit{modComplete}(AG^{H}, AG^{E_i}) \;\Longrightarrow\; \mathit{complete}(AG^{H} \cup \{ AG^{E_1}, \ldots, AG^{E_n} \}) \]
Monolithic analysis - not too hard, but not too useful.
Modular analysis - harder, but required [SLE’12a].

31 / 45
Challenges in scanning

Keywords in embedded languages may be identifiers in host
language:
int SELECT ;
...
rs = using c query { SELECT last_name
FROM person WHERE ...

32 / 45
Challenges in scanning

Different extensions use same keyword
connection c "jdbc:derby:./derby/db/testdb"
with table person [ person_id INTEGER,
first_name VARCHAR ];
...
b = table ( c1 : T F ,
c2 : F * ) ;

33 / 45
Challenges in scanning

Operators with different precedence specifications:
x = 3 + y * z ;
...
str = /[a-z][a-z0-9]*\.java/

34 / 45
Challenges in scanning

Terminals that are prefixes of others
List<List<Integer>> dlist ;
...
x = y >> 4 ;

35 / 45
Need for context

Traditionally, parser and scanner are disjoint.
Scanner → Parser → Semantic Analysis
In context aware scanning, they communicate
Scanner ⇄ Parser → Semantic Analysis

36 / 45
Context aware scanning
Scanner recognizes only tokens valid for current “context”
keeps embedded sub-languages, in a sense, separate
Consider:
chan in, out;
for i in a { a[i] = i*i ; }

Two terminal symbols that match “in”.
terminal IN 'in' ;
terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/
  submits to {keyword};
terminal FOR 'for' lexer class {keyword};

example is part of AbleP [SPIN’11]
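The Python sketch below illustrates the idea on the `in` example: the parser passes the scanner the set of terminals that are valid in its current state, and the scanner only tries those. Everything here is an assumption made for illustration: the `scan` function, the `KEYWORDS` set, and in particular the reading of `submits to` as "ID never matches a lexeme that is a keyword" are this sketch's choices, not a description of Copper's algorithm.

```python
import re

# Terminals from the slide, written as Python regular expressions.
TERMINALS = {
    "IN": re.compile(r"in"),
    "FOR": re.compile(r"for"),
    "ID": re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}

# Members of the `keyword` lexer class on the slide. IN is not among them.
KEYWORDS = {"FOR"}


def scan(text, pos, valid):
    """Return (terminal, lexeme) at pos, trying only the terminals in `valid`."""
    matches = []
    for name in valid:
        match = TERMINALS[name].match(text, pos)
        if match is None:
            continue
        lexeme = match.group()
        # Assumed reading of "ID submits to {keyword}": reject keyword lexemes.
        if name == "ID" and any(TERMINALS[k].fullmatch(lexeme) for k in KEYWORDS):
            continue
        matches.append((name, lexeme))
    if not matches:
        raise SyntaxError(f"no valid terminal matches at position {pos}")
    longest = max(len(lexeme) for _, lexeme in matches)
    best = [m for m in matches if len(m[1]) == longest]
    if len(best) > 1:
        raise SyntaxError(f"lexical ambiguity at position {pos}: {sorted(best)}")
    return best[0]
```

After `chan`, the parser expects an identifier, so `scan("in, out;", 0, {"ID"})` returns `("ID", "in")`. After `for i`, only IN is valid, so `scan("in a {", 0, {"IN"})` returns `("IN", "in")`. The two terminals never compete because they are never valid in the same context.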

37 / 45
Parsing C as an extension to Promela
c_decl {
  typedef struct Coord {
    int x, y; } Coord;          }
c_state "Coord pt" "Global"     /* goes in state vector */
int z = 3;                      /* standard global decl */

active proctype example()
{ c_code { now.pt.x = now.pt.y = 0; };
  do :: c_expr { now.pt.x == now.pt.y }
          -> c_code { now.pt.y++; }
     :: else -> break
  od;
  c_code { printf("values %d: %d, %d,%d\n",
    Pexample->_pid, now.z, now.pt.x, now.pt.y);
38 / 45
Context aware scanning
This scanning algorithm subordinates the
disambiguation principle of maximal munch
to the principle of
disambiguation by context.
It will return a shorter valid match before a longer invalid
match.
In List<List<Integer>>, before the first closing “>”,
“>” is in the valid lookahead but “>>” is not.
A context aware scanner is essentially an implicitly-moded
scanner.
There is no explicit specification of valid look ahead.
It is generated from standard grammars and terminal
regexes.
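A minimal Python sketch of "maximal munch, but only among valid terminals" follows. The terminal names `GT` and `RSHIFT` and the function `scan` are hypothetical; in a generated scanner the valid lookahead set would come from the LR parse state rather than being passed by hand.

```python
import re

# Two terminals where one is a prefix of the other.
TERMINALS = {
    "GT": re.compile(r">"),
    "RSHIFT": re.compile(r">>"),
}


def scan(text, pos, valid_lookahead):
    """Return the longest match among the terminals valid in this context."""
    matches = []
    for name in valid_lookahead:
        match = TERMINALS[name].match(text, pos)
        if match is not None:
            matches.append((name, match.group()))
    if not matches:
        raise SyntaxError(f"no valid terminal matches at position {pos}")
    # Maximal munch applies only after the context has filtered the candidates.
    return max(matches, key=lambda item: len(item[1]))
```

While parsing the type arguments of `List<List<Integer>>`, only `GT` is valid, so `scan(">> dlist ;", 0, {"GT"})` returns `("GT", ">")` even though a longer `>>` is present. In an expression context where both are valid, `scan(">> 4 ;", 0, {"GT", "RSHIFT"})` returns `("RSHIFT", ">>")`.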
39 / 45
With a smarter scanner, LALR(1) is not so brittle.
We can build syntactically composable language
extensions.
Context aware scanning makes composable syntax “more
likely”
But it does not give a guarantee of composability.

40 / 45
Building a parser from composed specifications.

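\[ CFG^{H} \cup^{*} \{ CFG^{E_1}, \ldots, CFG^{E_n} \} \]
\[ \forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E_i}) \land \mathit{conflictFree}(CFG^{H} \cup CFG^{E_i}) \;\Longrightarrow\; \mathit{conflictFree}(CFG^{H} \cup \{ CFG^{E_1}, \ldots, CFG^{E_n} \}) \]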
Monolithic analysis - not too hard, but not too useful.
Modular analysis - harder, but required [PLDI’09].
Non-commutative composition of restricted LALR(1)
grammars.
41 / 45
42 / 45
Expressiveness versus safe composition

Compare to
other parser generators
libraries
The modular compositionality analysis does not require
context aware scanning.
But, context aware scanning makes it practical.

43 / 45
Future Work
ableC - extensible C11 specification
builds on lessons learned from extensible specifications of
Java [ECOOP’07], Lustre [FASE’07], Modelica,
Promela [SPIN’11].
incorporate existing language extensions

composition of language extensions are compile-time
language specific analysis
new applications of AGs

44 / 45
Thanks for your attention.

Questions?
http://melt.cs.umn.edu
evw@cs.umn.edu

45 / 45
Eric Van Wyk and August Schwerdfeger.
Context-aware scanning for parsing extensible languages.
In Intl. Conf. on Generative Programming and Component
Engineering, (GPCE), pages 63–72. ACM, 2007.
Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh
Krishnan.
Silver: an extensible attribute grammar system.
Science of Computer Programming, 75(1–2):39–54,
January 2010.
August Schwerdfeger and Eric Van Wyk.
Verifiable composition of deterministic grammars.
In Proc. of Conf. on Programming Language Design and
Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk.
Modular well-definedness analysis for attribute grammars.
In Proc. of Intl. Conf. on Software Language Engineering
(SLE), volume 7745 of LNCS, pages 352–371.
Springer-Verlag, September 2012.
Lijesh Krishnan and Eric Van Wyk.
Termination analysis for higher-order attribute grammars.
In Proceedings of the 5th International Conference on
Software Language Engineering (SLE 2012), volume 7745
of LNCS, pages 44–63. Springer-Verlag, September 2012.
Lijesh Krishnan.
Composable Semantics Using Higher-Order Attribute
Grammars.
PhD thesis, University of Minnesota, Department of
Computer Science and Engineering, 2012.
http://purl.umn.edu/144010
Yogesh Mali and Eric Van Wyk.
Building extensible specifications and implementations of
Promela with AbleP.
In Proc. of Intl. SPIN Workshop on Model Checking of
Software, volume 6823 of LNCS, pages 108–125.
Springer-Verlag, July 2011.
Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and
Derek Bodin.
Attribute grammar-based language extensions for Java.
In Proc. of European Conf. on Object Oriented Prog.
(ECOOP), volume 4609 of LNCS, pages 575–599.
Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk.
Flexible and extensible notations for modeling languages.
In Fundamental Approaches to Software Engineering,
FASE 2007, volume 4422 of LNCS, pages 102–116.
Springer-Verlag, March 2007.


talk at Virginia Bioinformatics Institute, December 5, 2013
