talk at Virginia Bioinformatics Institute, December 5, 2013

Extensible domain-specific programming for the sciences

The notion of scientists as programmers raises the question of what sort of programming language would be a good fit. The common answer seems to be both none of them and all of them. Many scientific applications are a combination of general-purpose and domain-specific languages: R for statistical elements, MATLAB for matrix-based computations, Perl-based regular expressions for string matching, C or FORTRAN for high-performance parallel computations, and scripting languages such as Python to glue them all together. This clumsy situation demonstrates the need for different domain-specific language features.

Our hypothesis is that programming could be made easier and less error-prone, and could result in higher-quality code, if languages could be easily extended by the programmer with the domain-specific features that a programmer or scientist needs for the particular task at hand. This talk demonstrates the meta-language processing tools that support this composition of programmer-selected language features, with several extensions chosen from the previously mentioned list of features.

  1. Extensible domain-specific programming for the sciences
     Eric Van Wyk, University of Minnesota
     VBI, December 5, 2013
     Slides available at http://www.cs.umn.edu/~evw
  2. Current trends / topics in PL: formal verification
     • CompCert - http://compcert.inria.fr/
     • Astrée - http://www.astree.ens.fr/
     • Hoare logic (1960s): {P} code {Q}
     • Proof assistants: Coq, Abella, Isabelle, ...; their use is required in some PL publishing venues
  5. Current trends / topics in PL: parallel programming
     • Multiple cores, everywhere: “no more free lunch”
     • Need new abstractions, e.g. Cilk, MapReduce, FP
     • New semantics, e.g. deterministic parallel Java
  6. Current trends / topics in PL: expressive and safe static typing
     • Extending richer static types, e.g.
           append :: ( [a], [a] ) -> [a]
       to dependent types:
           append :: ( [a|n], [a|m] ) -> [a|n+m]
     • Turns array out-of-bounds and null-pointer bugs into static type errors
  7. Extensible languages
     Allow programmers to select the features to be used in their programming languages:
     • new syntax / notations
     • new semantic analyses / error-checking
     Why would anyone want to do that?
  8. Programming language features
     General-purpose features:
     • assignment statements, loops, if-then-else statements
     • functions (perhaps higher-order) and procedures
     • I/O facilities
     • modules
     • data: integers, strings, arrays, records
     Domain-specific features:
     • matrix operations (MATLAB)
     • regular expression matching (Perl, Python)
     • statistics functions (R)
     • computational geometry operations (LN)
     • parallel computing (SISAL, X10, NESL, etc.)
     Many similarities, needless differences. Working with multiple (domain-specific) languages is a headache.
  9. Extensible languages
     Allow programmers to select the features to be used in their programming languages:
     • new syntax / notations
     • new semantic analyses / error-checking
     Pick a general-purpose host language (e.g. ANSI C) and extend it with domain-specific features.
         myProgram.xc ⟹ myProgram.c
  10. Regular expressions
          #include "stdio.h"
          #include "regex.h"

          int main (int argc, char *argv[]) {
            char *text = readFileContents("X.data");

            // eukaryotic messenger RNA sequences
            regex foo = /^ATG[ATGC]{3,10}A{5,10}$/ ;

            if (text =~ foo)
              printf("Matches...\n");
            else
              printf("Doesn't match...\n");
          }
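
For comparison, the same match can be sketched in plain Python with the standard `re` module. The pattern is copied from the slide; the helper name `matches_mrna` is an assumption made for this illustration and is not part of the talk.

```python
import re

# Pattern from the slide: start codon ATG, then 3 to 10 bases,
# then a tail of 5 to 10 A's, anchored at both ends.
MRNA_PATTERN = re.compile(r"^ATG[ATGC]{3,10}A{5,10}$")


def matches_mrna(text: str) -> bool:
    """Hypothetical helper: True if text has the shape described by the slide's regex."""
    return MRNA_PATTERN.match(text) is not None


if __name__ == "__main__":
    for candidate in ("ATGCCCAAAAA", "ATGCCAAAAA"):
        print(candidate, matches_mrna(candidate))
```

The extended-C version on the slide expresses the same check with a `regex` type and a `=~` operator inside the host language, so no separate glue language is needed.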
  11. Mining Climate Data - Ocean Eddies
      • Spinning pools of water
      • Transport heat, salt, and nutrients
      • Learning about their behavior is difficult
  12. Figure: a time slice for a point in the ocean
  13. Top-level program
          main (int argc, char **argv) {
            Matrix float <3> data = readMatrix("ssh.data");
            Matrix float <3> scores = matrixMap(scoreTS, data, [2]);
            writeMatrix("temporalScores.data", scores);
          }
  14. Scoring a time series
          Matrix float <1> scoreTS (Matrix float <1> ts) {
            int i = 0, beginning, n = dimSize(ts, 0);
            Matrix float <1> scores = init(Matrix float <1>, dimSize(ts, 0));
            while (ts[i] < ts[i+1]) { i = i+1 ; }
            Matrix float [0] trough ;
            while (i < n-1) {
              (trough, beginning, i) = getTrough(ts, i);
              scores[beginning :: i] = computeArea(trough);
            }
            return scores ;
          }
  15. Computing the area of a trough
          Matrix float <1> computeArea (Matrix float <1> areaOfInterest) {
            float y1 = areaOfInterest[0];
            float y2 = areaOfInterest[end];
            int x1 = 0;
            int x2 = dimSize(areaOfInterest, 0) - 1;
            float m = (y1 - y2) / ((float) (x1 - x2));
            float b = y1 - m * x1 ;
            Matrix float <1> Line = (x1 :: x2) * m + b ;
            float area = with (x1 <= i < x2)
                           fold (+, 0.0, Line - areaOfInterest);
            return with (0 <= i < dimSize(Line, 0))
                     genarray ([dimSize(Line, 0)], area);
          }
  16. Finding the next trough
          (Matrix float <1>, int, int) getTrough (Matrix float <1> ts, int i) {
            int beginning = i ;
            int n = dimSize(ts, 0);
            while (i+1 < n && ts[i] >= ts[i+1]) i = i+1 ;
            while (i+1 < n && ts[i] < ts[i+1]) i = i+1 ;
            return (ts[beginning :: i], beginning, i);
          }
  17. Matrix extensions
      • several features from MATLAB
      • with, fold, and genarray from Single Assignment C
      • all translated down to expected C code
      • straightforward parallel implementations of matrixMap, with, fold, and genarray
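
To make the eddy-scoring code on slides 14 to 16 easier to follow, here is a rough plain-Python approximation using lists instead of the `Matrix` type. It is a sketch under assumptions: the `::` range is taken to be inclusive (as `(x1 :: x2)` suggests), the function names are Python-style renamings, and a bounds check is added to the first loop of `score_ts` that the slide does not show.

```python
def get_trough(ts, i):
    """Walk downhill, then uphill, from index i; return (trough, beginning, end)."""
    beginning = i
    n = len(ts)
    while i + 1 < n and ts[i] >= ts[i + 1]:
        i += 1
    while i + 1 < n and ts[i] < ts[i + 1]:
        i += 1
    return ts[beginning:i + 1], beginning, i  # '::' assumed inclusive


def compute_area(trough):
    """Area between the trough and the straight line joining its endpoints."""
    x1, x2 = 0, len(trough) - 1
    y1, y2 = trough[0], trough[-1]
    m = (y1 - y2) / (x1 - x2)
    b = y1 - m * x1
    line = [x * m + b for x in range(x1, x2 + 1)]
    area = sum(line[i] - trough[i] for i in range(x1, x2))
    # genarray: every position in the trough gets the same score.
    return [area] * len(trough)


def score_ts(ts):
    """Score each point of a time series by the area of the trough it lies in."""
    n = len(ts)
    scores = [0.0] * n
    i = 0
    # Skip the initial rise (the bounds check is an addition in this sketch).
    while i + 1 < n and ts[i] < ts[i + 1]:
        i += 1
    while i < n - 1:
        trough, beginning, i = get_trough(ts, i)
        scores[beginning:i + 1] = compute_area(trough)
    return scores
```

In the talk's setting, `matrixMap(scoreTS, data, [2])` applies this per-point scoring along one dimension of a three-dimensional data set, which is what makes a parallel implementation straightforward.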
  18. Dimension analysis: pound-seconds ≠ newton-seconds
  19. Units as type qualifiers
          #include "stdio.h"
          int main (int argc, char *argv[]) {
            int meter   x    = 3.4 ;
            int meter   y    = 5.6 ;
            int meter^2 area = x * y ;
            printf("%d\n", x + y);   // OK
            printf("%d\n", x + z);   // Error
          }
  20. With the erroneous line commented out
          #include "stdio.h"
          int main (int argc, char *argv[]) {
            int meter   x    = 3.4 ;
            int meter   y    = 5.6 ;
            int meter^2 area = x * y ;
            printf("%d\n", x + y);      // OK
            // printf("%d\n", x + z);   // Error
          }
  21. After translation to plain C
          #include "stdio.h"
          int main (int argc, char *argv[]) {
            int x    = 3.4 ;
            int y    = 5.6 ;
            int area = x * y ;
            printf("%d\n", x + y);   // OK
          }
      Extensions of this form find errors, but otherwise are “erased” during translation.
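
The unit check on slides 19 to 21 can be illustrated with a small Python sketch. Note the difference: the extension in the talk reports the error at compile time and is then erased, whereas this sketch checks at run time. The class `Quantity` and the helper `meters` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number tagged with unit exponents, e.g. (("meter", 2),) for square meters."""
    value: float
    dims: tuple

    def __add__(self, other):
        # Addition is only meaningful when both operands have the same dimensions.
        if self.dims != other.dims:
            raise TypeError(f"cannot add {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplication adds the exponents of matching units.
        exps = dict(self.dims)
        for unit, exponent in other.dims:
            exps[unit] = exps.get(unit, 0) + exponent
        dims = tuple(sorted((u, e) for u, e in exps.items() if e != 0))
        return Quantity(self.value * other.value, dims)


def meters(value):
    """Hypothetical constructor for a length in meters."""
    return Quantity(value, (("meter", 1),))
```

With these definitions, `meters(3.4) * meters(5.6)` carries the dimension meter^2, and adding it to a plain length raises a `TypeError`, mirroring the "OK" and "Error" lines on the slide.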
  22. Extension composition
      • Programmers can select the extensions that they want.
      • They may want to use multiple extensions in the same program.
      Distinguish between:
      1. the extension user, who has no knowledge of language design or implementation
      2. the extension developer, who must know about language design and implementation
      Tools build a custom .xc ⟹ .c translator for them. How can that be done?
  23. Building translators from composable extensible languages
      Two primary challenges:
      1. composable syntax: enables building a scanner and parser
         • context-aware scanning [GPCE’07]
         • modular determinism analysis [PLDI’09]
         • Copper
      2. composable semantics: analysis and translations
         • attribute grammars with forwarding, collections and higher-order attributes
         • set union of specification components:
           sets of productions, non-terminals, attributes;
           sets of attribute-defining equations on a production;
           sets of equations contributing values to a single attribute
         • modular well-definedness analysis [SLE’12a]
         • modular termination analysis [SLE’12b, Krishnan-PhD]
         • Silver
  24. Generating parsers and scanners from grammars and regular expressions
      nonterminals: Stmt, Expr
      terminals:
          Id    /[a-zA-Z][a-zA-Z0-9]*/
          Num   /[0-9]+/
          Eq    '='
          Semi  ';'
          Plus  '+'
          Mult  '*'

          Stmt ::= Stmt Semi Stmt
          Stmt ::= Id Eq Expr
          Expr ::= Expr Plus Expr
          Expr ::= Expr Mult Expr
          Expr ::= Id
  25. Figure: parse tree for “x = y + 3 * z ; a = b”
      Token sequence: Id(x), Eq, Id(y), Plus, Num(3), Mult, Id(z), Semi, Id(a), Eq, Id(b)
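
As a rough illustration of the scanning step, the following Python sketch turns the terminal declarations from slide 24 into a small scanner and reproduces the token sequence above. It is a conventional, context-free scanner written for this example, not the Copper-generated one; the function name `scan` and the whitespace handling are assumptions.

```python
import re

# Terminals and their regular expressions, as declared on the grammar slide.
TERMINALS = [
    ("Id", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num", r"[0-9]+"),
    ("Eq", r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]

# One alternation of named groups, plus whitespace (skipped; an assumption).
SCANNER = re.compile(
    "|".join(f"(?P<{name}>{regex})" for name, regex in TERMINALS) + r"|(?P<WS>\s+)"
)


def scan(text):
    """Return the list of tokens in text, written as on the slide, e.g. Id(x)."""
    tokens, pos = [], 0
    while pos < len(text):
        match = SCANNER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        name = match.lastgroup
        if name in ("Id", "Num"):
            tokens.append(f"{name}({match.group()})")
        elif name != "WS":
            tokens.append(name)
        pos = match.end()
    return tokens
```

Calling `scan("x = y + 3 * z ; a = b")` yields the eleven tokens listed on the slide; the parser then builds the tree from that sequence.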
  26. Attribute Grammars
      • add semantics (meaning) to context-free grammars
      • nodes (non-terminals) have attributes, that is, semantic values
      • Expr may be attributed with:
        type: the type of the expression;
        errors: a list of error messages;
        env: a mapping from variable names to their types
      • Stmt may be attributed with errors and env
  27. Figure: the parse tree decorated with attribute values. Every node has env = [x→int, y→int, z→string]; the well-typed Expr nodes have type = int and errors = [ ]; one Stmt assigns an Expr of type string (Id(z)) to Id(x) and has errors = [ERROR], as does the root Stmt.
  28. Attribute grammar specifications
      Equations associated with productions define attribute values.
          abstract production addition
          e::Expr ::= l::Expr '+' r::Expr
          {
            e.errors := l.errors ++ r.errors ++
                        ... check that l and r are integers ...
            e.type = int ;
            l.env = e.env ;
            r.env = e.env ;
          }
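
The flow of attributes in the addition production can be sketched in Python as a recursive function: `env` is passed down (inherited), while `type` and `errors` are returned up (synthesized). The node classes, the function name `decorate`, and the error messages are assumptions for this illustration; Silver evaluates attributes on demand rather than in a single traversal like this.

```python
from dataclasses import dataclass


@dataclass
class Id:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


def decorate(e, env):
    """Return (type, errors) for expression e under the inherited attribute env."""
    if isinstance(e, Num):
        return "int", []
    if isinstance(e, Id):
        if e.name in env:
            return env[e.name], []
        return "error", [f"undeclared variable {e.name}"]
    if isinstance(e, Add):
        # l.env = e.env ; r.env = e.env
        left_type, left_errors = decorate(e.left, env)
        right_type, right_errors = decorate(e.right, env)
        # e.errors := l.errors ++ r.errors ++ (check that l and r are integers)
        errors = left_errors + right_errors
        if any(t not in ("int", "error") for t in (left_type, right_type)):
            errors = errors + ["operands of + must be integers"]
        # e.type = int
        return "int", errors
    raise TypeError(f"unknown node {e!r}")
```

With `env = {"x": "int", "y": "int", "z": "string"}`, the expression `x + y` decorates to type `int` with no errors, while `x + z` produces one error message.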
  29. Modern attribute grammars
      • higher-order attributes
      • reference attributes
      • collection attributes
      • forwarding
      • module systems
      • separate compilation
      • etc.
  30. for-loop as an extension
          abstract production for
          s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
          {
            s.errors := lower.error ++ upper.errors ++ body.errors ++
                        ... check that i is an integer ...
            forwards to
              // i=lower; while (i <= upper) { body; i=i+1; }
              seq(assignment(varRef(i), lower),
                  while(lte(varRef(i), upper),
                        block(seq(body,
                                  assignment(varRef(i),
                                             add(varRef(i), intLit("1")))))));
          }
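
The idea behind forwarding can be sketched in Python: the for-loop extension only says which host-language tree it stands for, and anything the extension does not define itself (here, execution) is obtained from that tree. The tuple encoding, the constructor names, and the tiny interpreter are assumptions for this example and do not reflect how Silver represents trees.

```python
# Host-language constructors, encoded as tagged tuples.
def var_ref(name): return ("varRef", name)
def int_lit(text): return ("intLit", text)
def add(left, right): return ("add", left, right)
def lte(left, right): return ("lte", left, right)
def assignment(target, expr): return ("assignment", target, expr)
def seq(first, second): return ("seq", first, second)
def block(stmt): return ("block", stmt)
def while_loop(cond, body): return ("while", cond, body)


def for_loop(i, lower, upper, body):
    """Extension construct: forwards to i=lower; while (i <= upper) { body; i=i+1; }"""
    return seq(
        assignment(var_ref(i), lower),
        while_loop(
            lte(var_ref(i), upper),
            block(seq(body, assignment(var_ref(i), add(var_ref(i), int_lit("1"))))),
        ),
    )


def evaluate(e, store):
    tag = e[0]
    if tag == "varRef":
        return store[e[1]]
    if tag == "intLit":
        return int(e[1])
    if tag == "add":
        return evaluate(e[1], store) + evaluate(e[2], store)
    if tag == "lte":
        return evaluate(e[1], store) <= evaluate(e[2], store)
    raise ValueError(f"unknown expression {tag}")


def execute(s, store):
    """Host-only semantics: knows nothing about for-loops."""
    tag = s[0]
    if tag == "assignment":
        store[s[1][1]] = evaluate(s[2], store)
    elif tag == "seq":
        execute(s[1], store)
        execute(s[2], store)
    elif tag == "block":
        execute(s[1], store)
    elif tag == "while":
        while evaluate(s[1], store):
            execute(s[2], store)
    else:
        raise ValueError(f"unknown statement {tag}")
```

Because `for_loop` returns a tree built only from host constructs, `execute` handles it without modification, which is the property that lets independently written extensions share host-language analyses and translations.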
  31. Building an attribute grammar evaluator from composed specifications
      $AG^H \cup^* \{AG^{E_1}, \ldots, AG^{E_n}\}$
      $\forall i \in [1, n].\ \mathit{modComplete}(AG^H, AG^{E_i}) \Longrightarrow \mathit{complete}(AG^H \cup \{AG^{E_1}, \ldots, AG^{E_n}\})$
      • Monolithic analysis: not too hard, but not too useful.
      • Modular analysis: harder, but required [SLE’12a].
  32. Challenges in scanning
      Keywords in embedded languages may be identifiers in the host language:
          int SELECT ;
          ...
          rs = using c query { SELECT last_name FROM person WHERE ...
  33. Challenges in scanning
      Different extensions use the same keyword:
          connection c "jdbc:derby:./derby/db/testdb"
            with table person [ person_id INTEGER, first_name VARCHAR ];
          ...
          b = table ( c1 : T F , c2 : F * ) ;
  34. Challenges in scanning
      Operators with different precedence specifications:
          x = 3 + y * z ;
          ...
          str = /[a-z][a-z0-9]*.java/
  35. Challenges in scanning
      Terminals that are prefixes of others:
          List<List<Integer>> dlist ;
          ...
          x = y >> 4 ;
  36. Need for context
      Traditionally, parser and scanner are disjoint:
          Scanner → Parser → Semantic Analysis
      In context aware scanning, they communicate:
          Scanner ⇄ Parser → Semantic Analysis
  37. Context aware scanning
      • Scanner recognizes only tokens valid for the current “context”
      • Keeps embedded sub-languages, in a sense, separate
      Consider:
          chan in, out;
          for i in a { a[i] = i*i ; }
      Two terminal symbols match “in”:
          terminal IN 'in' ;
          terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to {keyword};
          terminal FOR 'for' lexer class {keyword};
      The example is part of AbleP [SPIN’11].
  38. Parsing C as an extension to Promela
          c_decl {
            typedef struct Coord {
              int x, y; } Coord;
            c_state "Coord pt" "Global"   /* goes in state vector */
            int z = 3;                    /* standard global decl */
          }
          active proctype example()
          {
            c_code { now.pt.x = now.pt.y = 0; };
            do
            :: c_expr { now.pt.x == now.pt.y } -> c_code { now.pt.y++; }
            :: else -> break
            od;
            c_code { printf("values %d: %d, %d,%d\n",
                            Pexample->_pid, now.z, now.pt.x, now.pt.y);
  39. Context aware scanning
      • This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
      • It will return a shorter valid match before a longer invalid match.
      • In List<List<Integer>>, before “>”, “>” is in the valid lookahead but “>>” is not.
      • A context aware scanner is essentially an implicitly-moded scanner.
      • There is no explicit specification of valid lookahead. It is generated from standard grammars and terminal regexes.
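
A minimal Python sketch of the idea follows: the scanner is handed the set of terminals that are valid in the parser's current state and applies maximal munch only within that set. This is an illustration under assumptions, not Copper's algorithm; the terminal names `GT` and `SHR`, the function `next_token`, and the tie-breaking rule (the first listed terminal wins) are made up for the example.

```python
import re

# A few terminals from the examples on the scanning slides.
TERMINALS = {
    "GT": re.compile(r">"),
    "SHR": re.compile(r">>"),
    "IN": re.compile(r"in"),
    "ID": re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}


def next_token(text, pos, valid):
    """Return (terminal, lexeme) for the longest match at pos among `valid` only.

    `valid` is the list of terminals the parser can accept in its current
    state; on equal-length matches the terminal listed first wins.
    """
    best_name, best_match = None, None
    for name in valid:
        match = TERMINALS[name].match(text, pos)
        if match and (best_match is None or match.end() > best_match.end()):
            best_name, best_match = name, match
    if best_match is None:
        raise SyntaxError(f"no valid terminal matches at position {pos}")
    return best_name, best_match.group()


if __name__ == "__main__":
    generic = "List<List<Integer>> dlist"
    shift = "x = y >> 4"
    # Inside a type argument list only '>' is valid, so '>>' is never considered.
    print(next_token(generic, generic.index(">"), ["GT"]))
    # In an expression both are valid and maximal munch picks '>>'.
    print(next_token(shift, shift.index(">"), ["GT", "SHR"]))
```

The same mechanism separates the two uses of “in” from slide 37: after `chan` only `ID` is valid, while inside the `for` header only `IN` is.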
  40. With a smarter scanner, LALR(1) is not so brittle. We can build syntactically composable language extensions. Context aware scanning makes composable syntax “more likely”, but it does not give a guarantee of composability.
  41. Building a parser from composed specifications
      $CFG^H \cup^* \{CFG^{E_1}, \ldots, CFG^{E_n}\}$
      $\forall i \in [1, n].\ \mathit{isComposable}(CFG^H, CFG^{E_i}) \land \mathit{conflictFree}(CFG^H \cup CFG^{E_i}) \Longrightarrow \mathit{conflictFree}(CFG^H \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$
      • Monolithic analysis: not too hard, but not too useful.
      • Modular analysis: harder, but required [PLDI’09].
      • Non-commutative composition of restricted LALR(1) grammars.
  43. Expressiveness versus safe composition
      • Compare to other parser generators / libraries.
      • The modular compositionality analysis does not require context aware scanning, but context aware scanning makes it practical.
  44. Future Work
      • ableC: extensible C11 specification
      • builds on lessons learned from extensible specifications of Java [ECOOP’07], Lustre [FASE’07], Modelica, Promela [SPIN’11]
      • incorporate existing language extensions
      • composition of language extensions are compile-time
      • language specific analysis
      • new applications of AGs
  45. Thanks for your attention. Questions?
      http://melt.cs.umn.edu
      evw@cs.umn.edu
  46. Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.
      Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.
      August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.
  47. Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.
      Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.
      Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010
  48. Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.
      Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.
  49. Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.
