What I learned from Rascal

Rascal is a functional metaprogramming language and system designed for software analysis. M3 is a generic metamodel for representing code parsed in Rascal. In this talk I describe my experience learning Rascal, and using it to query and manipulate M3 models, and to transform them into FAMIX models that can be imported into the Moose analysis platform.

  1. What I learned from Rascal
      Oscar Nierstrasz, Software Composition Group, scg.unibe.ch
      SWAT talk, Oct 25 2013; SCG talk, Oct 29 2013
      Monday, October 28, 13
  2. Roadmap: First Steps; Warming up; M3 to MSE; Syntax, Islands and Water; The Good, the Bad ...
  3. First Steps
  4. What’s Rascal?
      "Rascal is a purely functional metaprogramming language for source code analysis."
      "Rascal is a purely functional, model transformation language and language workbench."
  5. Free download; online help; StackOverflow; GitHub issue tracking ...
  6. Modules, types, functions; Eclipse integration; REPL
      module basic::hello
      import IO;
      void hello() {
        println("hello");
      }
      Note: Rascal is a purely functional language with Java-like syntax. You can run it from a shell, but it is really meant to be used within Eclipse.
  7. Missing a gentle introduction
      Don’t read this first! Read these recipes, then the manual!
      Note: The examples (“recipes”) are nice, but there should be more of them, and they should be introduced step-by-step as exercises.
  8. Warming up
  9. Polymorphism detection
      First test case for Rascal: can polymorphic call sites be detected more effectively statically (with false positives) or dynamically (with false negatives)?
      Plan: 1. implement static heuristics based on M3; 2. ...; 3. ...
      Note: This first project was designed to see how well Rascal and M3 are suited to static and dynamic analysis. Since M3 does not go below the method level, only some simple heuristics were implemented.
  10. What’s M3?
      M3 is a language-independent meta-model for software analysis, like FAMIX ...
      module analysis::m3::Core
      ...
      data M3 = m3(loc id);
      anno rel[loc name, loc src]                 M3@declarations;   // maps declarations to where ...
      anno rel[loc name, TypeSymbol typ]          M3@types;          // assigns types to declared ...
      anno rel[loc src, loc name]                 M3@uses;           // maps source locations of ...
      anno rel[loc from, loc to]                  M3@containment;    // what is logically contained ...
      anno list[Message messages]                 M3@messages;       // error messages and warnings ...
      anno rel[str simpleName, loc qualifiedName] M3@names;          // convenience mapping from ...
      anno rel[loc definition, loc comments]      M3@documentation;  // comments and javadoc attached ...
      anno rel[loc definition, Modifier modifier] M3@modifiers;      // modifiers associated with ...
      ...
      Each annotation is a list or a relation ...
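The idea that an M3 model is just a bundle of relations can be mimicked in a few lines. The following is a hypothetical Python analogue, not Rascal's API: locations are plain strings, only three of the annotations are modelled, and `project` and `image` are illustrative stand-ins for Rascal's `r<0>`/`r<1>` projections and `r[key]` lookup.

```python
from dataclasses import dataclass, field

Rel = set[tuple[str, str]]  # a binary relation: a set of pairs


@dataclass
class M3:
    """Hypothetical stand-in for an M3 model: an id plus named relations."""
    id: str
    declarations: Rel = field(default_factory=set)  # (logical name, source location)
    containment: Rel = field(default_factory=set)   # (container, contained)
    extends: Rel = field(default_factory=set)       # (subclass, superclass)


def project(r: Rel, column: int) -> set[str]:
    """Analogue of Rascal's r<0> / r<1> column projection."""
    return {t[column] for t in r}


def image(r: Rel, key: str) -> set[str]:
    """Analogue of Rascal's r[key]: every b such that (key, b) is in r."""
    return {b for (a, b) in r if a == key}


# Illustrative model with made-up locations.
m = M3(
    "project://demo",
    declarations={("java+class://demo/p/A", "project://demo/src/p/A.java")},
    containment={("java+package://demo/p", "java+class://demo/p/A")},
)
```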
  11. M3 for Java extends M3 with Java specifics
      module lang::java::m3::Core
      ...
      extend analysis::m3::Core;
      ...
      anno rel[loc from, loc to] M3@extends;           // classes extending classes and interface...
      anno rel[loc from, loc to] M3@implements;        // classes implementing interfaces
      anno rel[loc from, loc to] M3@methodInvocation;  // methods calling each other (including c...
      anno rel[loc from, loc to] M3@fieldAccess;       // code using data (like fields)
      anno rel[loc from, loc to] M3@typeDependency;    // using a type literal in some code (type...
      anno rel[loc from, loc to] M3@methodOverrides;   // which method overrides which other method...
      ...
  12. Rascal talks to Eclipse to build an M3 model
      M3 snakesM3 = createM3FromEclipseProject(|project://p2-SnakesAndLadders|);
      Result, where entities are identified by URIs (“locations”):
      m3(|project://p2-SnakesAndLadders|)[
        @fieldAccess={
          <|java+method://p2-SnakesAndLadders/snakes/Game/toString()|,|java+field://p2-SnakesAndLad...
          <|java+method://p2-SnakesAndLadders/snakes/Game/addSquares(int)|,|java+field://p2-SnakesA...
          ...
        },
        @extends={
          <|java+class://p2-SnakesAndLadders/snakes/Snake|,|java+class://p2-SnakesAndLadders/snakes...
          <|java+class://p2-SnakesAndLadders/snakes/Ladder|,|java+class://p2-SnakesAndLadders/snake...
          ...
        },
        @methodInvocation={
          <|java+method://p2-SnakesAndLadders/snakes/SimpleGameTest/move8jillWins(snakes.Game)|,|ja...
          <|java+variable://p2-SnakesAndLadders/snakes/Game/toString()/buffer|,|java+constructor://...
          ...
        },
        @typeDependency={
          <|java+method://p2-SnakesAndLadders/snakes/Square/nextSquare()|,|java+primitiveType://p2...
          <|java+method://p2-SnakesAndLadders/snakes/DieTest/testMinReached()|,|java+class://p2-Sna...
          ...
  13. Most M3 queries are 1-liners
      @doc { Return the set of (all) subtypes of a type. }
      @memo
      public set[loc] subtypes(M3 m, loc aType) = invert(getDeclaredTypeHierarchy(m)+)[aType];
      Callouts: @memo marks a memoized function; + takes the transitive closure; invert inverts the relation.
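To make the one-liner concrete, here is a Python sketch of the same query under stated assumptions: the hierarchy is a frozenset of (subtype, supertype) pairs, `closure` is a naive fixpoint standing in for Rascal's `+` operator, and `functools.lru_cache` plays the role of `@memo`. All names are illustrative, not part of Rascal.

```python
from functools import lru_cache

Rel = frozenset[tuple[str, str]]


def closure(r: Rel) -> Rel:
    """Transitive closure r+ computed by naive fixpoint iteration."""
    result = set(r)
    while True:
        step = {(a, d) for (a, b) in result for (c, d) in result if b == c}
        if step <= result:
            return frozenset(result)
        result |= step


def invert(r: Rel) -> Rel:
    """Swap the two columns of a binary relation."""
    return frozenset((b, a) for (a, b) in r)


@lru_cache(maxsize=None)  # plays the role of Rascal's @memo
def subtypes(hierarchy: Rel, a_type: str) -> frozenset[str]:
    """All direct and indirect subtypes; hierarchy holds (subtype, supertype) pairs."""
    return frozenset(sub for (sup, sub) in invert(closure(hierarchy)) if sup == a_type)


# Hypothetical hierarchy: B and D extend A, C extends B.
hierarchy = frozenset({("B", "A"), ("C", "B"), ("D", "A")})
```

Because `lru_cache` needs hashable arguments, the relation is a `frozenset`; Rascal values are immutable, so that step has no counterpart there.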
  14. Many M3 relations are actually (partial) functions
      @doc { Returns the source URI for the method URI. }
      public loc getSource(M3 m, loc method) =
        getUniqueElement(m@declarations[method]);
      @doc { Returns unique element of a set, or fails. }
      private &T getUniqueElement(set[&T] s) {
        assert size(s) == 1;
        return getOneFrom(s);
      }
      Note the use of generics and assertions.
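The slide's `getUniqueElement` is what lets a relation be used as a partial function. A hypothetical Python analogue might use a `TypeVar` for the generic element type; it raises `ValueError` instead of using `assert`, because Python strips `assert` statements when run with `-O`. The `get_source` helper and its data are illustrative only.

```python
from typing import TypeVar

T = TypeVar("T")


def get_unique_element(s: set[T]) -> T:
    """Return the only element of s; fail loudly otherwise."""
    if len(s) != 1:
        raise ValueError(f"expected exactly one element, got {len(s)}")
    return next(iter(s))


def get_source(declarations: set[tuple[str, str]], method: str) -> str:
    """Use the declarations relation as a partial function from name to source."""
    return get_unique_element({src for (name, src) in declarations if name == method})
```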
  15. Polymorphic candidates
      @doc { Return classes with subclasses and interfaces with >1 implementations. }
      public set[loc] polymorphTypes(M3 m) {
        set[loc] types = getDeclaredTypeHierarchy(m)<0>;
        return { t | t <- types, (isClass(t) && size(subtypes(m,t)) > 0)
                              || (isInterface(t) && size(subtypes(m,t)) > 1) };
      }
      @doc { Returns the type symbol for a given class loc. }
      public TypeSymbol getTypeSymbol(M3 m, loc t) = getUniqueElement(m@types[t]);
      @doc { Return fields declared to be of polymorphic types. }
      public set[loc] polymorphFields(M3 m) {
        set[TypeSymbol] ts = { getTypeSymbol(m,t) | t <- polymorphTypes(m) };
        return { t | t <- invert(m@types)[ts], isField(t) };
      }
      private loc squareField = |java+field://p2-SnakesAndLadders/snakes/Player/square|;
      test bool testPolymorphFields() = polymorphFields(snakes()) == { squareField };
      Some heuristics are easy to express; others would require access to the AST ...
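The `polymorphTypes` heuristic reads naturally as a set comprehension in other languages too. This Python sketch assumes a hypothetical type hierarchy of (subtype, supertype) pairs in which every declared type has a supertype (rooted at `Object`), so that it appears in the first column, as the slide's `<0>` projection requires. The type names are invented for illustration.

```python
def closure(r):
    """Transitive closure by naive fixpoint iteration."""
    result = set(r)
    while True:
        step = {(a, d) for (a, b) in result for (c, d) in result if b == c}
        if step <= result:
            return result
        result |= step


def subtypes(hierarchy, t):
    """Direct and indirect subtypes of t; hierarchy holds (subtype, supertype) pairs."""
    return {sub for (sub, sup) in closure(hierarchy) if sup == t}


def polymorph_types(hierarchy, classes, interfaces):
    """Classes with at least one subclass, interfaces with more than one implementation."""
    candidates = {sub for (sub, _) in hierarchy}  # like getDeclaredTypeHierarchy(m)<0>
    return {t for t in candidates
            if (t in classes and len(subtypes(hierarchy, t)) > 0)
            or (t in interfaces and len(subtypes(hierarchy, t)) > 1)}


# Hypothetical hierarchy, rooted at Object.
hierarchy = {("Base", "Object"), ("Sub1", "Base"), ("Sub2", "Base"),
             ("Iface", "Object"), ("Impl1", "Iface"), ("Impl2", "Iface"),
             ("ISolo", "Object"), ("Solo", "ISolo")}
classes = {"Base", "Sub1", "Sub2", "Impl1", "Impl2", "Solo"}
interfaces = {"Iface", "ISolo"}
```

An interface with a single implementation (`ISolo` here) is not a candidate, since every call through it reaches the same class.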
  16. M3 to MSE
      https://github.com/onierstrasz/rascal-m3-to-mse.git
  17. Idea: steal models from Rascal
      Especially interesting for languages other than Java!
      M3 model:
        m3(|project://...|)[ @fieldAccess={ ... }, ...
      becomes MSE:
        (
          ...
          (FAMIX.Namespace (id: 182)
            (name 'snakes'))
          (FAMIX.Class (id: 8)
            (name 'DieTest')
          ...
  18. Need IDs for all FAMIX entities (NB: entities may be locations or TypeSymbols)
      @doc { Returns a map from FAMIX entity values to unique IDs. }
      @memo
      public map[value,int] idMap(M3 m) {
        set[value] entities = m@declarations<0> + m@declarations<1>  // classes, methods, ...
                            + primitiveTypes(m)         // NB: TypeSymbols; all others are locations
                            + importedTypes(m)
                            + { unknownFieldType() }
                            + m@methodInvocation<1>     // external methods
                            + m@fieldAccess<1>          // external fields
                            + m@extends + m@implements  // inheritances
                            + m@methodInvocation
                            + m@fieldAccess
                            + importedPackages(m);
        return index(entities);                         // handy library function!
      }
      It was an iterative process to figure out what entities were needed ...
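The essence of `idMap` is "collect every entity, then number it". A hypothetical Python analogue of that step follows; `index` and `get_id` are illustrative names, and the sketch sorts by `repr` only to make the numbering deterministic (Rascal's own ordering may differ). A `KeyError` from `get_id` is the Python counterpart of discovering an entity that was missing from the collected set.

```python
def index(entities: set) -> dict:
    """Assign each entity (string or tuple) a unique integer ID."""
    # Sorting by repr gives a stable numbering across runs for mixed value types.
    return {e: i for i, e in enumerate(sorted(entities, key=repr))}


def get_id(ids: dict, entity) -> int:
    """Look up an entity's ID; raises KeyError if it was never collected."""
    return ids[entity]


# Locations, a type name and an inheritance pair all get IDs (made-up values).
ids = index({"java+class://demo/p/A",
             ("java+class://demo/p/B", "java+class://demo/p/A"),
             "int"})
```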
  19. Short cut: directly spit out MSE
      @doc { Write the MSE for an Eclipse Java project to its source directory. }
      public void writeMSE(M3 m) {
        loc file = m.id + "<m.id.authority>.mse";
        writeFile(file, "(\n");
        appendPackages(file, m);
        appendClasses(file, m);
        ...
        appendToFile(file, ")\n");
      }
      private void appendClasses(loc file, M3 m) {
        for (loc c <- classes(m) + interfaces(m) + anonClasses(m)) {
          appendToFile(file,
            "  (FAMIX.Class (id: <getID(m,c)>)
            '    (name '<getClassName(c)>')
            '    (container (ref: <getID(m, getClassPackage(m, c))>))
            '    (isInterface <isInterface(c)?true:false>))
            '");
          // TODO: modifiers, sourceAnchor ...
        }
      }
      Rascal string templates are easy, but complicate debugging: directly spitting out MSE means there is no way to check the consistency of the output.
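The same "render entities straight into text" approach looks like this with Python f-strings. This is a hypothetical sketch: `famix_class` and `mse_document` are invented helper names, the layout mirrors the slide's template, and the IDs in the example are made up.

```python
def famix_class(entity_id: int, name: str, container_id: int, is_interface: bool) -> str:
    """Render one FAMIX.Class entity as MSE text, like the slide's string template."""
    return (f"  (FAMIX.Class (id: {entity_id})\n"
            f"    (name '{name}')\n"
            f"    (container (ref: {container_id}))\n"
            f"    (isInterface {'true' if is_interface else 'false'}))\n")


def mse_document(entities: list[str]) -> str:
    """Wrap rendered entities in the outer parentheses of an MSE file."""
    return "(\n" + "".join(entities) + ")\n"


doc = mse_document([famix_class(1, "Foo", 2, False)])
```

Like the slide's template, this sketch does not escape quotes inside names, and nothing checks that the referenced container ID is ever declared.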
  20. Debugging MSE is painful
      Moose did not make it easy to track down errors in the generated MSE. A script to post-check for dangling references helped.
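The slide does not show the post-check script, so the following is only a hypothetical Python sketch of what such a check could look like: collect every declared `(id: N)` and every `(ref: N)`, and report references with no matching declaration. The sample MSE, including the dangling `(ref: 99)`, is invented for illustration.

```python
import re

ID = re.compile(r"\(id:\s*(\d+)\)")
REF = re.compile(r"\(ref:\s*(\d+)\)")


def dangling_refs(mse: str) -> set[int]:
    """IDs that are referenced with (ref: N) but never declared with (id: N)."""
    declared = {int(n) for n in ID.findall(mse)}
    referenced = {int(n) for n in REF.findall(mse)}
    return referenced - declared


sample = """(
  (FAMIX.Namespace (id: 182) (name 'snakes'))
  (FAMIX.Class (id: 8) (name 'DieTest') (container (ref: 182)))
  (FAMIX.Method (id: 9) (name 'foo') (parentType (ref: 99))))"""
```

The regular expressions are naive: an `(id: N)` pattern inside a quoted string would be counted as a declaration.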
  21. Missing abstractions were easy to build
      @doc { Return the package URI for a given class URI. }
      public loc getClassPackage(M3 m, loc c) {
        set[loc] parents = parents(m)[c]?{};
        if (isEmpty(parents)) {
          return unknownPackage(c);
        }
        loc parent = getUniqueElement(parents);
        return isPackage(parent) ? parent : getClassPackage(m, parent);
      }
      One of the few functions that weren’t 1-liners.
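The recursion in `getClassPackage` climbs the containment hierarchy until it reaches a package, which matters for nested and anonymous classes. A hypothetical Python sketch, assuming a precomputed `parents` dictionary (child to set of containers) and a fixed sentinel in place of the slide's `unknownPackage(c)`:

```python
UNKNOWN_PACKAGE = "unknown-package"  # hypothetical stand-in for unknownPackage(c)


def get_class_package(parents: dict[str, set[str]], packages: set[str], c: str) -> str:
    """Walk up the containment hierarchy until a package is reached."""
    ps = parents.get(c, set())
    if not ps:
        return UNKNOWN_PACKAGE
    if len(ps) != 1:
        raise ValueError(f"{c} has {len(ps)} containers")
    (parent,) = ps
    return parent if parent in packages else get_class_package(parents, packages, parent)


# Hypothetical containment: an anonymous class inside class A inside package p.
parents = {"p/A$1": {"p/A"}, "p/A": {"p"}}
packages = {"p"}
```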
  22. private bool isPrimitive(int()) = true;
      private bool isPrimitive(float()) = true;
      private bool isPrimitive(double()) = true;
      private bool isPrimitive(short()) = true;
      private bool isPrimitive(boolean()) = true;
      private bool isPrimitive(char()) = true;
      private bool isPrimitive(byte()) = true;
      private bool isPrimitive(long()) = true;
      private bool isPrimitive(void()) = true;
      private bool isPrimitive(null()) = true;
      private bool isPrimitive(array(_,_)) = true;
      private bool isPrimitive(typeParameter(_, _)) = true;
      private default bool isPrimitive(TypeSymbol s) = false;
      public set[TypeSymbol] primitiveTypes(M3 m) =
          { t | t <- types(m), isPrimitive(t) }
        + { t | t <- returnTypes(m), isPrimitive(t) }
        + { t | t <- parameterTypes(m), isPrimitive(t) };
      @doc { Return the ID of the type of a field. }
      public int declaredTypeID(M3 m, loc f) {
        try
          TypeSymbol ts = declaredTS(m, f);
        catch :
          return getID(m, unknownFieldType());
        if (isPrimitive(ts)) {
          return getID(m, ts);
        }
        return getID(m, location(ts));
      }
  23. public set[loc] locations(class(loc decl, _)) = { decl };
      public set[loc] locations(interface(loc decl, _)) = { decl };
      public set[loc] locations(method(loc decl, _, _, _)) = { decl };
      public set[loc] locations(constructor(loc decl, _)) = { decl };
      public set[loc] locations(enum(loc decl)) = { decl };
      public set[loc] locations(typeParameter(decl,_)) = { decl };
      public set[loc] locations(object()) = { unknownFieldType() }; // TEMPORARY HACK
      public default set[loc] locations(TypeSymbol _) = {};
      test bool testLocations1() = locations(int()) == {};
      test bool testLocations2() = locations(playerTS) == {playerClass};
      test bool testLocations3() = locations(setSquareTS) == {setSquare};
      public loc location(TypeSymbol ts) {
        assert(!isPrimitive(ts));
        return getUniqueElement(locations(ts));
      }
      @doc { Returns the locations of a set of TypeSymbols. }
      public set[loc] locationsOf(set[TypeSymbol] tsSet) {
        return { location(ts) | ts <- tsSet, !isPrimitive(ts) };
      }
  24. @doc { Returns imported classes and interfaces, i.e., used, but not declared. }
      public set[loc] importedTypes(M3 m) =
          usedTypes(m)
        + superTypes(m)
        + locationsOf(returnTypes(m))
        + locationsOf(parameterTypes(m))
        - m@declarations<0>;
      public set[loc] superTypes(M3 m) = m@extends<1> + m@implements<1>;
      public set[loc] usedTypes(M3 m) =
          { decl | class(decl, _) <- types(m) }
        + { decl | interface(decl, _) <- types(m) };
      @doc { Return the return type of a method. If primitive, returns the TypeSymbol. }
      public value returnType(M3 m, loc meth) {
        try {
          TypeSymbol ts =
            getUniqueElement({ rt | method(_, _, TypeSymbol rt, _) <- typeOf(m)[meth] });
          return isPrimitive(ts) ? ts : location(ts);
        }
        catch :
          return unknownFieldType();
      }
  25. Memoizing for performance
      @doc { Return map of declarations; memoized for performance. }
      @memo
      private map[loc, set[loc]] sourceLocMap(M3 m) = toMap(m@declarations);
      @doc { Memoize conversion to map for performance. }
      @memo
      public map[loc,set[loc]] parents(M3 m) = toMap(invert(m@containment));
      @doc { Return type(s) of an entity; memoized map for performance. }
      @memo
      private map[loc,set[TypeSymbol]] typeOf(M3 m) = toMap(m@types);
      @memo
      public set[loc] externalFields(M3 m) = m@fieldAccess<1> - fields(m);
      public bool isExternalField(M3 m, loc f) = f in externalFields(m);
      @memo
      public set[loc] externalMethods(M3 m) = m@methodInvocation<1> - methods(m);
      public bool isExternalMethod(M3 m, loc meth) = meth in externalMethods(m);
      Profiling session:
      rascal>:set profiling true
      ok
      rascal>writeMSE(sm);
      PROFILE: 124 data points, 472 ticks, tick = 1 milliSecs
      Source File                     Ticks   %     Source
      rascal://Set                    40      8.5%  |rascal://Set|(4059,1,<148,18>,<148,19>)
      rascal://Set                    36      7.6%  |rascal://Set|(4051,5,<148,10>,<148,15>)
      rascal://lang::java::m3::Core   29      6.1%  |rascal://lang::java::m3::Core|(2758,69,<...
      rascal://m3::M3toMSE            25      5.3%  |rascal://m3::M3toMSE|(10252,1,<342,54>,<...
      rascal://lang::java::m3::Core   21      4.4%  |rascal://lang::java::m3::Core|(2925,32,<...
      rascal://Relation               19      4.0%  |rascal://Relation|(10157,15,<457,2>,<457...
      rascal://Relation               18      3.8%  |rascal://Relation|(10164,8,<457,9>,<457,...
      Memoizing selected functions and converting certain relations to maps improved performance 60x! (20 sec vs 20 min for one run.)
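The two optimizations on this slide, converting a relation to a map and memoizing that conversion, can be sketched in Python. This is a hypothetical analogue: `to_map` mirrors what the slide uses `toMap` for, and `parents` mirrors the slide's function of the same name; afterwards each lookup is a dictionary access instead of a scan over the whole relation.

```python
from collections import defaultdict
from functools import lru_cache


def to_map(rel) -> dict:
    """Group a relation's pairs into key -> set of values."""
    grouped = defaultdict(set)
    for k, v in rel:
        grouped[k].add(v)
    return dict(grouped)


@lru_cache(maxsize=None)  # analogue of @memo: build the map once per model
def parents(containment: frozenset) -> dict:
    """Map each contained entity to its container(s), i.e. toMap(invert(containment))."""
    return to_map((child, container) for (container, child) in containment)


# Hypothetical containment relation: (container, contained) pairs.
containment = frozenset({("p", "p/A"), ("p/A", "p/A$1"), ("p", "p/B")})
```

The cached dictionary is shared between callers, so it must be treated as read-only.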
  26. Rascal in Moose
  27. Syntax, Islands, and Water
      https://github.com/onierstrasz/rascal-islands.git
  28. Easy peasy MSE parser
      start syntax Famix = "(" Entity* ")" ;
      syntax Entity = "(" EntityName EntityID Attribute* ")" ;
      syntax Attribute = "(" AttributeName Value+ ")" ;
      syntax Value = String | Boolean | Number | EntityRef ;
      ...
      Nothing tricky here.
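Because the MSE grammar is so regular, a hand-written recursive-descent parser follows it rule by rule. This Python sketch is not the Rascal grammar; the slide elides the rules for `EntityID` and `EntityRef`, so it assumes the `(id: N)` and `(ref: N)` forms seen in the generated MSE on earlier slides, and it assumes `''` escapes a quote inside strings. Repeated attribute names overwrite each other in this simple version.

```python
import re

# Tokens: parentheses, single-quoted strings, or any other run of non-space characters.
TOKEN = re.compile(r"\s*(\(|\)|'(?:[^']|'')*'|[^\s()']+)")


def tokenize(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class MseParser:
    def __init__(self, text: str):
        self.tokens, self.i = tokenize(text), 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def next(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return tok

    def expect(self, tok: str) -> None:
        if self.next() != tok:
            raise SyntaxError(f"expected {tok!r} before token {self.i}")

    def famix(self) -> list[dict]:
        # Famix = "(" Entity* ")"
        self.expect("(")
        entities = []
        while self.peek() == "(":
            entities.append(self.entity())
        self.expect(")")
        return entities

    def entity(self) -> dict:
        # Entity = "(" EntityName EntityID Attribute* ")", assuming EntityID = "(" "id:" Number ")"
        self.expect("(")
        kind = self.next()
        self.expect("(")
        self.expect("id:")
        ident = int(self.next())
        self.expect(")")
        attrs = {}
        while self.peek() == "(":
            name, values = self.attribute()
            attrs[name] = values
        self.expect(")")
        return {"kind": kind, "id": ident, "attrs": attrs}

    def attribute(self):
        # Attribute = "(" AttributeName Value+ ")"
        self.expect("(")
        name = self.next()
        values = [self.value()]
        while self.peek() != ")":
            values.append(self.value())
        self.expect(")")
        return name, values

    def value(self):
        # Value = String | Boolean | Number | EntityRef, assuming EntityRef = "(" "ref:" Number ")"
        tok = self.next()
        if tok == "(":
            self.expect("ref:")
            ref = int(self.next())
            self.expect(")")
            return ("ref", ref)
        if tok.startswith("'"):
            return tok[1:-1].replace("''", "'")
        if tok in ("true", "false"):
            return tok == "true"
        return float(tok) if "." in tok else int(tok)


def parse_mse(text: str) -> list[dict]:
    parser = MseParser(text)
    entities = parser.famix()
    if parser.peek() is not None:
        raise SyntaxError("trailing input after top-level list")
    return entities


entities = parse_mse("((FAMIX.Namespace (id: 182) (name 'snakes')) "
                     "(FAMIX.Class (id: 8) (name 'DieTest') "
                     "(container (ref: 182)) (isInterface false)))")
```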
  29. Idea: island parser for structure
      Several false starts (grammar errors, ambiguity). Start with a flat island parser:
      start syntax Code = Stuff+ ;
      syntax Stuff = String | Char | Comment | Word | Noise | Paren // flat, no structure
        ;
      Note: The idea is to ignore everything except parentheses and curly braces to infer as much as possible about the structure of an unknown language. Idea proposed by Patrick Viry.
  30. Then “graduate” to a structured island parser
      start syntax Code = code: Stuff* ;
      syntax Stuff = Water | Island ;
      syntax Water = String | Char | Comment | Noise ;
      syntax Island = Word | Struct ;
      syntax Struct
        = round: "(" Code ")"
        | curly: "{" Code "}"
        | square: "[" Code "]"
        ;
      Note: Since getting the syntactic elements right is hard, start with a flat parser and then introduce the structure.
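The structured island grammar treats strings, characters, comments and noise as water and builds a tree only from words and brackets. This Python sketch imitates that idea with a tokenizer and a bracket stack; it is not a translation of the Rascal grammar. The token rules (C-like strings and comments, identifier-style words) are assumptions, and unlike the Rascal version the water is dropped rather than kept in the tree.

```python
import re

TOKEN = re.compile(r'''
      (?P<string>"(?:\\.|[^"\\])*")
    | (?P<char>'(?:\\.|[^'\\])*')
    | (?P<comment>/\*.*?\*/|//[^\n]*)
    | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
    | (?P<open>[({\[])
    | (?P<close>[)}\]])
    | (?P<noise>.)
''', re.VERBOSE | re.DOTALL)

PAIRS = {"(": (")", "round"), "{": ("}", "curly"), "[": ("]", "square")}


def parse_islands(src: str) -> tuple:
    root: list = []
    stack = [(None, root)]  # (expected closing bracket, children list)
    for m in TOKEN.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "word":
            stack[-1][1].append(text)
        elif kind == "open":
            closer, label = PAIRS[text]
            kids: list = []
            stack[-1][1].append((label, kids))
            stack.append((closer, kids))
        elif kind == "close":
            if stack[-1][0] != text:
                raise SyntaxError(f"unbalanced {text!r} at offset {m.start()}")
            stack.pop()
        # string, char, comment and noise tokens are water and are dropped
    if len(stack) != 1:
        raise SyntaxError("unclosed bracket at end of input")
    return ("code", root)


tree = parse_islands('class A { void f() { g("}"); } }')
```

The brace inside the string literal does not disturb the structure, which is the main reason strings and comments have to be recognized as water at all.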
  31. Use toy and real code to debug
  32. Homing in on syntax errors
      @doc { Binary search to find smallest sublist of lines giving a parse error. }
      private list[str] binSearchErrs(type[&T<:Tree] begin, list[str] input) {
        ...
        assert(low+high == input);
        try
          parse(begin, intercalate("\n",low));
        catch :
          return binSearchErrs(begin, low);
        try
          parse(begin, intercalate("\n",high));
        catch :
          return binSearchErrs(begin, high);
        return input; // failed to find a substring with the error
      }
      Parse errors often did not give enough context. By automating a binary search, minimal examples could be found and turned into test cases. (This worked for the flat parser.)
      Example of a minimal failing input:
      |project://rascal-clone/src/org/rascalmpl/library/util/SystemAPI.java|
      IList r = readLines(a, "`", """, """, """, "<", "<", ">", ">");
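The binary search on this slide can be sketched generically in Python. In this hypothetical version the parser is passed in as a predicate `parses(text) -> bool` instead of throwing an exception, and the base case for a single line (elided on the slide) is filled in as an assumption. As on the slide, the search gives up and returns the whole input when neither half fails on its own.

```python
from typing import Callable


def bin_search_errs(parses: Callable[[str], bool], lines: list[str]) -> list[str]:
    """Shrink a failing input by repeatedly keeping a half that still fails to parse."""
    if len(lines) <= 1:
        return lines
    mid = len(lines) // 2
    low, high = lines[:mid], lines[mid:]
    assert low + high == lines
    if not parses("\n".join(low)):
        return bin_search_errs(parses, low)
    if not parses("\n".join(high)):
        return bin_search_errs(parses, high)
    return lines  # the error needs lines from both halves


# A toy "parser" that rejects any input containing BAD.
lines = ["int x;", "int y;", "int BAD;", "int z;", "int w;"]
smallest = bin_search_errs(lambda src: "BAD" not in src, lines)
```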
  33. Ambiguities hard to fix ...
      rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Player.java|);
      bool: true
      rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Game.java|);
      bool: true
      rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Ladder.java|);
      bool: true
      rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/SimpleGameTest.java|);
      bool: true
      rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Die.java|);
      bool: true
      rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Square.java|);
      bool: true
      rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/DieTest.java|);
      bool: true
      The grammar at that point:
      lexical Comment
        = "/*" (![*] | [*] !>> [/])* "*/"
        | "//" ![\n]*
        ;
      lexical Word
        = word: [a-zA-Z_][a-zA-Z0-9_-]* !>> [a-zA-Z0-9_-]
        ;
      syntax Noise
        = NoiseChar+
        ;
      lexical NoiseChar
        = ![a-zA-Z_(){}[]"'] | "/" !>> [*/]
        ;
      Somewhere here some ambiguity lurks, but it is hard to track down with the current tools ...
  34. Visualization identified test cases
      The renderParsetree() library function helped to home in on the problematic cases.
  35. Water is hard!
      lexical Noise // numbers and operators
        = (![a-zA-Z_(){}[]"'/])+ !>> ![a-zA-Z_(){}[]"'/]
        | "/" !>> [*/]
        ;
      This worked, but it took a lot of effort to come up with this rule. (One fatal error was that Noise was declared as syntax rather than lexical.)
  36. Contextual analysis ... ?
      public void countWords(loc project) {
        list[str] allWords =
          ( [] | it + words(parse(#start[Code], src).top) | src <- toList(javaFiles(project)) );
        for (<n,k> <- sort(countStrings(allWords)))
          println(<n,k>);
      }
      rascal>countWords(|project://p2-SnakesAndLadders|);
      <1,"(JExample)">
      <1,"(class)">
      <1,"Die">
      <1,"DieTest">
      <1,"FirstSquare">
      ...
      <25,"{(int)}">
      <32,"{{this}}">
      <33,"{{(game)}}">
      <35,"{{game}}">
      <37,"{{(position)}}">
      <38,"{{assertEquals}}">
      <52,"{{return}}">
      <68,"{public}">
      ok
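Judging from the output, `words` appears to tag each word with the brackets that enclose it (so `{{return}}` is `return` nested two curly levels deep), and `countStrings` pairs each string with its frequency. The following Python sketch reproduces that reading; it is a guess at the behaviour, not the author's code, and the tree format (strings for words, `(label, children)` tuples for brackets) is hypothetical.

```python
from collections import Counter

OPEN = {"round": "(", "curly": "{", "square": "["}
CLOSE = {"(": ")", "{": "}", "[": "]"}


def words(children, opens: str = "") -> list[str]:
    """Flatten an island tree into words wrapped in their enclosing brackets."""
    out = []
    for child in children:
        if isinstance(child, str):
            closes = "".join(CLOSE[o] for o in reversed(opens))
            out.append(opens + child + closes)
        else:
            label, kids = child
            out.extend(words(kids, opens + OPEN[label]))
    return out


def count_strings(strings: list[str]) -> list[tuple[int, str]]:
    """Pairs (count, string), sorted by count and then by string."""
    return sorted((n, s) for s, n in Counter(strings).items())


tree = ["class", "Die", ("curly", ["public", ("round", ["int"]), ("curly", ["return"])])]
```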
  37. The Good, the Bad ...
  38. Debugging grammars; debugging failed tests; misplaced syntax errors
  39. Oh no, no OO!
      [Diagram: part of the FAMIX metamodel as a UML class diagram, with classes such as Type, TypedEntity, PrimitiveType, Class, ClassMember, Method, Attribute, Parameter, Inheritance, Invocation, Access and FileAnchor]
      vs.
      m3(|project://...|)[ @fieldAccess={...}, @extends={...}, @methodInvocation={...}, @typeDependency={...}, @messages=[...], @containment={...}, @names={...}, @implements={...}, @documentation={...}, @uses={...}, @methodOverrides={...}, @types={...}, @modifiers={...}, @declarations={...} ]
      Tutor needs work!
      Note: The lack of OO caused me some culture shock. I felt that functions that applied to certain data types should have been methods. I also missed an OO layer around the M3 models.
  40. Tests; compact functional style; profiling; locations; libraries; live and offline help!
      Compact functional style:
      public set[TypeSymbol] parameterTypes(M3 m) =
        ( {} | it + e | e <- { toSet(pt) | method(_, _, _, list[TypeSymbol] pt) <- types(m) } );
      Profiling:
      rascal>:set profiling true
      ok
      rascal>writeMSE(sm);
      PROFILE: 124 data points, 472 ticks, tick = 1 milliSecs
      Source File    Ticks   %     Source
      rascal://Set   40      8.5%  |rascal://Set|(4059,1,<148,18>,<148,19>)
      rascal://Set   36      7.6%  |rascal://Set|(4051,5,<148,10>,<148,15>)
      ...
      Note: Integrated testing was very handy. The compact functional style led to lots of 1-liners. Profiling made optimization very easy. Locations made code navigation easy. Feedback was quick offline, but getting live help was even better!
      Thanks!
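The reducer expression `( {} | it + e | e <- ... )` in `parameterTypes` starts from an empty set and adds each element `e` to the accumulator `it`. In Python the same fold is `functools.reduce`. This sketch is a hypothetical analogue in which method signatures are plain lists of type names.

```python
from functools import reduce


def parameter_types(signatures: list[list[str]]) -> set[str]:
    """Union of all parameter-type lists, mirroring ( {} | it + e | e <- ... )."""
    return reduce(lambda it, e: it | e, (set(pt) for pt in signatures), set())
```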
