Your SlideShare is downloading. ×
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Scam 2013 — Agile Modeling Keynote
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Scam 2013 — Agile Modeling Keynote

915

Published on

Keynote presentation for 13th IEEE International Working Conference on Source Code Analysis and Manipulation — Eindhoven, The Netherlands, 22-23 September …

Keynote presentation for 13th IEEE International Working Conference on Source Code Analysis and Manipulation — Eindhoven, The Netherlands, 22-23 September 2013.
http://www.ieee-scam.org/2013/program.html

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
915
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
4
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Agile Modeling When can we have it? Oscar Nierstrasz Software Composition Group scg.unibe.ch SCAM 2013
  • 2. Roadmap Agile Modeling Agile Software Assessment Directions
  • 3. Agility Software Assessment
  • 4. Legacy code is hard to understand Real software systems change with time. Much of development is not new but maintenance, adaptation and extension of existing systems. But these systems are hard to understand as they tend to get more complex over time. 4
  • 5. The architecture Legacy code is hard to understand Although the architecture is one of the most important artifacts of a system to understand, it is not easily recoverable from code. This is because: (1) a system may have many architectures (eg layered and thin client), (2) there are many kinds of architecture static, run-time, build etc), (3) PLs do not offer any support to encode architectural constraints aside from coarse structuring and interfaces. ... is not in the code 5
  • 6. Legacy code is hard to understand The architecture is not in the code Especially with OO, where the code does not reflect run time behaviour well, more time is spent reading than writing, to understand code and to understand impact of change. IDEs do not well support this reading activity, since they focus on PL concepts, like editing, compiling and debugging, not architectural constraints, or user features. Developers spend more time reading than writing code 6
  • 7. Specialized analyses require custom tools Legacy code is hard to understand The architecture is not in the code Analysis tool are pretty limited to programming tasks. We have built many specialized tools, such as to mine features, extract architecture, or analyze evolution, but these tools are expensive to build. We need an IDE with better tools, and offering a better meta-tooling environment (tools to build tools quickly). Developers spend more time reading than writing code 7
  • 8. Legacy code is hard to understand The architecture is not in the code Developers spend more time reading than writing code Specialized analyses require custom tools 8
  • 9. Moose is a platform for software and data analysis Moose is a platform for modeling software artifacts to enable software analysis. Moose has been developed for well over a decade. It is the work of dozens of researchers, and has been the basis of numerous academic and industrial projects. www.moosetechnology.org 9
  • 10. ConAn Van Hapax ... CodeCrawler Smalltalk Extensible meta model Java COBOL C++ … Model repository External Parser MSE Navigation Metrics Querying Grouping Smalltalk The key to Moose’s flexibility is its extensible metamodel. Every tool that builds on Moose is able to extend the metamodel with concepts that it needs, such as history. Nierstrasz et al. The Story of Moose. ESEC/FSE 2005 10
  • 11. System complexity Lanza et al. Polymetric Views. TSE 2003 11
  • 12. System complexity - Clone evolution view Class blueprint - Topic Correlation Matrix - Distribution Map for topics spread over classes in packages Hierarchy Evolution view - Ownership Map 12
  • 13. ConAn Van Hapax ... CodeCrawler Smalltalk Java COBOL Extensible meta model Model repository Navigation C++ Metrics … Querying Grouping Smalltalk But, we have a huge bottleneck for new languages ... The bottleneck is to construct a parser to build the model. 13
  • 14. A ASA A What is Agile Software Assessment? 14
  • 15. Challenge “What will my code change impact?” Mircea’s vision Large software systems are so complex that one can never be sure until integration whether certain changes can have catastrophic effects at a distance. Ideas: Tracking Software Architecture; exploiting Big Software Data 15
  • 16. Challenge Build a new assessment tool in ten minutes Custom analyses require custom tools. Building a tool should be as easy as writing a query in SQL or a form-based interface. 16
  • 17. Challenge Load the model in the morning, analyze it in the afternoon The key bottleneck to assessment is creating a suitable model for analysis. If a tool does not already exist, it can take days, weeks or months to parse source files and generate models. 17
  • 18. Agile Modeling
  • 19. Agile Modeling Lifecycle Build a coarse model Refine the model Build a custom analysis 19
  • 20. Problems Unknown languages Heterogeneous projects Unstructured text Developing a parser for a new language is a big challenge. Parsers may be hard to scavenge from existing tools. Not only source code, but other sources of information, like bug reports and emails can be invaluable for model building. Few projects today are built using a single language. Often a GPL is mixed with scripting languages, or even home-brewed DSLs. 20
  • 21. Hooking into an existing tool Ideas of this phase will be a model of the Ruby software system. As the meta-model is FAME compliant, also the model will be. Information about the ClassLoader, an instance responsible for loading Java classes, is covered in section 4.7. The Fame framework automatically extracts a model from an instance of an Eclipse AST. This instance corresponds to the instance of the Ruby plugin AST representing the software system. Automation is possible due to the fact that we defined the higher level mapping. Figure 2.1 reveals the need for the higher mapping to be restored. In order to implement the next phase independently from the environment used in this phase we extracted the model into an MSE file. 18 CHAPTER 3. GENETIC PROGRAMMING Since biological evolution starts from an existing population of species, we need to bootstrap an initial population before we can begin evolving it. This initial population is generally a number of random individuals. These initial individuals usually don’t perform well, although some will already be a tad better than others. That is exactly what we need to get evolution going. Recycling Trees Grammar Stealing The final part is reproduction, i.e. to generate a new generation from the surviving previous generation. For that purpose an evolutionary algorithm usually uses two types of genetic operators: point mutation and crossover (We will refer to point mutations as mutations, although crossover is technically also a mutation). Mutations change an individual in a random location to alter it slightly, thus generating new information. Crossover1 however, takes at least two individuals and cuts out part of one of them, to put it in the other individual(s). By only moving around information, Crossover does not introduce new information. Be aware that every modification of an individual has to result in a new individual that is valid. Validity is very dependent on the search space - it generally means that fitness function as well as the genetic operators should be applicable to a valid individual. A schematic view is shown in fig. 3.1. Figure 2.1: The dotted lines correspond to the extraction of a (meta-)model. The other arrows between the model and the software system hierarchy show generate new which Java tower level corresponds to randommeta-model tower element. which population fit enough? 2.3 Model Mapping by Example phase Our previously extracted model still contains platform dependent information generate new select most fit and thus is not a domain specific model for reverse engineering. It could be population with individuals genetic used by very specific or very generic reverse engineering tools, as it contains operators the concrete syntax tree of the software system only. However such tools do not exist. In the Model Mapping by Example phase we want to transform the model into a FAMIX compliant one. With such a format it will be easier to use in several software engineering tools. The idea behind this approach relies on Parsing by Example [3].mutation Parsing crossover by Example presents a semi-automatic way of mapping source code to domain Figure 3.1: Principles of an Evolutionary Algorithm Parsing by Example 9 Evolutionary Grammar Generation There are alternatives to rejecting a certain number of badly performing individuals per generation. To compute the new generation, one can generate new individuals from all individuals of the old generation. This would not result in an improvement since the selection is completely random. Hence the parent individuals are selected 1 Crossover in biology is the process of two parental chromosomes exchanging parts of genes in the meiosis (cell division for reproduction cells) 21
  • 22. Hooking into an existing tool 22
  • 23. Nice if you can have it — but just defers the problem to another platform 23
  • 24. Grammar Stealing 24
  • 25. 25
  • 26. No Start Compiler sources? Hard-coded parser Yes Yes No General rules Yes Recover the grammar Recover the grammar BNF No Language reference manual? Quality? One case: perl Quality? Constructions by example No cases known Yes Recover the grammar One case: RPG No No cases known Figure 2. Coverage diagram for grammar stealing. stance). Figure 2 shows the first two cases— the third is just a combination. If you start with a hard-coded grammar, you must reverse-engineer it from the handwritten code. Fortunately, the comments of such code of- ple, through general rules, or by both approaches. If a manual uses general rules, its quality is generally not good: reference manuals and language standards are full of errors. It is our experience that the myriad Still takes a couple of weeks and lots of expertise 26
  • 27. hase will be a model of the Ruby software system. As the meta-model compliant, also the model will be. Information about the ClassLoader, nce responsible for loading Java classes, is covered in section 4.7. Fame framework automatically extracts a model from an instance of an AST. This instance corresponds to the instance of the Ruby plugin AST ing the software system. Automation is possible due to the fact that ed the higher level mapping. Figure 2.1 reveals the need for the higher to be restored. In order to implement the next phase independently environment used in this phase we extracted the model into an MSE Recycling Trees Daniel Langone. Recycling Trees: Mapping Eclipse ASTs to Moose Models. Bachelor's thesis, University of Bern 27
  • 28. 1.Infer AST implementation from IDE plugin 2.Extract metamodel from plugin 3.Map model elements to FAMIX (Moose) 28
  • 29. Hard to recognize ASTs; still need to map to model elements AST implementations always different, hard to generalize. 29
  • 30. Parsing by Example Nierstrasz et al. ExampleDriven Reconstruction of Software Models. CSMR 2007 30
  • 31. parse 5 Source code Parser import specify examples 2 1 5 4 generate ... := ... ... ... ... | ... ... ... ... | ... ... ... ... ... := ... ... ... ... | ... ... ... ... | ... ... ... ... ... := ... ... ... ... | ... ... ... ... | ... ... ... ... infer 3 Example mappings export Grammar 6 m..n Model 1 31
  • 32. CodeSnooper 32
  • 33. 33
  • 34. Inlel del er, onic) of to atree nd del of be od orle, solve these problems. retrieve the reference model by inspecting the source code In a third iteration we add examples to recognize atmanually. tributes. Once again we obtain three parsers based on three In Ruby there are Classes and Modules. Modules are sets of examples for abstract classes, concrete classes and collectionsWe obtain theand Constants. They cannot genof Methods following results: interfaces. erate instances. However they can be mixed into Classes and other Modules. A Module cannot inherit from anything. Precise Model Our Model Modules also have the function of Namespaces. Ruby does Number of Model Classes 366 346 not supportof Abstract Classes Number Abstract Classes [22]. 233 230 For the definition of the scanner tokens for identifiers and Total Number Of Methods 1887 1780 comments we use the following regular expressions: Total Number of Attributes 395 304 <IDENTIFIER >: [ a zA Z $ ] w⇤ ( ? | ! ) ? ; This process be repeated]to cover <comment>: can# JBoss case more and more of [ ˆ r n ⇤ <eol> ; the subject language. The question on when to stop can be answered with “When the results are good enough”.classes, Using just 2 examples each of namespaces, Good enough in this context we are able we have of the 22 files. methods and attributes,means when to parse 7enough information for a specific reverse engineering task. For example, a “System Complexity View” [18] is a 7 files Our used to Precise Model visualization Model obtain an initial impression of a legacy software system. To Number of generate such a view we need to parse a significant number Namespaces 8 6 6 of the classes, identify subclass relations, and establish the Number of numbers of methods and attributes of each class. Even if we Model Classes 25 4 4 parse only 80% of the code, we can still get an initial imTotal Number of pression of the state of the system. If on the other we would Methods 247 26 want to display a “Class Blueprint” [17],26semantically ena Total Number of riched visualization of the internal structure of classes we Attributes 136 9 would need a refined grammar to extract 9 more information. The “good enough” is thus given by the reverse engineering Amongst the vary fromRuby case there are 4 large files goals, which files we could notcase. case to parse, containing GUI code. If we ignore these files, we are able to detect about 25% of the target elements. There are two main reasons that so few files can be successfully parsed: methods and attributes, we are able to parse 7 of the 22 fi Precise Model Number of Namespaces Number of Model Classes Total Number of Methods Total Number of Attributes 7 files 8 Our Mod 6 6 25 4 Problems 4 26 26 • Ambiguity247 136 9 • False positives 9 • False negatives Amongst the files we could not parse, there are 4 large fi containing GUI code. If we ignore these files, we are a • Embedded languages to detect about 25% of the target elements. There are two main reasons that so few files can be s cessfully parsed: 1. The comment character # occurs frequently in stri and regular expressions, causing our simple-min One major failure was to use SmaCC, scanner to fail. whichbetter with a separate scanner. this pr A is LALR scanner would fix Better use a parser and lem. With some simplescannerlessGLR Parsing. preprocessing (removing PEGs, Earley parsing or hash character that occurs inside a string and remov all comments) we can improve recall to 65-85%. 2. Ruby offers a very rich syntax for control constru allowing the same keywords to occur in many diffe positions and contexts. One would by many m Markus Kobel. Parsing need examples to recognize these constructs. Example. MSc, U Bern, April 2005. 34
  • 35. Evolutionary Grammar Generation 18 CHAPTER 3. GENETIC PROGRAMMING Since biological evolution starts from an existing population of species, we need to bootstrap an initial population before we can begin evolving it. This initial population is generally a number of random individuals. These initial individuals usually don’t perform well, although some will already be a tad better than others. That is exactly what we need to get evolution going. The final part is reproduction, i.e. to generate a new generation from the surviving previous generation. For that purpose an evolutionary algorithm usually uses two types of genetic operators: point mutation and crossover (We will refer to point mutations as mutations, although crossover is technically also a mutation). Mutations change an individual in a random location to alter it slightly, thus generating new information. Crossover1 however, takes at least two individuals and cuts out part of one of them, to put it in the other individual(s). By only moving around information, Crossover does not introduce new information. Be aware that every modification of an individual has to result in a new individual that is valid. Validity is very dependent on the search space - it generally means that fitness function as well as the genetic operators should be applicable to a valid individual. A schematic view is shown in fig. 3.1. generate new random population fit enough? select most fit individuals generate new population with genetic operators mutation crossover Figure 3.1: Principles of an Evolutionary Algorithm There are alternatives to rejecting a certain number of badly performing individuals per generation. To compute the new generation, one can generate new individuals from all individuals of the old generation. This would not result in an improvement since the selection is completely random. Hence the parent individuals are selected Sandro De Zanet. Grammar Generation with Genetic Programming — Evolutionary Grammar Generation. MSc, U Bern, July 2009. 35
  • 36. not introduce new information. Be aware that every modification of an individual has to result in a new individual that is valid. Validity is very dependent on the search space - it generally means that fitness function as well as the genetic operators should be applicable to a valid individual. A schematic view is shown in fig. 3.1. generate new random population fit enough? select most fit individuals generate new population with genetic operators mutation crossover Figure 3.1: Principles of an Evolutionary Algorithm There are alternatives to rejecting a certain number of badly performing individuals per generation. To compute the new generation, one can generate new individuals from all individuals of the old generation. This would not result in an improvement 36
  • 37. e more common mutations that affect the structure of the gramndent on the node. They will only affect not primitive and not PEG mutation and crossover omly generated parser will be inserted in the listFigure 4.2: Add back link node of children (fig. 4.1. TUNING PEGS FOR GENETIC PROGRAMMING ... C A 26 4.1. TUNING PEGS FOR GENETIC PROGRAMMING Figure 4.1: Add a node B CHAPTER 4. COMBINATION OF PEGS AND GENETIC PROGRAMMING ... 25 Figure 4.3: Delete a node C C rly to adding a child. Although in this case the new parser is not To one of the evolvability already existing ated but selected from ensure the nodes of the of more complex parsers we need more complex mutations. After the CFG population B in a link to this parser (like ainitialrule, fig. 4.2) got sorted mostly only single character parsers were left A A B and couldn’t mutate to parsers with more nodes. omly selected child will be removed. No effect, if there is only The following mutations no the possibility k that we don’t allow composite parsers withaddchildren, since to insert nodes between the current parser Insert a node Figure 4.5: and the root parser: tute a valid grammar (4.3) deletion The selected parser first moves all its children to the parent parser, thus ect on unary or primitive parsers like the character parser. 4.1.3 Fitness function A B replacing itself by its children (fig. 4.4) insertion The selected parser is replaced by a composite parser (sequence or important part of the evo The fitness function is the most choice). Figureselected back link node added to the new parser.4.4: Push a envisioned goal. It determines the q The 4.2: Add parser is then directly definingresults in the insertion Figure This the node up of a new parser in between the selected representation ofparser. If the selected parser and its the distance to the solution. Without an ela parser is the root, the new parser becomessearch will head(fig. 4.5) the new root. in the wrong direction. There is also the ri 37 Insertion ensures diversitysolutions depth. Graph depth local extrema. the emergence in graph or to be trapped in is important for
  • 38. Desired grammar: ([a-z] (‘_‘ | [0-9] | [a-z])*) Found grammar: (([a-z] ({‘n‘ | ‘_‘ | [0-9]})*))* Slow and expensive. Modest results for complex languages. Desired grammar: 0 -> (‘c‘ ‘a‘ ‘t‘ ‘:‘ ‘ ‘ ([a-z])+ 1 -> {2 -> (‘n‘ 0) | e}) Found grammar: (0 -> (‘c‘ ‘a‘ ‘t‘ ‘:‘ ‘ ‘ 2 -> (([a-z])+ ‘n‘)))+ Desired grammar: 0 -> (‘+‘ | ‘-‘ | ‘<‘ | ‘>‘ | ‘,‘ | ‘.‘ | 1 -> (‘[‘ 2 -> (0)* ‘]‘)) Found grammar: (0 -> {‘<‘ | ‘]‘ | ‘.‘ | ‘,‘ | ‘>‘ | ‘-‘ | ‘[‘ | ‘+‘})* 38
  • 39. Directions
  • 40. Incrementally refine island grammars 40
  • 41. Progress Verification of a scope is expensive Global Islands are useful Scoped Islands are useful 1: If we search for method island “globally” we don’t know to which class it belongs 2: At each step, we have to verify whether we have found the end of scope — that is also expensive! 3: If you have an island containing the same island (Class with inner classes) you get indefinite recursive descent: The same states for repeating “scoped” islands (Class followed by another class in the same file, something like class*) The end of “class” scope is another “class” and first thing we have to do is to check end of scope, which leads to parsing the same “class”, which has to check end of scope leading to parsing the same class “class”.... It is maybe hard to describe, but it is true... Recursive Islands and Scoped Repeating Islands lead to LeftRecursion Problems Tokenization and memoization help :-) 41
  • 42. Composing parsers from parts 42
  • 43. Ideas Exploit similarities between languages (adapt and compose) • similar syntax, similar constructs • combine “reusable” islands? • adapt with genetic algorithms? • combine with parsing by example? 43
  • 44. Automatic structure recognition 44
  • 45. Ideas catch, throw, try, bool, class, enum, explicit, export, friend, inline, mutable, namespace, operator, private, protected, public, template, typename, using, virtual, volatile, wchar_t, and, and_eq, bitand, bitor, compl, const_cast, delete, dynamic_cast, false, new, not, not_eq, or, or_eq, reinterpret_cast, static_cast, this, true, typeid, xor, xor_eq Heuristics to automatically detect keywords Exploit indentation as a proxy for structure • don’t parse; just recognize structure? • combine with parsing by example? • combine with grammar evolution? • combine with parser composition? 45
  • 46. Conclusions Model construction is an obstacle to agile software assessment You don’t need precise parsers to start analysis Are there effective shortcuts to building a parser/importer by hand?

×