
Domain-Specific Tooling



Although software development entails many activities closely intertwined with the domain of the system under development, present-day IDEs offer mostly generic features targeting programming in general. In this talk we look at three different ways in which software development can be tailored to be more domain-specific, and thus, more effective. First, we consider "agile modeling" -- how to rapidly model complex and heterogeneous software with the help of a new approach to robust parsing; second, we look at "architectural monitoring" by showing how a unified domain-specific architectural modeling language can serve as a front-end to a range of different kinds of architectural constraints and checking tools; and finally, we consider "agile tooling" -- how to rapidly create or adapt development tools to suit a given application domain. These activities are part of "Agile Software Assessment", a research project funded by the Swiss NSF.
Presented at INRIA Lille, 2014-11-20



  1. 1. Domain-Specific Tooling. Oscar Nierstrasz, Software Composition Group. Colloquium Polaris, 2014-11-20
  2. 2. Roadmap: Agile Software Assessment; Agility in Moose; Agile Modeling; Moldable Tools; Architectural Monitoring
  3. 3. Agile Software Assessment
  4. 4. Developers spend more time reading than writing code. Especially with OO, where the code does not reflect run-time behaviour well, more time is spent reading than writing, both to understand the code and to understand the impact of a change. IDEs do not support this reading activity well, since they focus on programming-language concerns such as editing, compiling and debugging, not on architectural constraints or user features.
  5. 5. There is a gap between models and code. Important aspects of the model are often missing in the code, which makes it harder to ensure that changes are consistent. Architecture, and particularly architectural constraints, are typically not explicit. The programming language may get in the way: boilerplate code can obfuscate intent. Dependencies are often hidden, so the impact of a change can be unclear. Furthermore, the development context and project history are not part of the code at all.
  6. 6. The architecture ... is not in the code. Although the architecture is one of the most important artifacts of a system to understand, it is not easily recoverable from code. This is because: (1) a system may have many architectures (e.g., layered and thin client), (2) there are many kinds of architecture (static, run-time, build, etc.), (3) programming languages offer no support for encoding architectural constraints aside from coarse structuring and interfaces.
  7. 7. Specialized analyses require custom tools. Analysis tools are largely limited to programming tasks. We have built many specialized tools, for example to mine features, extract architecture, or analyze evolution, but these tools are expensive to build. We need an IDE with better tools and a better meta-tooling environment (tools to build tools quickly).
  8. 8. Agility in Moose
  9. 9. Moose is a platform for software and data analysis. Moose is a platform for modeling software artifacts to enable software analysis. It has been developed for well over a decade, is the work of dozens of researchers, and has been the basis of numerous academic and industrial projects.
  10. 10. Visualizations shown: System Complexity; Clone Evolution view; Class Blueprint; Topic Correlation Matrix; Distribution Map for topics spread over classes in packages; Hierarchy Evolution view; Ownership Map
  11. 11. Mondrian demo: visualizing name cohesion within packages. Mondrian is a tool for scripting Moose visualizations using an embedded domain-specific language. Essentially it is a component framework for visualization with a fluent interface. Classes with similar names usually belong to the same subsystem, so we check whether classes with similar names belong to the same package. We visually cluster the classes based on the similarity of their names, and then color each class with a unique color based on the identity of its parent namespace. Meyer et al. Mondrian: An Agile Visualization Framework. SoftVis 2006. DOI: 10.1145/1148493.1148513
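The name-cohesion check from the Mondrian demo can be approximated without any visualization. The sketch below is not Mondrian (which is a Smalltalk DSL); it is a rough Python illustration under stated assumptions: the class and package names are invented sample data, the 0.7 similarity threshold is arbitrary, and the greedy clustering is a stand-in for whatever layout Mondrian actually uses. It groups class names by string similarity and reports the groups that are spread over more than one package.

```python
from difflib import SequenceMatcher

# Hypothetical sample data: class name -> package name.
CLASSES = {
    "OrderService": "shop.orders",
    "OrderServiceImpl": "shop.orders",
    "OrderServiceTest": "shop.tests",
    "InvoicePrinter": "shop.billing",
    "InvoicePrinterConfig": "shop.billing",
}


def similar(a, b, threshold=0.7):
    """Return True if the two names are at least `threshold` similar."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def cluster_by_name(names, threshold=0.7):
    """Greedy clustering: a name joins the first cluster that already
    contains a similar name, otherwise it starts a new cluster."""
    clusters = []
    for name in sorted(names):
        for cluster in clusters:
            if any(similar(name, other, threshold) for other in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def scattered_clusters(classes, threshold=0.7):
    """Return (cluster, packages) pairs for clusters that span
    more than one package, i.e. candidates for low name cohesion."""
    result = []
    for cluster in cluster_by_name(classes, threshold):
        packages = {classes[name] for name in cluster}
        if len(packages) > 1:
            result.append((cluster, sorted(packages)))
    return result
```

Calling scattered_clusters(CLASSES) on the sample data flags the OrderService group, because one of its members lives in a different package from the others.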
  12. 12. Agile Modeling
  13. 13. Moose is a powerful tool once we have a model … Moose is a platform for software and data analysis, but the bottleneck is the development of importers that map different languages to the FAMIX metamodel. Development can take weeks or months. Diagram labels: Orion, DSM, BugMap, ..., Roassal; Navigation, Metrics, Querying, Grouping; Smalltalk; Smalltalk, Java, Python, C++, …; Extensible meta model; Model repository. Nierstrasz et al. The Story of Moose. ESEC/FSE 2005. DOI: 10.1145/1095430.1081707
  14. 14. Challenge: load the model in the morning, analyze it in the afternoon. The key bottleneck to assessment is creating a suitable model for analysis. If a tool does not already exist, it can take days, weeks or months to parse source files and generate models.
  15. 15. Problems: heterogeneous projects; unknown languages; unstructured text. Few projects today are built using a single language: often a GPL is mixed with scripting languages, or even home-brewed DSLs. Developing a parser for a new language is a big challenge, and parsers may be hard to scavenge from existing tools. Not only source code but also other sources of information, like bug reports and emails, can be invaluable for model building.
  16. 16. Ideas: Grammar Stealing; Recycling Trees; Hooking into an existing tool; Evolutionary Grammar Generation; Parsing by Example. Grammar Stealing was introduced by Verhoef and Lämmel. We have tried the other approaches, but they all have limitations … [Slide background: page excerpts illustrating these approaches, including "Figure 2.1" on (meta-)model extraction and "Figure 3.1: Principles of an Evolutionary Algorithm".]
  17. 17. Agile Modeling Lifecycle: build a coarse model; build a custom analysis; refine the model. We don’t need a full parser, just enough to start analysis. Then we can refine the model to extract more details.
  18. 18. Idea: use island grammars to extract coarse models. Grammar: 'class' ID (method / . {avoid})* 'end' (annotations: method? method . {avoid}). Sample input: class Shape int x; int y; method draw() … end end method main() … end. Island grammars allow us to extract from source code just the parts of the information that interest us at the moment.
  19. 19. Problem: island grammars lead to shipwrecks. Sample input: class Shape method end. Grammar: 'class' ID (method / !'end' !method)* 'end' (annotation: method?). Tweaking island grammars till they work is not an option … Unfortunately island grammars are very difficult to get right, because the rules for water depend on the islands. If the islands of interest change, the water must change too.
  20. 20. A Bounded Sea searches for an island in a bounded scope. Grammar: 'class' ID (~method~)* 'end' (annotations: method? ~method~ method ~method~). Bounded seas essentially eliminate the need to write special rules for water, since the boundaries are inferred from the islands. We are now starting to explore how well this works for real languages … Jan Kurs, et al. Bounded Seas: Island Parsing Without Shipwrecks. SLE 2014. DOI: 10.1007/978-3-319-11245-9_4
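To make the bounded-sea idea concrete, here is a minimal Python sketch. It is not the implementation from the Bounded Seas paper: real bounded seas infer the boundary from the surrounding grammar, whereas here the boundary is passed in by hand to keep the code short. The toy language (class ... end, method ... end, with no nested end inside a method body) and all function names are assumptions made for illustration.

```python
import re


def token(pattern):
    """Build a parser for a regex token that skips leading whitespace.
    A parser takes (text, pos) and returns (value, new_pos) or None."""
    rx = re.compile(r"\s*(" + pattern + ")")

    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None

    return parse


def never(text, pos):
    """Boundary that never matches: an unbounded sea."""
    return None


def sea(island, boundary):
    """Skip water until `island` matches, but give up as soon as
    `boundary` matches first, so the search never leaves its scope."""

    def parse(text, pos):
        while pos <= len(text):
            hit = island(text, pos)
            if hit:
                return hit
            if boundary(text, pos) or pos == len(text):
                return None
            pos += 1
        return None

    return parse


KW_CLASS = token(r"\bclass\b")
KW_METHOD = token(r"\bmethod\b")
KW_END = token(r"\bend\b")
IDENT = token(r"[A-Za-z_]\w*")


def method(text, pos):
    """Island: 'method' ID <water> 'end'. Returns (name, new_pos)."""
    kw = KW_METHOD(text, pos)
    if not kw:
        return None
    name = IDENT(text, kw[1])
    if not name:
        return None
    end = sea(KW_END, never)(text, name[1])  # body is plain water
    return (name[0], end[1]) if end else None


def klass(text, pos):
    """'class' ID (~method~)* 'end', with each sea bounded by 'end'."""
    kw = KW_CLASS(text, pos)
    if not kw:
        return None
    name = IDENT(text, kw[1])
    if not name:
        return None
    pos, methods = name[1], []
    bounded = sea(method, KW_END)
    while True:
        hit = bounded(text, pos)
        if not hit:
            break
        methods.append(hit[0])
        pos = hit[1]
    end = KW_END(text, pos) or sea(KW_END, never)(text, pos)
    if not end:
        return None
    return ({"class": name[0], "methods": methods}, end[1])
```

On the Shape example from the slides, klass collects only draw as a method of Shape; the search for further methods stops at the class's closing end, so the top-level main is not swallowed into the class.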
  21. 21. Architectural Monitoring
  22. 22. Challenge: “What will my code change impact?” Large software systems are so complex that one can never be sure until integration whether a change will have catastrophic effects at a distance. Ideas: tracking software architecture; exploiting Big Software Data.
  23. 23. Problems: diverse views of software architecture (SA); SA is not in the code; the IDE focuses on code.
  24. 24. Ideas: architecture monitoring (beyond dependencies); uncovering “Software Architecture in the Wild”. To support architectural monitoring effectively, we need to understand SA in the wild. Initial studies show which kinds of constraints are important in practice but poorly supported by tools. An architectural dashboard would be a configurable tool that lets you specify architectural constraints and monitor compliance.
  25. 25. What is SA in the Wild? The theory seems to suggest that SA is mainly about structure and dependencies, but our experience with actual projects suggested that the truth might be different. We carried out two empirical studies: first a qualitative one to understand what SA in the wild is, and then a quantitative one to see to what extent various kinds of constraints appear in practice. Andrea Caracciolo, et al. How Do Software Architects Specify and Validate Quality Requirements? Software Architecture 2014. DOI: 10.1007/978-3-319-09970-5_32
  26. 26. Impact of SA constraints. In the quantitative study we asked developers how important different kinds of architectural constraints were for their projects. Interestingly, in the top ten, there were significantly more user constraints, like availability (in green), than developer constraints (in blue). Dependencies were only halfway down the list.

       Constraint                  Impact (1-5)
       availability                4.2
       response-time               4.0
       authorization               3.9
       authentication              3.6
       communication               3.4
       throughput                  3.4
       signature                   3.4
       software infrastructure     3.3
       data integrity              3.3
       recoverability              3.1
       dependencies                3.1
       visual design               3.0
       data retention policy       3.0
       hardware infrastructure     2.9
       system behavior             2.9
       data structure              2.9
       event handling              2.9
       code metrics                2.7
       meta-annotation             2.6
       naming conventions          2.6
       file location               2.5
       accessibility               2.5
       software update             2.2
  27. 27. Automated Validation is not Prevalent. As we see, on average, quality requirements (QRs) are automatically tested only 40% of the time. Chart (axis 0% to 100%, average 40%), categories in slide order: authorization, throughput, response-time, data retention policy, authentication, data integrity, visual design, code quality, meta-annotation, accessibility, communication, availability, event handling, data structure, software infrastructure, signature, dependencies, recoverability, software update, hardware infrastructure, file location, naming conventions.
  28. 28. Formalization is not Prevalent. On average QRs are formally specified only 20% of the time. Practitioners use different formalisms, from UML + profile to regex (chart annotations: ER, UML + profile; Regex, BNF; annotations …). Chart (axis 0% to 100%, average 20%), categories in slide order: dependencies, signature, data structure, meta-annotation, naming conventions, event handling, authorization, data integrity, communication, visual design, code metrics, file location, availability, response-time, throughput, data retention policy, authentication, software infrastructure, recoverability, accessibility, hardware infrastructure, software update.
  29. 29. Architectural Rules. Naming conventions: “Repository interfaces can only declare methods named find..()”. Dependencies: “Only Service classes are allowed to throw AppException”. Performance: “The rendering operation has to be completed in less than 4ms”. AC: “One year ago I had the chance to talk to various professionals working in the area where I study. What I noticed was that part of an architectural specification consists of constraints and guidelines on how a system should behave and be implemented”
  30. 30. Rule Validation (slide labels: xml, java, uml; limited functionality; poor usability). What we typically do is test these rules using the most appropriate tool.
  31. 31. Dicto, a unified ADSL. We have a single unified specification language that can be used to define a wide range of rules and can exploit off-the-shelf tools for verifying them. We call this language Dicto. Andrea Caracciolo, et al. Dicto: A Unified DSL for Testing Architectural Rules. ECSAW '14. DOI: 10.1145/2642803.2642824
  32. 32. Dicto Rules. Dicto looks like this:

       …
       MyService : Website with url=""
       MyService must HandleLoadFrom("10 users")
       MyService cannot HaveResponseTimeLessThan("1000 ms")
       MyService can only HandleSOAPMessages()
       …

       There are two types of statements: (1) entity definitions, used to identify concrete elements of the system; (2) rules, which express the condition that we want to test through one of the supported tools.
  33. 33. Periodic Validation. The goal is a fully automated, periodic process for checking architectural rules.
  34. 34. Rule Examples: website response time; website load testing; dependencies; code clones; deadlock freeness; file content (grep). At the moment we have a working implementation that supports these kinds of rules …
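The split between entity definitions and rules, and the idea of a single front-end that hands rules to different checking tools, can be sketched in a few lines of Python. This is not the Dicto implementation: the two regular expressions are inferred only from the four statements shown on the slide, and the back-end tool names in BACKENDS are hypothetical placeholders.

```python
import re

# Statement shapes inferred from the slide; the real Dicto grammar may differ.
ENTITY = re.compile(r'^(\w+)\s*:\s*(\w+)\s+with\s+(\w+)="([^"]*)"$')
RULE = re.compile(r'^(\w+)\s+(must|cannot|can only)\s+(\w+)\((?:"([^"]*)")?\)$')

# Hypothetical mapping from predicate name to the tool that would check it.
BACKENDS = {
    "HandleLoadFrom": "load-tester",
    "HaveResponseTimeLessThan": "load-tester",
    "HandleSOAPMessages": "protocol-checker",
}

SPEC = '''
MyService : Website with url=""
MyService must HandleLoadFrom("10 users")
MyService cannot HaveResponseTimeLessThan("1000 ms")
MyService can only HandleSOAPMessages()
'''


def parse_spec(lines):
    """Split a specification into entity definitions and rules."""
    entities, rules = {}, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = ENTITY.match(line)
        if m:
            name, kind, key, value = m.groups()
            entities[name] = {"type": kind, key: value}
            continue
        m = RULE.match(line)
        if m:
            subject, modality, predicate, argument = m.groups()
            rules.append({"subject": subject, "modality": modality,
                          "predicate": predicate, "argument": argument})
            continue
        raise ValueError(f"unrecognized statement: {line!r}")
    return entities, rules


def route(rules, backends=BACKENDS):
    """Group rules by the back-end tool responsible for checking them."""
    plan = {}
    for rule in rules:
        tool = backends[rule["predicate"]]
        plan.setdefault(tool, []).append(rule)
    return plan
```

Running parse_spec(SPEC.splitlines()) and then route on the resulting rules yields one batch of rules per tool, which is the shape a periodic validation job would need in order to invoke each tool once.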
  35. 35. Moldable Tools
  36. 36. Challenge: build a new assessment tool in ten minutes. Custom analyses require custom tools. Building a tool should be as easy as writing a query in SQL or using a form-based interface.
  37. 37. Problems: What tools do developers really need? What are appropriate meta-tools? Developers are often forced to use whatever tools are at hand to address assessment tasks. Few researchers have studied developers' needs. Tools for querying, navigation, metrics computation, visualization, etc. need to play nicely together, which implies a need for a unifying meta-model. Meta-tools are tools for building tools. Should these be DSLs, GPLs, graphical tools, or something else? What is a unifying meta-model for tool construction?
  38. 38. Ideas: analyze developer needs (!); “moldable” tools (not just plug-ins). We need more work on understanding developers' needs. Just as a well-designed API can serve as an internal DSL, meta-models can in principle help to drive an interactive interface for constructing model-based tools. Modern IDEs support extension by plug-ins, but plug-ins can only affect the behaviour of the IDE in limited ways. We imagine an IDE that can be scripted to build new kinds of tools and analyses.
  39. 39. Conventional debuggers just offer an interface to the run-time stack.
  40. 40. The Moldable Debugger. Mind the abstraction gap (slide labels: Specific Models; Generic Debugger; Domain-specific Debuggers). Classical development tools like browsers, debuggers and inspectors are generic and do not address the needs of specific domains. The Moldable Debugger can be easily adapted to different domains, such as event-driven computation, GUI construction and parser generation. Diagram labels: Debugging Widget; Debugging Action; Activation Predicate. Andrei Chis et al. The Moldable Debugger: A Framework for Developing Domain-Specific Debuggers. SLE 2014. DOI: 10.1007/978-3-319-11245-9_6
  41. 41. PetitParser. PetitParser is a PEG-based framework for developing parsers composed of objects. Example: identifier is letter , (letter / digit) *, shown as a tree of parser objects (nodes: , letter * / letter digit).
  42. 42. IdentifierParser new parse: 'aLong32Identifier'
  43. 43. The conventional debugger knows nothing about the parsing domain.
  44. 44. A moldable PP debugger knows which objects are parsers, knows where we are in the input, and can show us which parser object is currently active.
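PetitParser itself is a Smalltalk framework, so the following Python sketch only mimics the idea of composing a parser out of objects for the identifier rule. The class names (Parser, Pred, Seq, Choice, Star) and the use of the + and | operators are assumptions chosen for this illustration; they are not PetitParser's API.

```python
class Parser:
    """Base class: parse(text, pos) returns the new position or None."""

    def parse(self, text, pos=0):
        raise NotImplementedError

    def __add__(self, other):   # sequence, in the spirit of ','
        return Seq(self, other)

    def __or__(self, other):    # ordered choice, in the spirit of '/'
        return Choice(self, other)

    def star(self):             # zero or more repetitions
        return Star(self)


class Pred(Parser):
    """Consume exactly one character that satisfies a predicate."""

    def __init__(self, test):
        self.test = test

    def parse(self, text, pos=0):
        if pos < len(text) and self.test(text[pos]):
            return pos + 1
        return None


class Seq(Parser):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def parse(self, text, pos=0):
        mid = self.first.parse(text, pos)
        return None if mid is None else self.second.parse(text, mid)


class Choice(Parser):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def parse(self, text, pos=0):
        result = self.first.parse(text, pos)
        return result if result is not None else self.second.parse(text, pos)


class Star(Parser):
    def __init__(self, inner):
        self.inner = inner

    def parse(self, text, pos=0):
        while True:
            nxt = self.inner.parse(text, pos)
            if nxt is None:
                return pos      # zero or more: stopping is still success
            pos = nxt


letter = Pred(str.isalpha)
digit = Pred(str.isdigit)
identifier = letter + (letter | digit).star()


def accepts(text):
    """True if the whole input is an identifier."""
    return identifier.parse(text) == len(text)
```

Because every grammar node is an ordinary object, a tool can walk the identifier object graph at run time, which is what makes a parser-aware debugger feasible.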
  45. 45. Domain-specific extensions. Moldable debuggers are built up from debugging widgets and debugging actions. The moldable debugger uses activation predicates to determine which debuggers can currently be activated, allowing the developer to switch between debuggers without starting a new session. Diagram labels: Debugging Widget; Debugging View; Debugging Action; Debugging Session; Debugging Predicate; Primitive Predicate; HighLevel Predicate; Activation Predicate.
  46. 46. The parts from which the PP debugger is built. Debugging widgets and debugging actions shown: Next parser; Next production; Production(aproduction); Next failure; Stream position(anInteger); Stream position changed.
  47. 47. Moldable debuggers have been built for several different domains already: PetitParser, Events, SUnit, Glamour.
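The role of activation predicates can be illustrated with a small Python sketch. It is not the Moldable Debugger's design or API: the Frame and Debugger records, the FakeParser stand-in and the two registered debuggers are all hypothetical, and a real debugging session carries far more state than a list of frames.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """One stack frame: the receiver object and the method being run."""
    receiver: object
    method: str


@dataclass
class Debugger:
    """A debugger plus the predicate that says when it is applicable."""
    name: str
    activation: Callable[[List[Frame]], bool]


class FakeParser:
    """Stand-in for a domain object such as a parser (hypothetical)."""


def has_parser_on_stack(stack):
    """Activation predicate: some frame's receiver is a parser."""
    return any(isinstance(frame.receiver, FakeParser) for frame in stack)


DEBUGGERS = [
    Debugger("generic", lambda stack: True),   # always applicable
    Debugger("parser", has_parser_on_stack),   # domain-specific
]


def available_debuggers(stack, debuggers=DEBUGGERS):
    """Names of the debuggers whose activation predicate holds."""
    return [d.name for d in debuggers if d.activation(stack)]
```

Since the predicates are evaluated against the current stack, the set of offered debuggers can change as execution proceeds within the same session, for example once a parser object appears on the stack.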
  48. 48. New debuggers are cheap. Although some expertise is required to build a new debugger, the development effort is tiny.
  49. 49. The Moldable Inspector. The moldable inspector extends these ideas to object inspectors. Here is a moldable inspector for PostgreSQL databases. We are exploring other kinds of moldable tools …
  50. 50. Conclusion. Current IDEs offer developers primitive support for software assessment. Developers need support for agile modeling, architectural monitoring and moldable tools. The IDE does not do an adequate job of supporting developers in decision-making tasks. We need to better understand the needs of developers and to offer a better meta-tooling environment to support agile modeling, customization and continuous assessment. NB: we leave out discussion of context here