Report (docx) - Home page - Lancaster University
Darren Lee
Generating IDE support for Dynamic Languages
B.Sc. Computer Science with Software Engineering
19th March, 2010

“I certify that the material contained in this dissertation is my own work and does not contain unreferenced or unacknowledged material. I also warrant that the above statement applies to the implementation of the project and all associated documentation. Regarding the electronically submitted version of this submitted work, I consent to this being stored electronically and copied for assessment purposes, including the Department’s use of plagiarism detection systems in order to check the integrity of assessed work. I agree to my dissertation being placed in the public domain, with my name explicitly included as the author of the work.”
Date: 19th March, 2009
Signed:

Abstract
This project introduces DLTGen, a system designed to generate IDE support for dynamic languages from a higher level specification. The aims of DLTGen were to simplify two currently difficult tasks: firstly creating an IDE plug-in, and secondly modelling a dynamic language. The final outcome of this project was evaluated by attempting to specify IDEs for languages not previously considered. DLTGen was able to support a range of complicated functionality in these languages. The power of DLTGen is that it is demonstrated creating tooling not only for smaller domain specific languages but also for full general purpose programming languages.

Contents
1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 Aims of this project
  1.4 Approach
  1.5 Unique aspects of the project
  1.6 Report Overview
2 Background
  2.1 What is a dynamic language?
    2.1.1 Dynamic type systems
    2.1.2 Duck Typing
    2.1.3 Runtime object alteration
    2.1.4 Well defined global scope
    2.1.5 Code generation
    2.1.6 Closures
  2.2 Existing Work
    2.2.1 Dynamic Language Tool Kit (DLTK)
    2.2.2 Xtext
    2.2.3 EMFText
    2.2.4 Visual Studio
3 Design
  3.1 Design Alternatives
  3.2 Goals
  3.3 Model
  3.4 Detecting there was a request
  3.5 Determine sensible completions
    3.5.1 Annotate Action
    3.5.2 Find Action
    3.5.3 Link Action
    3.5.4 Algorithm Action
  3.6 Surface the completion to the IDE
    3.6.1 Insert Action
    3.6.2 Propose Action & Styling
4 Implementation
  4.1 Overview
  4.2 Specification Language
    4.2.1 Qualifier Task
    4.2.2 Annotate Task
    4.2.3 Member Access
    4.2.4 Scope Searching Task
  4.3 Framework
    4.3.1 Special Algorithms
    4.3.2 Eclipse Integration Points
  4.4 Generator
    4.4.1 Path Resolution
    4.4.2 Code Safety
5 System in operation
  5.1 Ruby Background
  5.2 Getting Started
    5.2.1 Prerequisites
    5.2.2 Creating an Xtext project
    5.2.3 Creating a DLTGen project
  5.3 Specifying the language and IDE
    5.3.1 Global Scope
    5.3.2 Type System & Literals
    5.3.3 Global Variable Accessor
    5.3.4 Simple Type Inference
    5.3.5 Advanced Type Inference on External Iterators
6 Testing
  6.1 Framework
    6.1.1 Testing Procedure
    6.1.2 Features Tested
    6.1.3 Results
  6.2 Generator
    6.2.1 Methodology
    6.2.2 Features Tested
    6.2.3 Results
7 Evaluation
  7.1 Scala
    7.1.1 C Style Syntax
    7.1.2 Local Type Inference
    7.1.3 Object Orientated & Type System
    7.1.4 Higher-Order Functions
    7.1.5 Polymorphic Methods
    7.1.6 Scala Findings
  7.2 EOL
    7.2.1 Model Access
    7.2.2 Implicit Typing
    7.2.3 Operations
    7.2.4 Model Writing
    7.2.5 EOL Findings
8 Conclusion
  8.1 Review of Aims
    8.1.1 IDE Generation
    8.1.2 IDE Feature Support
    8.1.3 Dynamic Language Feature Support
      • Single focus type system
  8.2 Future Work
    8.2.1 Specification Language Format
    8.2.2 Eclipse Features
    8.2.3 IDE Interoperability
    8.2.4 Dynamic Language Research
    8.2.5 Expanding the Specification Language
  8.3 Lessons Learned
Bibliography
References
Appendices
  Appendix A – Model Visualisation
  Appendix B – JavaScript Generics
  Appendix C – Specification Language Section
  Appendix D – JavaScript Runtime Object Alteration
  Appendix E – Ruby DLTGen Specification
  Appendix F – Ruby Grammar
  Appendix G – Tests
  Appendix H – Scala IDE Screenshots
  Appendix I – Ruby IDE Screenshots
  Appendix J – Original Project Proposal
Working documents can be found at: http://www.lancs.ac.uk/ug/leed2/

1 Introduction
1.1 Overview
Programming languages are always evolving, but recently a lot of focus has been put on dynamic languages such as JavaScript, Ruby and Python. John Ousterhout (1) argued over a decade ago that programming tasks are becoming more connection focused and that dynamic languages are better suited to this. With web technologies in particular this has become the case. Although a lot of work has been done, and is being done, on making dynamic languages better, there has been less work on creating tooling for them.

This project introduces DLTGen (dynamic language tooling generator) as a mechanism to simplify creating this tooling. Specifically, this is achieved by providing a higher level description of the IDE (Integrated Development Environment). This benefits the creation of tooling by removing some of the complexity both of processing a dynamic language and of creating a sophisticated IDE. In doing this there are a number of technical difficulties, because dynamic code tends to be fairly flexible.

Unfortunately, definitions of what constitutes a dynamic language are rather ambiguous. An abstract definition often used is a language whose programs can change their own structure during runtime (2). This very open-ended ideal is difficult to classify but could include evaluating Strings as new code, updating the type system and extending object definitions, among others. Some definitions state that a dynamic language should also have a dynamic type system (3); many do not. Finally, another definition gaining popularity is any language which is easy to use; many dynamic languages make claims of productivity and learnability benefits (4) (1) (5).

The ultimate outcome of this project is a mechanism to specify and generate features of an IDE and dynamic language processing. The code generated will provide a complete IDE plug-in.
It will also support manual modifications to the code; this will enable DLTGen to be used as a complete solution or just as a starting point for creating sophisticated IDEs.

1.2 Motivation
There are two key motivations for this project: both processing dynamic languages and building a sophisticated IDE are difficult tasks. Processing a dynamic language is difficult because such languages are so flexible that it is sometimes not possible to achieve an accurate interpretation without fully executing the code. For example, a variable may change data type only under a certain path of execution; to know the type we would need to know which path was followed, which would be too difficult to determine without interpreting the code. Chapter 2 of this report describes several of the trickier dynamic language features and what makes them difficult to process.

Today there are many great IDEs and most of them have a plug-in architecture. However, most of the better ones have come to support such a large set of features that developers get lost in details. Of course, for big languages like Ruby and ActionScript the producing
companies hire large teams of developers to create great IDEs, but for a single developer, researcher or small team it is much harder. To use a specific example, Eclipse (6) is a particularly large IDE framework in which there are many ways to do the same thing. A lot of the ‘getting-started’ documentation provided is too basic for creating a sophisticated IDE, and referencing what others have done often leads towards a solution which is now deprecated.

There is a need for a simpler way to create sophisticated tooling to aid language researchers and developers. For example, developers often embed dynamic languages into systems to provide extensibility mechanisms. There are some great resources to help with this; however, there are very few resources to help create tooling for their particular flavour of the source language. These toy languages are too small to justify investing substantial time and money in creating tooling. A higher level generated solution could make this more economically feasible.

1.3 Aims of this project
The goals of this project are categorised into two areas: simplifying IDE creation and simplifying support for dynamic languages. The aims are listed below.
• Generate IDEs to simplify the process of tooling creation.
• Specify basic and sophisticated IDE features.
• Support dynamic language features.
Specifically, the aim of this project is to develop a specification language for IDE features and to build a generator that creates an IDE from it. This tool could then be used to evaluate the project’s effectiveness in creating reasonably sophisticated IDEs for new dynamic languages.

1.4 Approach
The goals of this project are large and complex; it is beyond the scope and time constraints of the project to take a traditional approach of carefully researching dynamic languages and designing detailed abstractions. Instead this project took a different, iterative approach.
Firstly, hand-written IDEs for two popular dynamic languages, JavaScript and Ruby, were created. From these hand-crafted IDEs, abstractions for the specification language were created and the generator was built. To validate the generator, the two original handmade IDEs were re-specified in the new specification language. Finally, the system was evaluated against new languages. The languages targeted in evaluation were intentionally not considered when designing the abstractions, in order to yield a better evaluation. However, researching the area of dynamic languages inevitably influenced how generic the abstractions were kept.

1.5 Unique aspects of the project
The core unique aspect of this project is its aim to specify sophisticated IDE support in a higher level language. Being able to define an IDE abstractly significantly reduces the workload associated with creating tooling. By providing such a mechanism, DLTGen could make tooling more economically viable for smaller research and toy languages.

Another key point in the uniqueness of this project is that it can be used for full scale general purpose languages and not just smaller DSLs. This flexibility shows that generated solutions are at least feasible for full scale languages.

The core feature that this project supports is code completion. Other generated IDE solutions only provide simplistic code completions for syntax. DLTGen goes much further, providing complicated code completions which draw on knowledge of the language and the source code.

1.6 Report Overview
This section provides a brief overview of the following chapters in this report.

Chapter 2 describes some of the background research which took place in order to understand dynamic languages and IDE generation. It describes common characteristics of dynamic languages and the difficulties posed in supporting them in tooling. It also looks at existing solutions, how far they go and their relative merits. Language examples are JavaScript unless otherwise stated.

Next, Chapter 3, Design, conceptualizes the problems DLTGen was required to solve. The solutions which were created are described at a high level, using conceptualizations and examples to demonstrate the benefits of the decisions made. This leads into Chapter 4, Implementation, which takes a closer look at how the abstract features from the design operate together to create an IDE. This includes how they can be composed into a larger IDE description and interact with each other.

Chapter 5 provides a walkthrough of creating an IDE for the Ruby programming language. Particular language features are described and then decomposed into a solution represented in the specification.

Chapter 6 describes the testing methodology for DLTGen and outlines some of the tests performed. The tests discussed in this chapter look purely at system robustness. Chapter 7, Evaluation, looks at how effective the system is.
This chapter takes two new languages, Scala and EOL, and discusses to what extent DLTGen was able to support them. The focus in this section is on what types of language features could be supported, which could not, and how flexible the solution is.

Finally, Chapter 8 summarizes the findings of the project and how successful it was against the original objectives. This chapter also discusses possible future work and research that could improve and add to DLTGen.

2 Background
Chapter 2 discusses the background of the problem to help understand where the technical difficulties for this system lie. This chapter also looks at relevant existing solutions.

2.1 What is a dynamic language?
In order to support dynamic languages we need a firm definition of what one is. As previously mentioned, this is a poorly defined area. This section describes some features commonly associated with dynamic programming languages. Each one is described generally, followed by a JavaScript example to help conceptualize the technical problem it poses.

2.1.1 Dynamic type systems
Dynamic typing is a mechanism whereby type checking is done during execution of the program and not at compile time as with traditional static languages. In a dynamic typing system “values have types but variables do not” (7). If I define the variable x as shown in figure 2.1, it has no data type. Only after the second line of code executes does it have the data type String; when the third line executes it becomes a Number.

    var x;
    x = "Hello World";
    x = 42;
Figure 2.1, Value based typing example

The variable x has no type associated with it; its type is determined by its value. This makes type inference very difficult because it requires tracking the values put into x. In a static language we could just look at the definition; in a dynamic type system we cannot. While performing type inference on the variable x the system needs to be more aware of its lifetime: instead of looking for any declaration of x, the system needs to find the most relevant use of x for a given line of code.

    if(true) {
        x = "Hello World"; //x here is a string
    }else{
        x = 42; //x here is a number
    }
Figure 2.2, Typing Alternatives

The lifetime of a variable could be even more complex. In figure 2.2, x has two different data types depending on the evaluation of the if statement. Similarly, the value of x could be different outside of the if statement. Providing code completion for this example is possible without interpretation: because the code we want to complete will be in one branch or the other, mechanisms to guide the search of variable usage would be enough.
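
For straight-line code such as figure 2.1, the idea of finding the most relevant use of x for a given line can be sketched as below. This is only an illustration under stated assumptions, not DLTGen's implementation: the `assignments` list, its `{ line, type }` record shape and the `typeAtLine` function are hypothetical names, the list is assumed to be sorted by line number, and branching code such as figure 2.2 would need additional guidance that this sketch does not attempt.

```javascript
// Hypothetical record of the assignments to x in figure 2.1, in source order.
// Line 1 ("var x;") assigns no value, so it contributes no entry.
var assignments = [
  { line: 2, type: "String" },
  { line: 3, type: "Number" }
];

// Return the type x holds once `line` has executed: the type given by the
// closest assignment at or before that line, or null if there is none yet.
// Assumes `assignments` is sorted by ascending line number.
function typeAtLine(assignments, line) {
  var found = null;
  for (var i = 0; i < assignments.length; i++) {
    if (assignments[i].line <= line) {
      found = assignments[i].type; // a later assignment overrides an earlier one
    }
  }
  return found;
}

console.log(typeAtLine(assignments, 1));
console.log(typeAtLine(assignments, 2));
console.log(typeAtLine(assignments, 3));
```

The point of the sketch is that the answer depends on the line being completed, not on the declaration of x.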
  12. 12. P a g e | 6 2.1.2 Duck Typing Duck typing is a more flexible form of dynamic typing. Duck typing is only concerned with what can be done with the data. The name comes from a concept called the “duck test”: “when I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.” (8) We are less interested in what the data is; we only care that we can do with the data what we need to do. To continue the duck analogy we don’t care that it is a duck, we only care that it can quack (9). In figure 2.3 there is a function which pokes a given duck producing a quack. The parameter duck has no type, and it could be anything as long as it can quack. Figure 2.3, Duck Typing Example Figure 2.4, Use of duck typing Figure 2.4 demonstrates what would and would not be valid code for the ‘pokeDuck’ function. Neither a blank object nor a string can be used as the parameter ‘duck’ because they don’t have a quack function. In type inference, duck typing is often used to determine members of parameters. The members of an object represent the actual children of the data, these could come from a range of sources. By matching a function call (caller) and a function (callee) an inference system can observe use of the parameter and produce inferences from that value. This mechanism can work both ways. 2.1.3 Runtime object alteration A common feature of a number of dynamic languages is runtime object alteration which relates to dynamic typing systems. It provides a means for a language to change what features are available on an object at runtime. 
    var bob = new Person();
    var paul = new Person();
    bob.drinksWine = true;
    (paul.drinksWine == null) //True

Figure 2.5, Object Alteration Example

Code for figure 2.4:

    var human = {quack: function(){
        print("Human imitates duck");
    }
    }
    var duck = {quack: function(){
        print("Quaackkk!");
    }
    }
    //Valid calls
    pokeDuck(human);
    pokeDuck(duck);
    //Invalid calls
    pokeDuck(new Object());
    pokeDuck("A String");

Code for figure 2.3:

    function pokeDuck(duck){
        duck.quack();
    }
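The caller/callee matching described in section 2.1.2 can be sketched in both directions. The data structures below are hand-built stand-ins for what an analyser might collect from source code; `functions`, `calls`, `missingMembers` and `candidateMembers` are hypothetical names, and this is only an illustration of the idea rather than the mechanism DLTGen uses.

```javascript
// Hypothetical facts collected from source code.
const functions = {
  // pokeDuck(duck) uses duck.quack inside its body.
  pokeDuck: { params: ["duck"], memberUses: { duck: ["quack"] } },
};
const calls = [
  { callee: "pokeDuck", args: [{ name: "human", members: ["quack"] }] },
  { callee: "pokeDuck", args: [{ name: "text", members: ["length"] }] },
];

// Callee to caller: which members used by the function are missing on the arguments?
function missingMembers(call) {
  const fn = functions[call.callee];
  const missing = [];
  fn.params.forEach((param, i) => {
    const required = fn.memberUses[param] || [];
    const provided = call.args[i] ? call.args[i].members : [];
    for (const m of required) {
      if (!provided.includes(m)) missing.push(param + "." + m);
    }
  });
  return missing;
}

// Caller to callee: members seen on passed arguments become candidate
// members of the parameter inside the function.
function candidateMembers(fnName, param) {
  const index = functions[fnName].params.indexOf(param);
  const seen = new Set();
  for (const call of calls) {
    if (call.callee === fnName && call.args[index]) {
      call.args[index].members.forEach((m) => seen.add(m));
    }
  }
  return [...seen].sort();
}
```

The first direction flags a call such as `pokeDuck("A String")`; the second is the one that matters for code completion, because it suggests what could be offered after typing `duck.` inside the function body.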
The variables bob and paul are both instances of the “Person” class. In JavaScript it is perfectly valid to add a new member to one and not the other. By the time line 3 of figure 2.5 has executed, bob has a “drinksWine” member. This member is not part of the Person class but has been added dynamically. As a result paul has no equivalent member, because the variable was never attached to paul. If we wanted to provide the member for both, we could attach it to the class’s “prototype”.

Figure 2.6, Class Alteration Example

Every new Person instance that is created after the code in figure 2.6 is executed will have a ‘drinksWine’ variable. However, any instances created before that line executed will not have this member. Clearly this is a difficult concept to process by just looking at source code: it requires keeping track of when particular objects came into existence, and it is far beyond the capabilities of this project to support this level of complexity.

2.1.4 Well defined global scope

Dynamic languages rarely compile to a low level, and because of this it is impossible in many dynamic languages to interact at the system level. This restricts access to much needed system level tasks such as I/O. The solution provided by a number of dynamic languages is a well-defined global scope. In this context a scope refers to a place in the program runtime which stores variables, functions, classes and potentially instances of any object defined in the language. Each scope represents a place from which accessors and mutators can find or adjust these instances of language concepts such as types, variables and so on. Core functionality is usually written in native code and linked into a well-defined scope so that the target language can interact with it. Languages which do this include ActionScript (10), PHP (11), Ruby (12) and many more.
The problem with this is that it relies on external mechanisms, because the objects in the global scope are not defined in the target language. Given the large dependency on these well-defined global scopes, there needs to be a mechanism for hand crafting a global scope and bringing it into the parse tree/model for analysis.

2.1.5 Code generation

A number of dynamic languages have a simple mechanism to generate code on the fly, turning Strings into evaluated code. It is potentially a very powerful feature but it is an unwieldy one. The problem with code generation for a system processing source code is how complicated the process of building the code can be. Figure 2.7 shows two examples of how dynamic code generation might be used.

Code for figure 2.6:

    Person.prototype.drinksWine = true;
    //Rest of the code
    //Simple example
    var code = "alert('" + msg + "');";
    eval(code);

    //Complicated example
    function createClass(id, superType) {
        var code = "function " + id + "(){";
        code += "}";
        code += id + ".prototype = new " + superType + "()";
        eval(code);
    }

Figure 2.7, Eval Example

The first example simply takes a message and generates new code which will invoke the alert function. Although not overly complicated, it still requires evaluating the code rather than analysing it. The second example shows a truly horrific use of eval, generating a new class on the fly. Earlier versions of ECMAScript (JavaScript) have no built-in syntax for performing polymorphism, so many web toolkits provide code similar to that shown above to generate classes with polymorphism on the fly. The KONtx framework (13) from Yahoo is an example of a toolkit which does this.

With the potential complexity of this feature it is likely to be very difficult or near impossible to support without using a code evaluation technique. However, Erik Meijer poses (14) a set of common uses for executing Strings as programs, and numerous dynamic languages are starting to provide specific support for these uses. This reduces the significance of this particular dynamic language feature. The discussed uses include:

• A substitute for higher order functions.
• Deserialization of objects.
• Meta programming.

2.1.6 Closures

A closure is a first-class [1] function which retains the context it was created in. It is related to functional programming but often used in dynamic languages in general. The closure can access local variables from the scope it was created in, despite the fact that it has its own set of locals and local scope. Figure 2.8 shows an example of closures in JavaScript. A new function is created but it retains access to the ‘wordList’ argument from the creator’s local scope.

[1] A function can be created, passed around and stored at run time.
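The closure behaviour can be tried with a small runnable sketch modelled on figure 2.8. The figure calls a `checkSpelling` function that it does not define, so the version below is an assumed stand-in (a simple case-insensitive membership test), and the names `spellChecker`, `checkFruit` and `checkColour` are illustrative only.

```javascript
// Assumed stand-in for the checkSpelling function used by figure 2.8.
function checkSpelling(input, wordList) {
  return wordList.includes(input.toLowerCase());
}

function spellChecker(wordList) {
  // The returned function keeps access to wordList after spellChecker returns.
  return function (input) {
    return checkSpelling(input, wordList);
  };
}

// Each closure retains the wordList it was created with.
const checkFruit = spellChecker(["apple", "pear"]);
const checkColour = spellChecker(["red", "blue"]);
```

Calling `checkFruit("red")` and `checkColour("red")` gives different answers even though both run the same function body. This is the difficulty for an analyser: the meaning of `wordList` inside the inner function depends on the scope in which that function was created.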
Figure 2.8, Closure Example

Constructs such as these, like duck typing, also impose a need for some way to guide name binding.

2.2 Existing Work

There are a number of projects (many of which are still experimental) which aim to make IDE creation and support for dynamic languages easier. The existing solutions fail to meet this project’s requirements by having high entry levels, not being generative or only providing basic IDE features.

2.2.1 Dynamic Language Tool Kit (DLTK)

The dynamic language tool kit (15) is an Eclipse project aimed at reducing the complexity of creating a fully featured IDE for a dynamic language. The main component is a mechanism to process dynamic code using an interpreter, together with frameworks to access some of this information. Interpretation rather than analysis is likely to yield more data for dynamic languages. However, there is no free lunch with the DLTK, as with many of the IDE frameworks and libraries in Eclipse. The documentation (16) guides the developer through creating over 40 classes to interact with the DLTK, and the resulting IDE is still fairly simplistic. This project does seem to have a lot of promise, however, for more experienced developers willing to invest the time.

2.2.2 Xtext

Xtext (17) provides a framework for developing DSLs and programming languages. From a specification in the Xtext language it generates a model (implemented in EMF). Along with this it generates a source code editor. The code editor provides syntax highlighting, hyperlinking and basic content assist/code completion. The code completion provided is based on syntax suggestions rather than language features, for example a suggestion to close a bracket rather than a suggestion of variable members. There are extensions which generate other IDE features such as an outline view, project wizards, etc. This makes Xtext more desirable to this project as it already has an extension mechanism in place.
The main problem with Xtext is the poor developer experience. There are two potential issues for developers, and for this project, in using Xtext. Firstly, it is poorly documented: there is a user guide (18), but if you want to add advanced features or extend Xtext there is very little to help you. Much of the code, although shipped with source, often just has generated comments. The second issue is how often it changes; it undergoes heavy changes between versions which can break previous work. This is understandable for an incubation project.

Code for figure 2.8:

    function spell_checker(wordList){
        var temp = function(input){
            return checkSpelling(input, wordList);
        }
        return temp;
    }
A technology related to Xtext is Xpand2. Xpand is a code generation framework, used by Xtext. It allows code generation against an EMF model. Code is defined in template files, and directives in the source bind data from the EMF model. Xtend also supports calling Java code during generation; this provides a great deal of flexibility. The Xpand language’s key feature is the ability to “expand” a template for a certain EMF object or collection of EMF objects. This works well with this project as it creates several reusable directives in the specification. The code generated for each directive can be put into one template and be expanded from other templates.

2.2.3 EMFText

EMFText (19) is an interesting system which allows the specification of a textual language for an EMF model. It is a convenient way to create a DSL that parses directly into an EMF model. As with Xtext, the DSL is specified and an IDE is generated. Although it is fit for its purposes, the IDE is fairly simplistic for the aims of this project. There is also less opportunity to extend EMFText than there is to interoperate with Xtext.

2.2.4 Visual Studio

Visual Studio has some of the best code completion. Its extension system is called “Language Services”; it uses a ‘lex’ lexer and ‘yacc’ (20) grammar. Visual Studio 2010 provides good support for internal languages built on top of the DLR [2] (21) (e.g. C# 4.0). However, this is still in beta and it is unclear to what extent Language Services will make these features available for dynamic language integration. It is also unclear whether or not the language must run using the DLR.

[2] Dynamic Language Runtime
3 Design

The purpose of this chapter is to address what the system needs to be capable of doing and which mechanisms have been created to accommodate this. There are several pieces to this project (frameworks, generators and utilities); however, this chapter will only outline solutions abstractly. Chapter 4 will discuss some of the more interesting technical specifics in more detail. Below is an overall conceptualisation of the processes which will be discussed in this chapter.

Figure 3.1, Overall Design

3.1 Design Alternatives

Before addressing the process and results of design, the features DLTGen will support need to be made clear. As shown in Chapter 2 there are a number of existing solutions related to generating IDEs. The one that was richest in features and most appropriate to this project was Xtext, and as a result DLTGen is implemented as extensions to Xtext.

Xtext provides a number of IDE features including syntax highlighting, properties, wizards, hyperlinking, outlining and more; however, it does not provide sophisticated code completion (also known as autocomplete). The decision was made that this was the best place for DLTGen to add value. Code completion is a very rich IDE feature; it is also one of the most complicated IDE features to create. Code completion is a complicated problem for any language to handle, and dynamic languages add extra complexity by being harder to analyze. Therefore the majority of DLTGen’s features will be aimed at supporting sophisticated code completion.

A fundamental decision taken by DLTGen is the way it analyses source code. DLTGen provides a simple mechanism whereby additional functionality can be attached to elements in the Xtext grammar. There are a number of other approaches for analysing source code; one such method is interpretation. Executing selected statements or complete code to produce inferences is clearly a very powerful idea with the potential to yield a lot of information.
For this project’s purposes, however, the mechanisms of interpretation are too complex and language specific. Another method would have been to create workflow analysis mechanisms
where language interactions are defined. Defining a complete workflow analysis, however, does not lend itself well to a simple definition.

3.2 Goals

Code completion can manifest in several forms. Generally it consists of showing a popup list of possible completions for the current context and allowing the user to select the correct one. It could also be presented as automatically inserting code when there is less of a choice for the end user. Code completion as a process can be divided into the following stages:

1. In the background, continually build up metadata for language objects and provide inferences.
2. Detect that there was a request/opportunity to provide code completion.
3. Determine sensible completions for the current context.
4. Surface the completion to the user.

The rest of this chapter will describe the solutions provided by DLTGen to handle these four requirements using a higher level specification.

3.3 Model

In the background of any sophisticated IDE there is always a model of the code being worked with. In DLTGen the model represents an index of metadata derived from looking at source code and other sources so that it can be used later in other IDE features. Building the model is a multistage process, but it usually starts with detecting language features in the source code. Xtext already provides an opportunity to express language features in a grammar and DLTGen builds on this. Non-terminals in this grammar can be processed to become part of the model and the understanding of the source code.

There are three types of data that need to be captured in the model to provide sophisticated code completion: static data, runtime data and type data. Any language feature (non-terminal) can make up these different data elements.

Static data represents well defined metadata that is loaded only once. This allows well known language features to be pre-defined in a hand written specification and included in the model.
This makes providing a well-defined global scope possible, a key feature in many dynamic languages. The developer would declare their global scope statically in an XML file or in the target language, which would then be referenced in the IDE features specification as a static scope.

The next type of data is runtime data; this contains data for constructs found in the current source code. The way this data is collected is defined in the specification language. Runtime data will be thrown away periodically as the document changes and features get removed. Metadata in the runtime model has a time to live; when this time is up the data will be completely reprocessed. This improves efficiency as we do not have to process every language feature every time we need to use the model.

Finally there is type data; this is information about data types in the type system. Not all languages have a type system; however, it is common enough to provide support at this level in
DLTGen. This data can be affected by, and contain metadata from, static or runtime sources. It maintains data such as members of a data type, polymorphic information etc. Data modified in the type model by the static model is non-volatile; data modified by the runtime model is volatile.

Creating the model is the job of “model builders”; a model builder is a group of language processing mechanisms defined by the user to contribute to the model for a particular non-terminal. It describes how each language feature contributes to the model and how certain aspects should be processed. This is described further in Chapter 4. Appendix A provides a visualization of the model sources and what data they might contain.

The reason for collecting this data is so that other IDE features can use it. As previously stated, the life cycle of an IDE feature is to detect the request, find the proposals and display them. No proposals can be found if there is no model. Similarly, it is unreasonable to do all this processing within the life cycle of an IDE feature as it would be too inefficient. This chapter continues by describing the life cycle of an IDE feature, which is always assumed to have a model behind it.

3.4 Detecting there was a request

In order to detect a request, typically “activation characters” are registered with the IDE framework; when one of these characters is detected, the plugin is given a chance to determine if there is an opportunity to perform code completion. This is unnecessary boilerplate code for a generated solution; instead DLTGen provides “triggers”. A trigger is invoked when a certain prefix is found in the code; for example, if in JavaScript the user writes the key phrase “new” it would be sensible to propose a list of instantiable classes.

There are two types of trigger, static and model. A static trigger does something basic such as closing a bracket, for which we do not need to look inside the model.
A model trigger requires access to the model to explore language features before it can complete its task. The reason for the distinction is efficiency: accessing the parse tree and model requires them to be locked against other code, and it is senseless doing this when the model is not required for a simplistic task.

A common use for a static trigger is to automatically close balanced characters such as brackets, braces and string literals. In the following example, when an open bracket is detected a closing bracket is automatically inserted.

    <staticTrigger sequence="(">
        <insert>
            <sequence>")"</sequence>
            <offsetSource>insertEnd</offsetSource>
            <offset>-1</offset>
        </insert>
    </staticTrigger>

Figure 3.2, Static Trigger Example

The new character is inserted at the current cursor position; however, where the cursor ends up can be controlled. By providing an offset of -1 the cursor is moved back into the brackets,
this is useful in JavaScript, where arguments can be put in between brackets. Static triggers rarely get more complicated than this.

Model triggers are where the core functionality of DLTGen starts. The following example demonstrates displaying a list of classes when the new keyword is entered. In JavaScript, any function is theoretically instantiable.

    <modelTrigger sequence="new ">
        <find var="funcs" in="model" quantity="*">
            <condition>funcs.class == JS_Function</condition>
        </find>
        <propose>funcs</propose>
    </modelTrigger>

Figure 3.3, Model Trigger Example

Figure 3.4, Possible result of the trigger

When the “new ” key phrase is detected the “actions” in the trigger will begin to function. There are a number of actions which help provide the desired functionality and they are described in more detail later in this chapter. It would be possible in future work to provide additional trigger invocations such as key combinations or menus; however, for this project’s purposes key phrase invocation is enough to demonstrate a wide range of language feature support.

3.5 Determine sensible completions

The most difficult aspect of providing code completion is deciding what completions are sensible in the current context, and most of this project’s efforts have been put into this problem. The problem can be decomposed into two key tasks, shown in figure 3.5.

The first task is transforming features from source code into data which is usable. When a language feature is detected it is automatically added to the model; however, a simple Xtext grammar parse tree is not enough information to perform code completion. DLTGen provides an opportunity to put additional information into the model against a non-terminal to help subsequently explore the model.

The second task in determining sensible completions is interpreting the context and the model. For the example of the new keyword, the interpretation would be finding all instantiable classes.
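The behaviour of the two kinds of trigger can be sketched as follows. This is only an illustration of the dispatch idea under stated assumptions: the `model` array, the `triggers` table and the `onType` function are hypothetical, and the code DLTGen actually generates from a specification is not shown in this chapter.

```javascript
// Hypothetical in-memory model; each entry mimics a non-terminal with a "class" attribute.
const model = [
  { class: "JS_Function", name: "Person" },
  { class: "JS_Var", name: "bob" },
  { class: "JS_Function", name: "Vector" },
];

const triggers = [
  // Static trigger: needs no model access, inserts text and moves the cursor by offset.
  { sequence: "(", run: () => ({ insert: ")", offset: -1 }) },
  // Model trigger: searches the model and proposes the matches.
  {
    sequence: "new ",
    run: (model) => ({
      proposals: model.filter((f) => f.class === "JS_Function").map((f) => f.name),
    }),
  },
];

// Fire the first trigger whose sequence ends the text before the cursor.
function onType(textBeforeCursor, model) {
  const trigger = triggers.find((t) => textBeforeCursor.endsWith(t.sequence));
  return trigger ? trigger.run(model) : null;
}
```

Typing `var p = new ` would yield the proposals Person and Vector, while typing `alert(` would yield an insertion of `)` with the cursor moved back by one. Only the model trigger touches the model, which mirrors the efficiency argument made for keeping the two kinds of trigger separate.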
Figure 3.5, Tasks and Actions

To achieve each of these two tasks there are “actions”. An action is a well-defined procedure; actions can be combined in an imperative way to produce functionality. Figure 3.5 shows some of the actions that are required to achieve the two tasks; however, the boundary between the tasks can and often does blur. This section of the document will demonstrate how some of the key actions can be used to help determine sensible completions for different language features, as well as showing what they provide to the system as a whole.

3.5.1 Annotate Action

Annotations are a method of attaching metadata to a non-terminal object. The metadata is simply a key value pair and can be anything. What it fundamentally enables is the ability to do work ahead of time. Processing can be done using other actions to determine some characteristic of the language feature which will subsequently be useful. When storing metadata on a non-terminal all non-terminal attributes are accessible; this is an extension to the Xtext mechanism, not a replacement. Below is an example of an Xtext non-terminal.

Figure 3.6, JavaScript Return Statement Non-Terminal

The only attribute of a JavaScript return statement is “expression”. New attributes cannot be introduced into the grammar unless they represent something parseable in the source text: it is a grammar, not a model. In DLTGen any attribute from the Xtext grammar can be referenced but new values can also be defined. This also allows DLTGen to expose special metadata variables such as ‘_path’, which represents the qualified path of a non-terminal. This is used to uniquely identify an object to help track the life cycle of a feature and determine which data can be cached or thrown away.

Below is a scenario where there is data that can be worked out ahead of time to benefit support of the dynamic language feature object alteration.
Grammar for figure 3.6 (JavaScript return statement):

    JS_Return:
        'return' (expression=JS_ValueStatement) (';'?);

Imagine the following Xtext non-terminal:

    JS_DynamicAssignment:
        (objectName=JS_TERMINAL_IDENTIFIER)'.'
        (memberName=JS_TERMINAL_IDENTIFIER)'='
        (expression=JS_ValueStatement) (';'?);
Figure 3.7, JavaScript Assignment Non-Terminal

It is intended to match runtime object alteration, that is to say there is a variable (objectName) and a new member is being added to it (memberName).

Figure 3.8, Object Alteration Example

In the above example, ‘o’ is the objectName and ‘name’ is the memberName. When it comes to determining the members of ‘o’ it would be useful if the model already knew that the JS_DynamicAssignment ‘name’ is a member of the JS_Var ‘o’. The following annotation for a JS_DynamicAssignment could detect and store that metadata.

Figure 3.9, Object Owner Annotation

When it comes to determining the members of ‘o’ the model can be searched for all objects with “ownerObject” set to the variable being processed. The keyword “self” in DLTGen refers to the current non-terminal. Any attribute on the non-terminal is accessible, as well as some special attributes such as ‘_name’. The ‘_name’ attribute represents the qualified name; assigning this for an object makes searching by name possible. Whenever an action has a “var” attribute, the result of that action can be accessed by other actions using the name given; this can be seen on the last line of figure 3.9.

Annotations can also append to collections. When trying to determine a list of proposals the following syntax is used:

Figure 3.10, List Append Example

It takes the form collection += new member or members; the variable “aListToAdd” could be, for example, the results from searching the model.

These mechanisms of analysing the semantics of the model to transport information could be compared to a less specific attribute grammar (22). The mechanisms provided by DLTGen care less about the exact semantics of the model or parse tree. Constructs simply relate to something they know is somewhere in the model, without the complexity of an attribute grammar.
Code for figure 3.8:

    var o = new Object();
    o.name = "Bob";
    o.

Specification for figure 3.9:

    <annotations>
        <!--Find the variable we are attaching to-->
        <find var="obj" in="scope" quantity="1">
            <condition>obj._name == self.objectName</condition>
        </find>
        <!--Store it as our owner object-->
        <annotate>self.ownerObject = obj</annotate>
    </annotations>

Specification for figure 3.10:

    <annotate>result.members += aListToAdd</annotate>
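The effect of the owner annotation in figure 3.9 can be sketched in plain JavaScript. The `scope` array below is a hypothetical stand-in for the model entries produced from figure 3.8, and `annotateOwner` and `membersOf` are illustrative names only; the sketch shows the ahead-of-time idea, not DLTGen's implementation.

```javascript
// Hypothetical model entries for: var o = new Object(); o.name = "Bob";
const scope = [
  { class: "JS_Var", _name: "o" },
  { class: "JS_DynamicAssignment", objectName: "o", memberName: "name" },
];

// Annotate step: find the variable being attached to, then store it as ownerObject.
function annotateOwner(self, scope) {
  const obj = scope.find((n) => n._name === self.objectName);
  if (obj) self.ownerObject = obj;
}

// Later lookup: the members of a variable are the entries annotated with it as owner.
function membersOf(variable, scope) {
  return scope.filter((n) => n.ownerObject === variable).map((n) => n.memberName);
}

// Run the annotation once, ahead of any completion request.
scope
  .filter((n) => n.class === "JS_DynamicAssignment")
  .forEach((n) => annotateOwner(n, scope));
```

Because the owner is stored when the assignment is processed, answering the `o.` completion request later only needs a filter over the model rather than a fresh analysis of every assignment.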
3.5.2 Find Action

Once a model is being maintained, the data inside it needs to be accessible for that data to be useful. To deal with this task there is the Find action; it provides a mechanism to search the model, among other things providing filters to constrain results. The Find action does however go much further by providing a wide range of “search sources” that help constrain where the system can expect to find the results, most importantly providing a consistent syntax but also improving efficiency. Providing several sources also keeps the syntax minimal: there is no need to specify additional conditions repeatedly to capture the common functionality which the search sources represent. Figure 3.11 shows just a few of the more useful search sources.

Figure 3.11, Selection of Search Sources

Being able to query these data sources enables a wide range of language features to be supported because it is so flexible. Just a few of the things it can enable include name binding [3], member resolution [4] and much more.

The specification for a find action always requires two parameters. The first is called ‘var’; this represents a name which makes the results accessible. This variable can be referred to by other actions subsequently; it could even be the source of another find. The second attribute is named ‘in’, which represents which search source to use; this can be a well-known name such as “model” or a previously named variable (e.g. the result of a previous find action). Any find which returns non-terminals also requires a quantity attribute; this can be ‘1’ or ‘*’ to represent one or many. Figure 3.12 shows the basic syntax of a find action; it attempts to find the class which owns the current feature.

Figure 3.12, Find Parent Class Example

Conditions allow the results to be constrained; any of the attributes on the non-terminal, as well as metadata and special values, can be tested.
The special attribute “class” holds the type of the non-terminal; here it is used to pick out only JS_Class non-terminals. The result is referable as “jsClass”: within conditions this name refers to the non-terminal currently being filtered, and subsequently it represents the result value. This found value can be used like any non-terminal; all its properties are accessible, including metadata and special values.

Specification for figure 3.12:

    <find var="jsClass" in="parents" quantity="1">
        <condition>jsClass.class == JS_Class</condition>
    </find>

[3] Ability to match an identifier to the language feature/definition it refers to.
[4] Ability to find features owned/contained by another language feature.
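A minimal sketch of how a find action over the “parents” search source might behave is given below. The tree of objects, the `searchSources` table and the `find` function are assumptions made for illustration; the sketch supports only a named search source, not a previously named variable, and is not DLTGen's implementation.

```javascript
// Hypothetical tree of non-terminals; "parent" links stand in for the parse tree.
const jsClass = { class: "JS_Class", name: "Person", parent: null };
const jsFunction = { class: "JS_Function", name: "greet", parent: jsClass };
const jsReturn = { class: "JS_Return", parent: jsFunction };

// Search sources: each turns the current non-terminal into a list of candidates.
const searchSources = {
  parents(self) {
    const out = [];
    for (let n = self.parent; n; n = n.parent) out.push(n); // nearest parent first
    return out;
  },
};

// quantity "1" gives a single result or null, "*" gives every match.
function find(self, source, quantity, condition) {
  const matches = searchSources[source](self).filter(condition);
  return quantity === "1" ? matches[0] || null : matches;
}
```

With this in place, the specification of figure 3.12 corresponds roughly to `find(self, "parents", "1", (n) => n.class === "JS_Class")`. Keeping the search source separate from the condition is what lets the condition stay as short as it is in the figure.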
When searching inside a non-terminal it is also possible to direct the search. This is useful to provide custom scoping rules; in JavaScript, when looking for the value of an identifier there is a well-defined order of which scopes to search first:

1. The current function’s local scope; this includes variables and arguments inside the current function.
2. The parent class’s local scope.
3. The global scope.

This logic can be captured as search rules for a “JS_Function” and “JS_Class” in the specification. This feature is described and demonstrated further in Chapter 4.

3.5.3 Link Action

Linking is a mechanism to define the relationship between non-terminals so one can be turned into another. For example, there is a well-defined relationship between a generic parameter [5] and a generic value [6]; when displaying a method which returns a generic parameter, what really should be displayed is the generic value. The problem, however, is that the generic parameter and the generic value belong to two completely separate language features.

A fundamental difference between analyzing dynamic code and analyzing static code is a greater need to observe use of language features rather than their definition. To determine a variable’s data type in JavaScript, the system must look at its use (an assignment) rather than its definition. This is a recurring theme in a number of dynamic language features. To support more advanced uses of this, DLTGen has a feature called the “inference stack”. While determining completions the system (as guided by the specification) will visit a number of non-terminals. Each non-terminal visited is put onto the inference stack to keep a history of all features involved in calculating the current completions. The inference stack is exposed in two forms, firstly as a search source and secondly through linking.
To demonstrate this, below is an example of how this could work for the previous example of generic parameters as supported in newer implementations of JavaScript (ECMAScript fifth edition (23) & ECMAScript Harmony). Figure 3.13 shows the syntax of generics in this version of JavaScript.

    var list = new Vector.<String>();
    list.item(0).

Figure 3.13, JavaScript Generics

[5] Generics is a mechanism of allowing types to be determined later.
[6] The value provided for the to-be-determined type.

The variable ‘list’ is assigned a Vector instance; however, the vector has a generic parameter with the value ‘String’. This indicates that it is a Vector (list) of Strings. The second line of code in figure 3.13 shows a call to the item function; for the Vector class this should return a value with the same type as the generic parameter. Presume that the following
three non-terminals exist with the given attributes (grammar definitions for these can be found in Appendix B).

• JS_Instanciator (name:STRING, genericParam:JS_GenericValue)
• JS_Class (name:STRING, genericParam:JS_GenericDef)
• JS_Function (name:STRING, returnType:JS_ReturnType)

When proposing the functions for a Vector instance there is a JS_ReturnType which could be a generics parameter. This needs to be converted into a JS_GenericValue in order to display it. To resolve this value, linking can be used along with the inference stack; figure 3.14 shows the contents of the inference stack for the given example.

Figure 3.14, Inference Stack

To invoke the linking mechanism the “Link” action is used; it is specified as shown in figure 3.15. It states the aim of converting “self.returnType”, that is the JS_ReturnType, into a JS_GenericValue.

    <link var="retType" type="JS_GenericValue">
        <from>self.returnType</from>
    </link>

Figure 3.15, Link Action Example

In order for this link to work the specification needs to define the links between the various non-terminals involved in this transformation; these are shown in figure 3.16.

Figure 3.16, Transformation Route
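The transformation route can be sketched as a two-step lookup over the inference stack: the return type is matched to a generic definition on the class by name, and that definition is matched to a generic value on the instanciator by index. Everything below is a hypothetical illustration; in particular the generic parameter name "T" and the array-based `genericDefs`/`genericValues` fields are assumptions, since the Vector class definition is not shown.

```javascript
// Hypothetical inference stack for: var list = new Vector.<String>(); list.item(0).
const inferenceStack = [
  { class: "JS_Instanciator", name: "Vector", genericValues: ["String"] },
  { class: "JS_Class", name: "Vector", genericDefs: ["T"] },
  { class: "JS_Function", name: "item", returnType: "T" },
];

// Step 1: JS_ReturnType -> JS_GenericDef when the names match (found on the JS_Class).
// Step 2: JS_GenericDef -> JS_GenericValue at the same index (found on the JS_Instanciator).
// When no route exists the return type is left unchanged.
function linkReturnType(returnType, stack) {
  const cls = stack.find((n) => n.class === "JS_Class");
  const inst = stack.find((n) => n.class === "JS_Instanciator");
  if (!cls || !inst) return returnType;
  const index = cls.genericDefs.indexOf(returnType);
  if (index === -1 || index >= inst.genericValues.length) return returnType;
  return inst.genericValues[index];
}
```

For the stack above, linking the return type of `item` yields String, which is what should be shown to the user after `list.item(0).`. A return type that is not a generic parameter passes through untouched.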
The rules are as follows: a JS_ReturnType can be converted into a JS_GenericDef (found on a JS_Class) if they have the same name. That JS_GenericDef can then be converted into a JS_GenericValue (found on a JS_Instanciator) if they have the same index [7].

3.5.4 Algorithm Action

Dynamic languages have a wide range of features; DLTGen attempts to create generic constructs which can be used to support these features. Sometimes these generic features are not enough and adding additional actions would clutter the specification too much. For these circumstances there are special algorithms. Common algorithms which would be too complex to define in the specification are implemented using the algorithm action. Some of the algorithms available will be described in Chapter 4; however, below is an example of the syntax. It is basically an attribute set along with what data to use as an input and where to put an output.

Figure 3.17, Duck Typing Algorithm

Figure 3.17 is an example of achieving duck typing, a common way to resolve values for arguments which are otherwise unknown.

3.6 Surface the completion to the IDE

In order to take the completions found and present them to the user there need to be mechanisms to interact with the IDE. DLTGen provides three mechanisms to handle this, described below.

1. Insert Action – Used to insert text into the code editor.
2. Propose Action – Used to display a pop up list of completions.
3. Style Action – Used to style a completion.

These mechanisms are explored only briefly in the remainder of section 3.6 as they are fairly simplistic.

3.6.1 Insert Action

The insert feature allows the IDE to insert automatic textual completions into the current code editor. This could be used to insert static strings or the results processed using other actions. In section 3.4 there is a simple example (figure 3.2); for more information see the appendices for examples or the user manual for full instructions on using it.
3.6.2 Propose Action & Styling Once a trigger has found a list of proposals it needs to tell Eclipse to display them and how they should be displayed. 7 The index value represents their index in their containing feature, for example a list of arguments. <algorithm var="duck" in="scope" input="self" type="DuckTyping"> <attribute id="caller" value="JS_Call" /> <attribute id="callerArg" value="JS_ArgumentValueListTail" /> <attribute id="callee" value="JS_Function" /> <attribute id="calleeArg" value="JS_ArgumentNameListTail" /> </algorithm>
The style action is used to describe how a proposal for a particular non-terminal should look when it is displayed in Eclipse. Whenever a non-terminal is “proposed” by a content assist trigger this mechanism is invoked to determine how it should look. Determining the style is a simple task: it involves providing values for attributes such as which label and which icon to use. It can also specify a different value to insert when the proposal is selected, and control where to move the cursor after insertion. The following example would be appropriate for a JS_Function non-terminal. The label should be the name of the function followed by the argument list.

  <proposal>
    <!--Join the arguments into a list-->
    <algorithm var="args" input="self.args" type="StringJoin">
      <attribute id="seperator" value=", " />
      <attribute id="value">input.name</attribute>
    </algorithm>
    <!--The actual styling of the proposal-->
    <style>
      <label>self.name "(" args ")"</label>
      <value>self.name "()"</value>
      <offsetSource>insertEnd</offsetSource>
      <offset>-1</offset>
      <icon>ASSIGN_ICO</icon>
    </style>
  </proposal>

Figure 3.18, Proposal Style Example

The first action uses a special algorithm to join the arguments into a comma-separated list, which is then used in the label of the proposal. The value attribute represents what will be inserted when this proposal is selected: the name of the function followed by a pair of parentheses. The -1 value used for “offset” moves the cursor back inside the parentheses so the user can enter their arguments.
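The styling in figure 3.18 can be pictured with a small Python sketch. It is only an illustration under stated assumptions: FunctionNode, style_proposal and apply_proposal are hypothetical names, not part of DLTGen, and the cursor rule assumes that “insertEnd” means the position just after the inserted value.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionNode:
    """Hypothetical stand-in for a JS_Function non-terminal."""
    name: str
    args: list = field(default_factory=list)


def style_proposal(fn, offset=-1):
    """Build the label, inserted value and cursor shift for one proposal."""
    joined = ", ".join(fn.args)          # the StringJoin step
    value = fn.name + "()"               # text inserted on selection
    return {
        "label": f"{fn.name}({joined})",  # text shown in the pop-up list
        "value": value,
        # Assumed reading of offsetSource=insertEnd with offset=-1:
        # end of the inserted text, moved back by one character.
        "cursor": len(value) + offset,
    }


def apply_proposal(text, insert_at, proposal):
    """Insert the proposal value into text and return (new_text, cursor)."""
    new_text = text[:insert_at] + proposal["value"] + text[insert_at:]
    return new_text, insert_at + proposal["cursor"]
```

With a function named substr taking start and length, the label would read substr(start, length) while only substr() is inserted, leaving the cursor between the parentheses.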
4 Implementation
This chapter discusses, in more technical depth, how the various components of the system come together to generate an IDE from a specification. It looks at how the specification language can be composed to support language features, and at some of the more useful algorithms provided by DLTGen. Finally there is a brief overview of the subsystem which generates the IDE code.

4.1 Overview
DLTGen is essentially made up of three separate components, each having several different tasks. Figure 4.1 shows these three components and what they provide. Not all of these will be discussed; more detail can be found in the working documents and user manual.

Figure 4.1, Core Components of DLTGen

Specification Language
Chapter 3 discussed some of the abstract concepts supported in the system and specification language. For these actions to be useful they need to be grouped together to perform certain tasks. DLTGen provides a number of tasks, an overview of which is given in this chapter.

Framework
The majority of DLTGen's functionality has been put into a common library called ‘DLTFrameworkLib’. To keep the generated code clean and concise, as much functionality as possible was placed in this framework so that the generated code can simply invoke it. Some of the features in this framework are described, including some of the Eclipse integration points and some special algorithms.

Generator
The generator is an Xpand2 generator, responsible for turning the specification into code. In order to keep the specification clean this project had to create a number of generator
extensions. These manage the code as it is generated, keeping track of variables, their data types and whether they are nullable [8]. This ensures DLTGen generates efficient and safe code.

[8] Characterised by being able to hold the value null, and therefore being unsafe to use without a test.

4.2 Specification Language
The specification language is split into six subsections where different language and IDE features can be described; these can be seen in Appendix C. Most of these are fairly simple and are not covered outside of the user manual; the focus of this chapter is the “classes” section, which is where model builders are placed. A model builder is a definition of a series of common tasks which can be performed for a particular non-terminal. The syntax of a model builder is shown in Figure 4.2 for a JS_Var non-terminal.

  <classes>
    <eclass class="JS_Var">
      <!--Tasks go here-->
    </eclass>
  </classes>

Figure 4.2, Model Builder for non-terminal JS_Var

By providing a set of common tasks for a given non-terminal, the results of these well-known tasks can be exposed to other model builders and triggers. One of these tasks is the “members” task described in section 4.2.3; the purpose of this task is to return a list of members for the given non-terminal. This works to the strength of the grammar. For example, a variable might be defined as (name=ID) ‘=’ (value=Statement), where Statement is a non-terminal with several non-terminal alternatives such as ‘StringLiteral’, ‘NumericLiteral’ and ‘Instantiator’. The members task of a variable could simply return the members of its “value”, the Statement non-terminal. A different members task will be invoked depending on which alternative it happens to be. This way of defining common tasks keeps the specification simple and reduces redundancy.

4.2.1 Qualifier Task
Qualifying a non-terminal serves two purposes in DLTGen: it allows the user to provide values for an object's name and for an object's full path. The full path is used internally to uniquely identify objects to improve efficiency. Every time a non-terminal is parsed into the model it is qualified. The two special attributes are referable as ‘_name’ and ‘_path’.

  <eclass class="JS_Var">
    <qualify>
      <annotate>self._name = self.name</annotate>
    </qualify>
  </eclass>

Figure 4.3, Model Builder Qualifier
For any non-terminal which defines this behaviour, other actions can use its _name and _path values. This is useful as it provides a well-known location for the name of a particular language feature, which makes name binding, a very common task in code analysis, much simpler.

4.2.2 Annotate Task
The annotate task provides an opportunity to add to the model as soon as a new non-terminal is parsed. It is always invoked straight away for every non-terminal. This is where processing that can be done ahead of time, such as finding related non-terminals, should be done.

4.2.3 Member Access
A member is a child of a particular language feature; for example, an instance of a String in JavaScript has a number of static and instance functions and properties as members. The key value added by the members task is that it provides a simple way for each non-terminal to determine its own children. Other specification features can then access the returned members list and filter or sort it as required. In a statically typed language this is not required: typically all the system needs to know is what data type something is, and it can find the members itself. In a dynamic language, constructs such as runtime object alteration make this less predictable. Below is an example of how JavaScript's particular flavour of runtime object alteration can be supported using the members task. Figure 4.4 shows some simple JavaScript statements: a variable of type Object is created and two new members are then added to it, so its children are no longer just the members of the class Object.

  var details = new Object();
  details.username = "test";
  details.password = "test";

Figure 4.4, Runtime Object Alteration

To determine the members of the variable ‘details’ the IDE must look at its type and its runtime attached members; figure 4.5 shows this.
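For the ‘details’ example, the two sources of members can be sketched in Python as a simple union. This is a hedged illustration rather than DLTGen's implementation: TYPE_MEMBERS is an assumed subset of the members of Object, and members_of is a hypothetical helper that receives the left-hand sides of the assignments it has seen.

```python
# Assumed, incomplete member list for the JavaScript Object type.
TYPE_MEMBERS = {"Object": ["toString", "hasOwnProperty"]}


def members_of(var_name, assigned_type, assignment_targets):
    """Combine the members of the assigned type with runtime additions.

    assignment_targets holds left-hand sides such as "details.username".
    """
    members = list(TYPE_MEMBERS.get(assigned_type, []))
    prefix = var_name + "."
    for target in assignment_targets:
        if target.startswith(prefix):
            name = target[len(prefix):]
            if name not in members:      # avoid proposing duplicates
                members.append(name)
    return members
```

Calling members_of("details", "Object", ["details.username", "details.password"]) returns the members of Object followed by username and password.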
Figure 4.5, Member Resolution Visualization

First it uses the members of its expression, that is, of what it was assigned; this could be anything, including an identifier pointing at another variable. In the above example it was assigned an Instanciator, whose members task finds the Type and uses its members. The other source is runtime alterations to the variable. To retrieve these, a simple find action can be used. The specification for this support can be found in Appendix D.

4.2.4 Scope Searching Task
In many languages, dynamic and static, there is a structure to how a scope is searched; the scoping task allows the user to control this. In DLTGen any non-terminal can be a scope. Without search rules, looking inside the current scope will only ever find objects below that object in the parse tree. The scoping task, however, allows the specification of which other scopes can be searched and in what order.

Using the JavaScript scope search example from chapter 3, there are three stages to resolving an identifier in a function. As shown in figure 4.6, first it must look inside the function's own scope. If it does not find the member there it looks in the scope of the class which owns the function (ECMAScript 3rd Edition only). If it still cannot find the variable it looks in the global scope.

Figure 4.6, Scope Searching Hierarchy

Figure 4.7 shows how the first stage of this process could be expressed in the specification. The JS_Function should search its local members first (_contents) and then search its parent class. A similar set of rules could be declared for the JS_Class non-terminal. This task has a special action, “<find-source>”, which is only usable in the scope task. It determines the next scope to search; there can be several of these in one scope definition, and the scopes are searched in the order defined.
Like a subroutine call, any other scope used as a “find-source” will be fully explored before the search moves on to the next given source.
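The search order described above amounts to a depth-first walk over the declared sources. The Python sketch below illustrates that reading; Scope and resolve are hypothetical names, the sketch simplifies by always checking a scope's own contents before its sources, and it assumes the chain of sources contains no cycles.

```python
class Scope:
    """Hypothetical scope: its own names plus an ordered list of sources."""

    def __init__(self, name, contents=(), sources=()):
        self.name = name
        self.contents = list(contents)
        self.sources = list(sources)


def resolve(scope, ident, trail=None):
    """Return (name of the scope holding ident or None, scopes visited)."""
    trail = [] if trail is None else trail
    trail.append(scope.name)
    if ident in scope.contents:
        return scope.name, trail
    for source in scope.sources:
        # Each source is explored completely before the next one is tried.
        found, _ = resolve(source, ident, trail)
        if found is not None:
            return found, trail
    return None, trail
```

For the JavaScript hierarchy, a function scope would list its owning class as a source, and the class would list the global scope, giving the function, class, global order of figure 4.6.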
  <scope>
    <!--Local vars-->
    <find-source>self._contents</find-source>
    <!--Parent classes-->
    <find var="jsClass" in="parents" quantity="1">
      <condition>jsClass.class == JS_Class</condition>
    </find>
    <find-source>jsClass</find-source>
  </scope>

Figure 4.7, Example Scope Search Rule

4.3 Framework
The framework in DLTGen is where all core functionality is implemented. It contains code intended to be invoked by generated code, helping to keep the generated code concise. This section looks briefly at a few of the more fundamental components in the framework which improve the overall understanding of how DLTGen works. The particular areas discussed are fundamental algorithms and Eclipse integration points.

4.3.1 Special Algorithms
There is some functionality which is too complicated to express using the generic constructs in DLTGen. To avoid cluttering the specification, specific algorithms can be exposed as “special algorithms”. The most common algorithm used during this project was dot typing; all four languages looked at have support for this. Dot typing is a mechanism for accessing members of objects, and subsequently the members of those members, by building a path separated with dots.

  var str = "Hello world";
  str.toUpperCase().toString().

Figure 4.8, Dot Typing Example

The above example performs dot typing on the variable “str”. This variable is a String; strings have a member called “toUpperCase” which returns another String. String also has a member called “toString” which returns a String. So, when dotting off the final member in Figure 4.8, the IDE should propose String type members. The dot typing algorithm deals with this as follows. In the following specification presume “path” is the dot path string that has been previously resolved; in figure 4.8 path would be the complete second line of code.

  <algorithm var="dotItem" in="scope" input="path" type="DotTyping">
    <attribute id="seperator" value="." />
  </algorithm>

Figure 4.9, Dot Typing Invocation

The algorithm splits the input by the separator and then parses each piece on the fly. Using the tasks described in section 4.2 it can then determine the members of the first section. It then finds a member in that list with the same qualified name as the next section, and this is repeated for each remaining section. Figure 4.10 demonstrates this with the aid of a sequence diagram.
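The split-and-walk behaviour described for dot typing can be sketched in Python. This is an assumption-laden illustration and not the framework's code: TYPES is a hand-written member table mapping each member to the type it returns, variables maps names in scope to their types, and call arguments are assumed not to contain the separator.

```python
import re

# Hypothetical member tables: type -> {member name: type it returns}.
TYPES = {
    "String": {"toUpperCase": "String", "toString": "String", "length": "Number"},
    "Number": {"toFixed": "String"},
}


def dot_typing(path, variables, separator="."):
    """Resolve a dot path to the type whose members should be proposed."""
    segments = [s for s in path.split(separator) if s]
    if not segments:
        return None
    current = variables.get(segments[0])     # type of the first section
    for segment in segments[1:]:
        if current is None:
            return None                       # an earlier section failed
        name = re.sub(r"\(.*\)$", "", segment)  # drop call parentheses
        current = TYPES.get(current, {}).get(name)
    return current
```

For the path in figure 4.8, with str known to be a String, the walk ends on String, so TYPES["String"] would supply the proposals.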
Figure 4.10, Dot Typing Algorithm

4.3.2 Eclipse Integration Points
A key aim of DLTGen was to make creating an IDE possible without being familiar with the underlying IDE. This has been realised with a specification language that is neutral to the IDE framework. However, it is worth briefly reviewing exactly what has been abstracted away and which specification features map to which Eclipse-specific features.

4.3.2.1 Builder
In Eclipse, all background processing is performed in a construct called a “builder”. A builder is typically used to incrementally update models related to the source text as it changes. DLTGen uses builders to perform the same function. A number of features, including type system building, non-terminal annotation and qualification, are performed in the builder as the code changes. Non-terminals are processed and given a time to live; this ensures that any invalid data (perhaps from parsing broken code) does not remain in the model for too long. When runtime features need to use the model it is requested from the builder and locked so that no more updates can occur during processing. This removes a number of difficult problems developers face when creating a builder. The specifier does not need to be concerned with efficiency, mutually exclusive access to the model, or mechanisms to determine what code changed in order to improve performance.
4.3.2.2 Content Assist Processor
Eclipse provides a lot more freedom for code completion invocation via a mechanism called the content assist processor. Eclipse simply provides a mechanism to be called back when certain “activation characters” are entered into the source text. It is then the developer's responsibility not only to create proposals but also to filter them as the user continues to type; this is unnecessary boilerplate code. This construct has been replaced by triggers in DLTGen. They are simpler, as they have activation strings rather than only single characters. In addition, all sorting and subsequent filtering is dealt with automatically.

4.4 Generator
Part of what the generator does is expand a template for a given language feature in the specification; most of the time, however, more advanced processing needs to be undertaken to make the code work efficiently and safely. The following two sections discuss these issues and the technical problems they pose.

4.4.1 Path Resolution
DLTGen's specification is not as simple as defining simple values for simple attributes; many of the constructs are model interactions dependent on the particular language and grammar. For example, take the annotation “x._name = y.name”. The system needs to be certain what the most specific types of x and y are before it can generate code to access their attributes. If the system knows which specific non-terminal ‘y’ is then it can generate y.getName(); otherwise it needs to go via a general getter which takes the form y.getAnnotation("name"). It is in the interest of efficiency to call the getter directly, but this is sometimes not possible. If y was the result of a find action, all the system can be certain of is that it is a non-terminal (or a String for a document search). DLTGen paths can also be much more complicated and go many levels deep.
To support this DLTGen provides a mechanism to keep track of variables as code is generated; from this contextual information, reflection is used to determine what can be directly accessed and what cannot. If there is no feasible way to get an attribute an error is produced to guard against specification mistakes.

4.4.2 Code Safety
DLTGen's specification works at a higher level than code and therefore removes the concern of code safety from the specifier. The generator determines whether the code it produces is safe and, if it is not, inserts appropriate statements to make it safe. For example, presume “self” represents a JS_Var non-terminal. It is legal to access “self.expression._name”, where expression is another non-terminal, JS_Expression. However, it is possible that expression could be null, making this statement unsafe. This is solved in DLTGen by keeping track of what is ‘nullable’; from this information safety can be determined and unsafe expressions can be wrapped in safety checks.
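The combination of path resolution and null tracking in sections 4.4.1 and 4.4.2 can be illustrated with a small Python sketch that emits Java-like expressions as strings. Everything here is assumed for illustration: SCHEMA stands in for the type information the generator gathers by reflection, the root variable is taken to be non-null, and the sketch falls back to the general getter rather than reporting an error for an unreachable attribute.

```python
# Hypothetical schema: non-terminal -> {attribute: (type, nullable)}.
SCHEMA = {
    "JS_Var": {
        "name": ("STRING", False),
        "expression": ("JS_Expression", True),
    },
    "JS_Expression": {},
}


def getter(attr):
    """Name of the direct getter for an attribute, e.g. name -> getName()."""
    return "get" + attr[0].upper() + attr[1:] + "()"


def resolve_path(path, root_type):
    """Return (access expression, null checks needed before evaluating it)."""
    parts = path.split(".")
    expr, current, nullable = parts[0], root_type, False
    guards = []
    for attr in parts[1:]:
        if nullable:
            guards.append(f"{expr} != null")  # guard before dereferencing
        known = SCHEMA.get(current, {})
        if attr in known:
            current, nullable = known[attr]   # statically known: direct getter
            expr = f"{expr}.{getter(attr)}"
        else:
            current, nullable = None, True    # unknown: general getter
            expr = f'{expr}.getAnnotation("{attr}")'
    return expr, guards


def emit(path, root_type):
    """Wrap the access in its null checks when any are required."""
    expr, guards = resolve_path(path, root_type)
    return f"({' && '.join(guards)}) ? {expr} : null" if guards else expr
```

In this sketch self.name on a JS_Var becomes a plain getter call, while self.expression._name is wrapped in a check that the expression is not null.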
5 System in operation
This chapter demonstrates using the system to create an IDE for the language Ruby. It explains how to get started and goes through supporting some Ruby language features. Not everything is covered, as Ruby is a complicated programming language with too many features to cover. A full specification and grammar can be found in appendices E and F respectively.

5.1 Ruby Background
Ruby is a dynamic general purpose programming language. It has gained a lot of popularity recently through the web framework “Ruby on Rails”. Ruby supports a number of programming paradigms including functional, object-oriented and imperative. Ruby was chosen as one of the two original target languages because it is popular and introduces concepts that JavaScript does not. What follows is a brief overview of some of the interesting features of Ruby which make it a good example of a dynamic language for this project. For more information about Ruby see the official website (24).
• Everything is an expression.
• Everything is imperatively executed, including declarations.
• Unique block syntax and external iterators (discussed in more detail later).
• Everything is an object.

5.2 Getting Started
The following section provides a brief overview of how to get started using DLTGen.

5.2.1 Prerequisites
In order to build an IDE using DLTGen, Xtext must be installed. Xtext provides the grammar engine and the starting point for IDE integration. The Xtext version used for this project is 0.7.2 and can be downloaded from the Xtext distribution site (25). It is essential to use the correct version, as Xtext has a habit of changing features dramatically between versions.

5.2.2 Creating an Xtext project
The Xtext project is the starting point for DLTGen. For more information on setting up a project consult the user manual for this project. When a project is created, a grammar file and a generator workflow are created.
The first stage in developing an IDE is to develop this grammar. This document does not explain how to go about this; information can be found on the Xtext documentation website (26). Grammars for all the languages in this project can be found on the project website.

5.2.3 Creating a DLTGen project
All that is required to convert the Xtext project into a DLTGen project is to right-click on it and choose the DLTGen -> Add DLTGen support menu option. This will include the necessary
bundles, create a specification XML file and add the DLTGen fragment to the generator workflow. The created specification XML file is where the language specification referred to in the following section should be placed.

5.3 Specifying the language and IDE
There are several tasks involved in specifying an IDE. For the most part, a specification does not need to be complete; it is pointless to qualify a string literal, for example. In order to specify just what is required, the best approach is to pick a feature to support and add the required pieces in a methodical way.

5.3.1 Global Scope
A good starting point is usually defining any well-defined global scopes. The features added subsequently will most likely benefit from having some global data they can display. For example, showing a list of classes after an instantiator would be dull if there were no classes defined. There are two ways to define a global scope: it can be hand written in XML for complete control, or provided in the target language for the system to parse. Figure 5.1 demonstrates the latter; it defines the String type and the signatures of its members.

  class String
    def capitalize()
    end
    def casecmp(other_str)
    end
    def chomp()
    end
  end

Figure 5.1, Ruby String Type

This Ruby code file is simply referenced in the specification under the static model, as shown in Figure 5.2.

  <staticModel>
    <scope id="global" src="global.rb" type="language" />
  </staticModel>

Figure 5.2, Global Scope Loading

5.3.2 Type System & Literals
Ruby is a full general purpose language with a basic class system familiar to any Java programmer. In order to reference types in the model, ‘type specifiers’ need to be defined; these define which non-terminals represent class definitions and how to retrieve their members. A type specifier can also provide other information such as polymorphism descriptors.
  <typeSystem>
    <typeSpecifier class="ClassDef">
      <members>
        <annotate>result.members += self.statements</annotate>
      </members>
    </typeSpecifier>
  </typeSystem>

Figure 5.3, Type Specifier Example

In the above example, any ClassDef non-terminal represents a class; this non-terminal contains an attribute called “statements” which contains the children of the class. These are used as the members of a ClassDef. Whenever the language parses a ClassDef non-terminal it registers a type using the qualified name and adds its members to the type's member list. For the example in figure 5.1 the type “String” would be registered with three MethodDefs as children.

In many programming languages there are “literals”, inline values which represent an instance of a class with a particular value. For example, in Ruby “Hello World” is a String literal; it represents an instance of the String class with the value “Hello World”. Once there are rules in the grammar to match a literal, it is usually necessary to define the members of a literal as the members of the type it is an instance of. There is a shorthand for defining this in DLTGen, shown in Figure 5.4.

  <literals>
    <literal class="LiteralString" type="String" />
    <literal class="LiteralNumber" type="Number" />
  </literals>

Figure 5.4, Literal Definition

5.3.3 Global Variable Accessor
The first complete feature this walkthrough implements is suggesting variable names. In Ruby there are several scopes where variables can be stored; one of these is the “global” scope. This is not to be confused with DLTGen's concept of a global scope. There is only one global scope; if a variable is put into it anywhere in the code it can be read anywhere else. The syntax is very simple: the variable's name is prefixed with a dollar symbol to indicate it is global.

  $globalVar = "Hello World"

Figure 5.5, Ruby Global Variable Assignment

A reasonable IDE feature for Ruby would be that when the user enters a dollar symbol the IDE proposes a list of all the global variables it knows about. The following three stages are required to make this feature work.
1. Detect that the dollar symbol has been entered.
2. Find all the global variables.
3. Display the variables found in a proposal window.
Firstly, to detect the dollar symbol a trigger needs to be defined. It must be a model trigger because it is going to search the model for global variables. The next task is to find all the global variables and propose them. This is achieved simply by using a find action followed by a propose action.

  <modelTrigger sequence="$" id="GlobalVars">
    <find var="globVars" in="model" quantity="*">
      <condition>globVars.class == GlobalVariable</condition>
    </find>
    <propose>globVars</propose>
  </modelTrigger>

Figure 5.6, Trigger to find global variables

The find action searches in “model”, which means it will look everywhere. The quantity is “*” (many) because the goal is to find all the global variables. Finally there is a condition which ensures it only finds GlobalVariable non-terminals. Once the variables are found it simply proposes them; DLTGen automatically removes duplicates. This represents the processing portion of the completion. Before this feature will work there needs to be a specification to guide the presentation of the proposal. Figure 5.7 shows a possible styling where the label is simply the name of the variable; a future expansion of this could include showing the variable's data type. Figure 5.8 shows what this feature looks like when passed through the generator.

  <!--Global Var definition-->
  <eclass class="GlobalVariable">
    <proposal>
      <!--Styling of the proposal-->
      <style>
        <label>self.name</label>
        <value>self.name</value>
      </style>
    </proposal>
  </eclass>

Figure 5.7, Non-Terminal Proposal Style

Figure 5.8, Global Variable Proposals
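The three stages of the global variable feature (detect the sequence, find, propose) can be sketched in Python as follows. This is a hedged illustration: Node, find, propose and on_trigger are hypothetical names, the model is reduced to a flat list of nodes, and duplicate removal is assumed to be by name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical parsed non-terminal with a class and a name."""
    cls: str
    name: str


def find(model, condition, quantity="*"):
    """Return every match for quantity "*", otherwise the first N matches."""
    matches = [node for node in model if condition(node)]
    return matches if quantity == "*" else matches[: int(quantity)]


def propose(nodes):
    """Return proposal labels in order, with duplicate names removed."""
    seen, labels = set(), []
    for node in nodes:
        if node.name not in seen:
            seen.add(node.name)
            labels.append(node.name)
    return labels


def on_trigger(typed_text, model, sequence="$"):
    """Fire only when the text typed so far ends with the trigger sequence."""
    if not typed_text.endswith(sequence):
        return []
    return propose(find(model, lambda n: n.cls == "GlobalVariable"))
```

A model containing the same global variable twice would therefore yield a single proposal for it, mirroring the automatic duplicate removal described above.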
5.3.4 Simple Type Inference
In Ruby, the type of a variable can be determined by looking at how it is assigned. For example, if a variable is assigned “Hello World” it is obviously a String. Ruby is a C style language and supports dot typing: members of a previously defined construct are accessed by providing its name followed by a dot.

  str = "Hello World";
  str.

Figure 5.9, Ruby Dot Typing Example

The above example defines a local variable with the name “str”. It is given the value of a String literal. When the second line is processed, the appropriate suggestions would be the members of the String type. Figure 5.10 shows the stages in making the inference.

Figure 5.10, Process of Type Inference

The first task is to create the model trigger which begins this process. This trigger must find the dot path, that is, the reference typed in by the user on which to perform inference. Then it uses the dot typing algorithm to find what the path refers to. Finally it needs to propose what was found.

  <modelTrigger sequence="." id="DotAccess">
    <find var="path" in="document" direction="reverse">
      <break>EOF</break> <!--In reverse mode this is the start of the file-->
      <break>Block</break>
      <break>BraceBlock</break>
      <break>MethodDef</break>
      <break>ClassDef</break>
      <break>';'</break>
      <break>'|'</break>
    </find>
    <!--Dot typing mechanism-->
    <algorithm var="dotItem" input="path" type="DotTyping">
      <attribute id="seperator" value="." />
    </algorithm>
    <!--Show the found items members-->
    <propose>dotItem.members</propose>
  </modelTrigger>

Figure 5.11, Ruby Dot Typing Trigger

This trigger resolves the dot path and passes it into the dot typing mechanism; the result of this then determines the members shown.
In the Ruby grammar a non-terminal called “LocalVariableRef” is defined; it represents a reference to a local variable with no modifier. To make the dot typing algorithm work for a LocalVariableRef there needs to be a members task for it. All it needs to do is find an object with the same name in the same scope and use its members. To be able to do this there needs to be a qualification task for any non-terminals involved, so that their name value is available to match against.

  <eclass class="LocalVariableRef">
    <members>
      <!--Find what we are referring to and use its members-->
      <find var="obj" in="scope" quantity="1">
        <condition>obj._name == self.name</condition>
      </find>
      <annotate>result.members += obj.members</annotate>
    </members>
  </eclass>
  <eclass class="InstanceVariable">
    <qualify>
      <annotate>self._name = self.name</annotate>
    </qualify>
    <members>
      <annotate>result.members += self.tail.members</annotate>
    </members>
  </eclass>

Figure 5.12, Example Identifier Resolver

The next stage, as shown in figure 5.10, is to use the value the variable was assigned as its type. The members request can simply be forwarded to the variable's value non-terminal; this is shown in figure 5.12 under the InstanceVariable's members task. A similar handing off is defined for the non-terminal “AssignmentTail”, as this is what InstanceVariable is followed by in the grammar. The AssignmentTail class simply contains an operator followed by an expression; the expression's members are what will be used. In Figure 5.9 the expression is a LiteralString. Literals were defined in section 5.3.2, so the members of a String are automatically returned as the members of a LiteralString. When this is generated and run the IDE functions as shown.

Figure 5.13, Ruby Member Completion

Providing members tasks for different non-terminals will extend what can be inferred when dotting off a local variable.
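The chain of forwarded members requests in this section can be mimicked with a few small Python classes. This is only a sketch of the delegation idea: the class names mirror the non-terminals in figure 5.12, but the classes themselves, the flat scope list and the TYPE_MEMBERS table are assumptions made for illustration.

```python
# Members registered for the String type in figure 5.1.
TYPE_MEMBERS = {"String": ["capitalize", "casecmp", "chomp"]}


class Literal:
    """A literal exposes the members of the type it is an instance of."""

    def __init__(self, type_name):
        self.type_name = type_name

    def members(self, scope):
        return TYPE_MEMBERS.get(self.type_name, [])


class AssignmentTail:
    """Operator plus expression: forwards to the expression."""

    def __init__(self, expression):
        self.expression = expression

    def members(self, scope):
        return self.expression.members(scope)


class InstanceVariable:
    """A named variable: forwards to its assignment tail."""

    def __init__(self, name, tail):
        self.name = name
        self.tail = tail

    def members(self, scope):
        return self.tail.members(scope)


class LocalVariableRef:
    """A reference: finds the object of the same name and uses its members."""

    def __init__(self, name):
        self.name = name

    def members(self, scope):
        for obj in scope:
            if obj is not self and getattr(obj, "name", None) == self.name:
                return obj.members(scope)
        return []
```

Each class answers the same members question by handing it to the next non-terminal, so supporting a new kind of expression only requires one more members method.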
5.3.5 Advanced Type Inference on External Iterators
A particularly interesting and popular feature of Ruby is external iterators; they allow a method call to provide a block of code which is executed when the method being called “yields” data. The concept can be unusual to Ruby newcomers; below is an example which illustrates how it works.

  class NumericRange
    def each()
      # Dummy yield to infer numeric data is yielded
      yield 0;
    end
  end

  def sample()
    r = 1..4
    r.each { |x| print(x) }
  end

Figure 5.14, Ruby External Iterator

In Ruby, a numeric range can be defined as X..Y; the numeric range class has a method called “each”. This method iterates over each number between X and Y and yields it. The “sample” method shows how this can be used. When the each method yields data the external iterator executes, and the value of x is what was yielded. The code would print “1234” if executed. Imagine code dotting off the x external iterator argument. A DLTGen IDE can infer x to be a number and propose members of the Number class. In order to support this, the problem must first be decomposed to understand which language features are interacting; figure 5.14 shows a possible chain of inferences which could support this feature. The ultimate goal is to take the Iterator argument non-terminal and find an equivalent YieldStatement non-terminal.

Figure 5.14, External Iterator Inference

This can be thought of as three fairly simple find actions, as shown below.
  <members>
    <!--Find the method I am an arg for-->
    <find var="caller" in="parents" quantity="1">
      <condition>caller.class == MethodCall</condition>
    </find>
    <!--Find the actual method def-->
    <algorithm var="callee" input="caller.path" type="DotTyping">
      <attribute id="seperator" value="." />
    </algorithm>
    <!--Find the yield-->
    <find var="yield" in="callee" quantity="1">
      <condition>yield.class == YieldStatement</condition>
    </find>
    <annotate>result.members += yield.members</annotate>
  </members>

Figure 5.15, Yield Resolver

First it finds the MethodCall. In the grammar the external iterator is a child of the method call, so the method call can be expected to be a containing feature. The method being referred to can have a complicated path (in the example it is “r.each”), therefore it must be processed through the dot typing mechanism. Finally it just needs to find an appropriate yield in the callee. Below is a screenshot which shows the feature running for the example above.

Figure 5.16, Inferring Types of Iterator Arguments
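The three steps of the yield resolver (find the enclosing call, resolve the callee, use what it yields) can be sketched in Python. The sketch rests on several assumptions: Node and its parent pointer are hypothetical, METHOD_DEFS records the yielded type directly instead of holding a YieldStatement, TYPE_MEMBERS lists an assumed subset of Number members, and the call path is limited to a single receiver and method name.

```python
class Node:
    """Hypothetical parse tree node with a class name and a parent link."""

    def __init__(self, cls, parent=None, **attrs):
        self.cls = cls
        self.parent = parent
        self.attrs = attrs


# Assumed results of parsing the method definitions and the type system.
METHOD_DEFS = {("NumericRange", "each"): {"yields": "Number"}}
TYPE_MEMBERS = {"Number": ["abs", "floor", "round"]}


def find_parent(node, cls):
    """Walk up the tree to the nearest ancestor of the given class."""
    current = node.parent
    while current is not None and current.cls != cls:
        current = current.parent
    return current


def infer_iterator_arg(arg, variables):
    """Members to propose for an iterator argument such as |x|."""
    caller = find_parent(arg, "MethodCall")           # step 1: the call
    if caller is None:
        return []
    receiver, _, method = caller.attrs["path"].rpartition(".")
    callee = METHOD_DEFS.get((variables.get(receiver), method))  # step 2
    if callee is None:
        return []
    return TYPE_MEMBERS.get(callee["yields"], [])     # step 3: the yield
```

For r.each { |x| ... } with r known to be a NumericRange, the sketch proposes the Number members for x; an argument with no enclosing call yields no proposals.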
  43. 43. P a g e | 37 6 Testing This chapter focuses on the procedures undertaken to verify the stability and robustness of the implementation. Testing was split into two areas, the framework features and the code generator, each requiring different testing strategies. All tests discussed in this chapter and their results can be found in Appendix G. 6.1 Framework As previously discussed, most of DLTGen’s features are implemented as a library referred to as the framework. This framework is made up of several parts which represent the various processes the IDE has to perform at runtime. These include features such as maintaining a model, updating the model, searching the model etc. 6.1.1 Testing Procedure Tests for the framework were derived using a black box method; features were tested rather than individual units and components. Tests were defined to verify DLTGen features such as model management and the various actions. The tests employ a number of typical and non- typical uses to ensure a feature is robust, it would not be tested thoroughly enough if it was only validated in the way DLTGen generates code which uses the feature. Most of the features in the framework are dependent on having source data, the framework cannot build a model or act upon it without a grammar or parsed grammar objects for example. To provide a basis for testing the Ruby grammar was used along with a number of sample ruby scripts. Automated JUnit9 tests were created where the setup for each test establishes this test environment. 6.1.2 Features Tested Four main features which present more complexity than other aspects of this project were tested. These are briefly discussed below.  Model Management, these tests are designed to ensure DLTGen accurately maintains the language models. This includes checking model builders are used correctly; appropriate caching takes place and the life cycle of non-terminals is accurate. 
• Annotations: it is important that all methods of interacting with non-terminals and metadata are tested, as the generator will use them all, attempting to provide the most efficient mechanism for the context.
• Find Action: there are many ways in DLTGen to search and filter the model, and the many different search sources add to this complexity. For this reason the constructs behind the Find action were chosen for automated testing.
• Linking Action: although a less used feature, it has a reasonable amount of complexity which can benefit from automated testing.

9 http://www.junit.org/
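To illustrate what black box testing of a search feature with typical and non-typical uses can look like, the sketch below checks a simplified find over a list. It is an assumption-laden illustration only: FindActionSketch, find and runChecks are hypothetical names, the quantity parameter merely imitates the quantity attribute of the specification's find element, and plain Java checks are used in place of JUnit so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FindActionSketch {
    // Returns up to `quantity` elements of `source` that satisfy `condition`,
    // in source order. Null sources and non-positive quantities yield nothing.
    static <T> List<T> find(List<T> source, Predicate<T> condition, int quantity) {
        List<T> result = new ArrayList<>();
        if (source == null || quantity <= 0) return result;
        for (T item : source) {
            if (result.size() == quantity) break;
            if (item != null && condition.test(item)) result.add(item);
        }
        return result;
    }

    // Black box checks: only inputs and outputs are inspected.
    // Returns the names of failed checks; an empty list means all passed.
    static List<String> runChecks() {
        List<String> parents = Arrays.asList("Block", "MethodCall", "Program", "MethodCall");
        Predicate<String> isCall = kind -> kind.equals("MethodCall");
        List<String> failures = new ArrayList<>();

        // Typical use: a single nearest match.
        if (!find(parents, isCall, 1).equals(Arrays.asList("MethodCall"))) failures.add("typical");
        // Non-typical uses: more requested than available, empty, null, zero.
        if (find(parents, isCall, 5).size() != 2) failures.add("quantity exceeds matches");
        if (!find(new ArrayList<String>(), isCall, 1).isEmpty()) failures.add("empty source");
        if (!find(null, isCall, 1).isEmpty()) failures.add("null source");
        if (!find(parents, isCall, 0).isEmpty()) failures.add("zero quantity");
        return failures;
    }
}
```

The non-typical cases are the ones generated code would rarely produce, which is why validating a feature only through generated code would leave them unexercised.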
6.1.3 Results
Although features mostly worked as expected, some problems did arise and were fixed.
There were issues regarding security. Because an Xtext and DLTGen project makes use of multiple Eclipse projects, there are security concerns when using features such as reflection. This was solved by providing interfaces for the project in the correct security domain to implement. The local project did the security-sensitive work and passed back the results. An example of this is creating an instance of a MetaEObject. These were not accessible outside of the Xtext project, therefore the model loader provides a factory for this.
Another issue which arose from testing was the difficulty of debugging a particular feature. Many of the constructs in DLTGen have several potential failure points, but the one that actually failed was never made obvious to the user. To aid this, a debug flag was added to the Model class. When this flag is turned on, extra debug information appears in the console. Consider the dot typing algorithm: it could fail on any part of the dot-separated path. The improvements made to the system show the routes taken. Similarly, a debug action was added to the specification. This allows users to create their own debug prints from within the specification, which helps them print the state of non-terminals and the model when debugging features.

6.2 Generator
The job of the generator is to take the specification and turn it into Java code. In doing this it has to perform a number of tasks, some of which warrant detailed testing. While processing a specification, the generator needs to detect erroneous or nonsensical specifications and report them appropriately to the user. The generator also needs to create efficient, accurate and safe Java code. This is more complicated than it may seem and involves keeping track of previously generated code and the context. The rest of this section discusses how some of these issues were dealt with.
6.2.1 Methodology
As with the testing of the runtime framework, the generator cannot operate without a grammar; once again the Ruby grammar was used to perform tests. For tests which validate erroneous specifications, a project called “dltgen_tests” was created which defines a range of erroneous specifications. It would be too complicated to integrate specification generation testing into JUnit tests, as it is so heavily built into the Xpand2 engine; therefore these tests must be executed manually. More atomic features of the generator, such as the path resolver, were tested with JUnit.

6.2.2 Features Tested
Below is a brief overview of the generator features that were tested.
• Path Resolver: DLTGen does not dumbly swap out text when it generates code; it attempts to understand the statements it is given in order to generate the most efficient constructs. This is the task of the Path Resolver, which keeps track of what is known about variables so as to generate code as efficiently as possible.
• Path Safety Checker: generating complicated statements without considering their safety would be too error-prone. Because of this, the generator has a feature which validates how safe generated paths are and wraps them in constructs to make them safe. This component is important for generating reliable code and warrants automated testing.
• Erroneous Specification: the generator reports back a number of errors which can be detected in the specification. To derive the errors and ensure they are caught, a range of test specifications was created.

6.2.3 Results
The key problem found by testing the generator was that it sometimes failed to recognise attributes of a non-terminal's child types. The mechanism for looking these up was to inspect the ECore model; however, attributes could be primitive types or Strings, which are not defined in the ECore model. As a result, the generator's path resolver was rewritten to use reflection. This mechanism proved much more flexible, as it resolves paths accurately based on their type.
Other minor issues emerged from testing the generator, such as certain nullable values not being null checked. Issues like this were resolved as they were found.
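The two ideas in these results, resolving a path by reflection and guarding each step against null, can be sketched together in a few lines of Java. This is a hedged illustration rather than the generator's code: MethodDef, Call and ReflectivePathSketch are hypothetical classes, and the convention that a path segment such as "name" maps to a getter called getName is an assumption made for the example.

```java
import java.lang.reflect.Method;

// Hypothetical model classes standing in for generated non-terminal types.
class MethodDef {
    private final String name;
    private final int arity;
    MethodDef(String name, int arity) { this.name = name; this.arity = arity; }
    public String getName() { return name; }
    public int getArity() { return arity; }
}

class Call {
    private final MethodDef target;
    Call(MethodDef target) { this.target = target; }
    public MethodDef getTarget() { return target; }
}

class ReflectivePathSketch {
    // Assumed convention: the segment "name" maps to the getter "getName".
    static Method getter(Class<?> type, String segment) {
        String name = "get" + Character.toUpperCase(segment.charAt(0)) + segment.substring(1);
        try {
            Method method = type.getMethod(name);
            method.setAccessible(true);
            return method;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No attribute '" + segment + "' on " + type.getName(), e);
        }
    }

    // Generation-time view: the type a path ends in. Strings and primitives
    // work because Java return types are read instead of a metamodel.
    static Class<?> resolveType(Class<?> root, String path) {
        Class<?> current = root;
        for (String segment : path.split("\\.")) {
            current = getter(current, segment).getReturnType();
        }
        return current;
    }

    // Runtime view: follow the path, returning null as soon as a step is null.
    static Object safeGet(Object root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (current == null) return null;
            try {
                current = getter(current.getClass(), segment).invoke(current);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot read '" + segment + "'", e);
            }
        }
        return current;
    }
}
```

With these definitions, resolveType(Call.class, "target.arity") reports the primitive type int, and safeGet on a Call whose target is null returns null instead of throwing. A generator would more likely emit explicit null checks than use reflection at runtime; the loop here only shows where such checks belong.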
