Generic Programming



An Introduction to Generic Programming

Motivation

Advances in programming languages are influenced by various factors. One of the major driving forces in programming language design, particularly for imperative languages like Fortran and C, was efficiency. As software engineering has matured, computers have become more powerful, and software has become more complex, efficiency has ceased to play a central role in language design. Other design criteria, such as better abstraction and reusability, are becoming more influential.

One of the most important influences on language design is the desire to abstract away more detail and take the language farther from the machine. Assembly languages abstracted the machine operations and control flow with operation codes and labels. Imperative languages abstracted the machine-level details with more structured control flow and data types. Various programming paradigms evolved later, providing different ways of abstraction. For example, object-oriented languages abstracted data and provided higher-level modeling of the relationships between ADTs (Abstract Data Types). Generic programming abstracts the type details and helps us move towards still higher levels of abstraction, where the whole program is viewed as various components glued together.

Another significant factor influencing the design and evolution of contemporary programming languages is the need for better facilities for writing reusable code. One of the most popular ways of providing reusable code so far has been support for creating libraries, with the languages themselves providing a significant portion of the core facilities common to usual programming tasks. Object-oriented programming languages added support for another form of reusable code: frameworks. Polymorphism and inheritance are the features that provide a solid foundation for creating such reusable code.
However, frameworks suffer from significant limitations (not discussed here) that restrict their widespread use. Component programming aspires to provide highly reusable components that can simply be plugged together. The trend in programming language design is clear: languages are slowly evolving towards providing better facilities for abstraction and reusability.

Although generic programming facilities were introduced into languages on an experimental basis a few decades ago, and are present in many important languages, it is only in the past few years that they have become mature and sophisticated and gained increasing importance. Much of the recent popularity of generic programming is due to C++, which provides significant support for it through templates; generic programming has become an important programming paradigm supported by C++. Most C++ libraries are now written almost entirely using templates, which shows how important a role the template feature plays in C++.

What is Generic Programming?

According to [Czarnecki and Eisenecker, 2000], Generic programming is a subdiscipline of computer science that deals with finding abstract representations of efficient algorithms, data structures, and other software concepts, and with their systematic organization… Generic programming focuses on representing families of domain
concepts. There is no universally accepted definition of generic programming. We refer to generic programming as the ability to write reusable, independent programming units that can be plugged together by writing glue code, and this article uses the term in that sense. The term component programming is often used interchangeably for the ability to create pluggable, reusable components. This article does not use it, since the term is widely used to refer to technologies like COM, CORBA and EJB, whose objectives are similar to those of generic programming but which are quite different in spirit. For example, a component in the context of any of these technologies is quite different from a generic unit in the context of parametric polymorphism.

Theoretical Basics

Static and Dynamic Typing

Based on their type systems, languages can be classified into two main categories: statically typed and dynamically typed. Statically typed languages do type checking at compile time to prevent type errors. Dynamically typed languages postpone type checking to runtime. The main advantage of static typing is that type errors are caught at compile time, that is, earlier in the development lifecycle, when they are easier to fix. Many widely used high-level languages (for example, C++ and Java) are statically typed for this reason. On the other hand, dynamically typed languages offer flexibility, particularly when the types of the data are not known in advance. Hence dynamically typed languages lend themselves to Rapid Application Development (RAD); Smalltalk and Python are examples of dynamically typed languages.

Generics and Inheritance

Those who are acquainted with object-oriented languages might have the misconception that inheritance is a substitute for generic units.
For example, in a language that supports inheritance and a universal base class Object, a container for storing key/value pairs can be provided as a map container holding two handles to Objects. A property container, which maps strings to strings, might be viewed as a derived class of map. In a language supporting generic types, a map might be viewed as a container type parameterized by type T. A property container might then be viewed as a specialization of map with the string type substituted for the type parameter T for both key and value. Such similar solutions to the same problem might mislead a novice programmer into thinking that the two features can be used interchangeably. While it is true that generics and inheritance can be used interchangeably in a few cases, the features are in fact complementary. “Parameterized types give us a third way (in addition to class inheritance and object composition) to compose behavior in object-oriented systems”, notes [Gamma et al, 1995]. It is also possible to mix genericity and inheritance, which is a powerful combination. This idea is clearly articulated in a classic paper [Meyer, 1986].

Two Models for Generic Programming

There are two popular and well-known means of creating generic components. In statically typed languages, it is through parametric polymorphism; in dynamically typed languages, it is through dynamic typing itself.
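As a rough illustration of the contrast, the property container discussed earlier can be sketched both ways in C++. This is only a sketch under stated assumptions: C++ has no universal base class, so `Object`, `StringObject` and `ObjectMap` are hypothetical names introduced here, and keys are kept as plain strings to keep the example short. `PropertyMap` uses the standard `std::map` template with `std::string` substituted for both type parameters.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical universal base class (C++ does not provide one).
struct Object {
    virtual ~Object() = default;
};

// Hypothetical wrapper so that a string can be stored as an Object.
struct StringObject : Object {
    explicit StringObject(std::string v) : value(std::move(v)) {}
    std::string value;
};

// Inheritance-based model: the container only knows about Object handles.
class ObjectMap {
public:
    void put(const std::string& key, std::shared_ptr<Object> value) {
        entries_[key] = std::move(value);
    }
    // Returns a null handle when the key is absent.
    std::shared_ptr<Object> get(const std::string& key) const {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            return nullptr;
        }
        return it->second;
    }
private:
    std::map<std::string, std::shared_ptr<Object>> entries_;
};

// Reading a value back needs a runtime-checked downcast to the real type.
std::string lookup_dynamic(const ObjectMap& m, const std::string& key) {
    auto s = std::dynamic_pointer_cast<StringObject>(m.get(key));
    return s ? s->value : std::string{};
}

// Parametric model: the value type is fixed when the template is instantiated,
// so no cast is needed and a wrong value type is rejected by the compiler.
using PropertyMap = std::map<std::string, std::string>;

std::string lookup_static(const PropertyMap& m, const std::string& key) {
    auto it = m.find(key);
    if (it == m.end()) {
        return std::string{};
    }
    return it->second;
}
```

Both lookups return the same strings for the same data; the difference is where a type mismatch would be detected: at the `dynamic_pointer_cast` while the program runs in the first case, and by the compiler in the second.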
In statically typed languages, instead of writing the same code for different types, it is possible to parameterize the type information and generate code for different types as needed. In this way, it is possible to write programming components that are 'generic' enough to work on different types. Statically typed languages such as ML, Ada, C++ and Java support this parametric polymorphism, with C++ being the language providing the most comprehensive support for generic programming through templates. Languages such as Java provide limited support for parametric polymorphism, mainly to support type safety in using heterogeneous containers.

    // C++ example:
    // addition as a function template.
    template <typename T>
    T plus(T arg1, T arg2) {
        return arg1 + arg2;
    }

    // sample usage: T is deduced as int, then as float
    int i = plus(10, 20);
    float fval = 20.0f;
    float f = plus(10.0f, fval);

In the type parameterization model, the type information associated with code or an object is abstracted into a code template. This code template is used to generate ("instantiate") type-specific code as and when required. In this way, type-specific code can potentially be generated for an unbounded set of types. In other words, this approach parameterizes the type information of the code or object, and hence is referred to as type parameterization. Type parameterization is also referred to as parametric polymorphism or generics.

In dynamically typed languages, it is natural to write generic components, since methods are invoked based on the dynamic type of the object. Components can be written independently of the type: if the receiver object understands the message passed, it will respond to the message accordingly; otherwise, it might fail.
But dynamically typed languages provide different means of handling the case where the receiver object does not understand the message; for example, in Smalltalk, it results in a call to the doesNotUnderstand: method (and Smalltalk also provides the facility to forward that message to some other object).

An important observation about these two approaches is that they are complementary in nature. Parametric polymorphism is possible only when the language follows static typing, and it cannot be applied to dynamically typed languages (since type checking is postponed to runtime instead of being performed at compile time). For example, it is not possible to introduce parametric polymorphism in Smalltalk, which is a dynamically typed language (in other words, parameterizing classes by type has no meaning in Smalltalk). For the same reason, a language supporting parametric polymorphism cannot have dynamic typing. For example, it is not possible to support dynamic typing in C++, which is a statically typed language supporting parametric polymorphism (in other words, to parameterize classes by type, type checking has to be done at compile time).

Many object-oriented languages have a common universal base class (typically named Object) that is used to represent the operations that are universally applicable to all
objects. This allows common operations to be applied to objects irrespective of their dynamic type. Data structures and algorithms can be written in terms of the operations on the Object class and thus can be applied to any object. This model naturally suits dynamically typed languages like Smalltalk. In statically typed languages, manipulating the objects and applying mutating operations is not elegant, as each object needs to be cast back to its dynamic type (for example in Java, prior to the addition of generics in version 1.5). Examples of generic programming using this universal container model include Smalltalk, Objective-C and the .NET class framework. The universal container model uses inheritance and runtime polymorphism as the basis for supporting genericity.

Scripting Languages and Generic Programming

Scripting languages have played a major role in abstracting low-level details from the programmer by providing high-level facilities to 'glue' sub-systems together. Perhaps surprisingly, the experience gained in the design of scripting languages can be applied to generic programming, because the two play similar roles. Scripts are used for rapid prototyping and development, as a means of writing glue code. Generic programming plays a similar role: components are available for different uses, and they need to be glued together. Thus, the success of dynamic typing in scripting is good evidence that it is well suited to generic programming.

Dynamic typing is of significant importance when subsystems need to be connected with glue code. This is one of the significant characteristics of scripting languages, whose ecosystem consists of many subsystems that are glued together using scripts. The subsystems might be written in different languages, so gluing them together needs a flexible type system.
That is the reason why many scripting languages are dynamically typed (for example, Perl and Python).

Functional Languages and Generic Programming

Many functional languages provide support for generic programming. For example, ML and Haskell support parametric polymorphism. One of the prime advantages of functional languages is that functional programs are written free of side effects, and the programs tend to be more modular since the whole program is made up of functions (many of them reusable ones provided in the language libraries). Also, most functions are naturally parameterized by type, and hence parametric polymorphism fits neatly into generic programming.

Design Patterns and Generic Programming

Design patterns, as described in [Gamma et al, 1995], “are descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context”. Design patterns have received wide attention in the last decade as a means of writing and promoting reusable code in object-oriented programs. However, generic programming is not about object-oriented programming itself, and many of the design patterns can be expressed using parametric polymorphism (for example, the Bridge design pattern).

Note that the word template in the Template Method design pattern [Gamma et al, 1995] can be misleading. This design pattern has nothing to do with templates (as in C++, which supports this feature for generic programming). It is about inheritance, where specific operations in a class are deferred to be implemented in the subclasses. Here, a template method is a method that defines the skeleton of an algorithm in terms of abstract operations, which are overridden in the subclasses to provide concrete behavior. Thus, it relies on runtime polymorphism, which is one of the fundamental ways to enable code reuse. The word template refers to the fact that the varying part of the algorithm is deferred to later stages to provide flexibility in implementation and in refactoring.
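To make the distinction concrete, the following minimal C++ sketch places the Template Method pattern next to a function template. All names (`Report`, `PlainReport`, `UrgentReport`, `twice`) are hypothetical and exist only for this illustration.

```cpp
#include <string>

// Template Method pattern: the fixed skeleton lives in the base class and
// the varying step is a virtual function supplied by each subclass.
class Report {
public:
    virtual ~Report() = default;

    // The "template method": same skeleton for every subclass; the call to
    // body() is resolved at runtime from the dynamic type of the object.
    std::string render() const { return "[" + body() + "]"; }

private:
    virtual std::string body() const = 0;  // deferred to subclasses
};

class PlainReport : public Report {
private:
    std::string body() const override { return "plain"; }
};

class UrgentReport : public Report {
private:
    std::string body() const override { return "URGENT"; }
};

// A C++ template, by contrast, is parameterized by type and instantiated by
// the compiler once for each type it is used with.
template <typename T>
T twice(T value) {
    return value + value;
}
```

Calling `render()` through a `Report` reference selects the subclass behavior while the program runs, whereas `twice(21)` and `twice(std::string("ab"))` cause two separate functions to be generated at compile time.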
Unix Pipes and Generic Programming

The use of pipes in the Unix environment shows that it is possible to glue independently designed and developed sub-systems together in a surprisingly simple way. Many Unix tools, such as wc (word count), seem of little use when taken separately but become very valuable when combined with other similar small tools to solve non-trivial problems that would otherwise require non-trivial programming effort. This is possible because of the simple, clearly defined communication mechanism established for input and output, and because each component is designed to do only one specific, well-defined task. The objectives are not much different from those of generic programming, where the components are supposed to solve specific problems and the programmer is expected to combine them to solve the problem at hand.

Implementation Issues and Details

Efficiency Concerns

In languages supporting parametric polymorphism, the overhead involved in providing generic units is minimal (sometimes there is no overhead at all). The definition of a generic type might not itself result in the generation of any code (as in C++); when an instance of a generic unit for a particular type is needed, the actual type is substituted for the type parameter and a specialized form is created (known as specialization or instantiation). This specialized code is no different in spirit from an ordinary class or function, and hence parametric polymorphism can be implemented (as it typically is) without any significant runtime overhead or loss of efficiency.

Dynamically typed languages need some means of keeping track of the types of variables and expressions at runtime, so that type safety can be ensured. This requires tracking the runtime type information of objects (known as tagging) so that runtime checks can be made.
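A tagged value can be sketched in C++ as follows. This is a deliberately simplified illustration rather than a description of how any particular language runtime is implemented; the `Value` class, its `Tag` enumeration and the `add` function are hypothetical names.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical dynamically typed value: every value carries a tag that
// records its runtime type (the space cost of tagging).
class Value {
public:
    enum class Tag { Int, Text };

    explicit Value(long n) : tag_(Tag::Int), int_(n) {}
    explicit Value(std::string s) : tag_(Tag::Text), text_(std::move(s)) {}

    Tag tag() const { return tag_; }

    // Each accessor checks the tag before handing out the payload.
    long as_int() const {
        check(Tag::Int);
        return int_;
    }
    const std::string& as_text() const {
        check(Tag::Text);
        return text_;
    }

private:
    void check(Tag expected) const {
        if (tag_ != expected) {
            throw std::runtime_error("type error: unexpected tag");
        }
    }

    Tag tag_;
    long int_ = 0;
    std::string text_;
};

// Every operation inspects the tags while the program runs
// (the time cost of runtime checks).
Value add(const Value& a, const Value& b) {
    if (a.tag() != b.tag()) {
        throw std::runtime_error("type error: mismatched operands");
    }
    if (a.tag() == Value::Tag::Int) {
        return Value(a.as_int() + b.as_int());
    }
    return Value(a.as_text() + b.as_text());
}
```

Adding an integer value to a text value compiles without complaint and fails only when `add` is executed, which is the behavior a statically typed language would reject at compile time.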
For every operation performed, type safety has to be ensured at runtime, and hence runtime checks are performed while the program executes. Thus the flexibility offered by dynamic typing incurs a cost both in space (for tagging the objects) and in time (for performing runtime checks); both are avoided in statically typed languages.

Code Generation

In parametric polymorphism, code is generated for specific types from the generic units whenever a specialized version is needed. Thus the generated code for the program tends to be larger. With dynamic typing, however, it is enough to have one static type that handles different dynamic types through runtime type identification (RTTI), and hence code needs to be generated only for that type. So the generated code for the program tends to be smaller. The compiled code can be used directly with various types, and hence it is enough to distribute the compiled code in libraries. Parametric polymorphism tends to expose the source code, though this is not a property of parameterization itself. For example, though the C++ language supports an export feature that can avoid exposing template source code, C++ implementations generally require the template source code to be exposed, which is undesirable for commercial libraries.

Language Implementation Details

In C++, the template feature is a compile-time mechanism; hence there is practically no extra runtime cost (in terms of space and time) paid for using templates. The generic units in Ada also have little or no runtime overhead, since simple type substitution is done for parametric types.

On the .NET platform, two different mechanisms for instantiating templates are supported. The template
instantiations can be done either at compile time or at runtime. In Managed Extensions for C++, the C++ template features are supported through the compile-time instantiation mechanism. The Common Language Runtime (CLR) for .NET supports the generics feature for type safety (for languages like C#), and for that the instantiations are done at runtime. The generic programming facilities offered are different (mostly in semantics), but there is little or no cost involved in either instantiation model.

In Java, the purpose of genericity is not to abstract the type details but to add type safety to the common base class approach (and hence the parametric polymorphism is essentially cosmetic in nature). For this reason, the parameterization information is lost after compile-time checking is done (this also means that the runtime does not really understand or provide facilities for generics, and hence no extra cost is involved in supporting them).

Note that it is natural to implement parametric polymorphism as a compile-time mechanism, and in fact most languages (and their implementations) that support parametric types treat them at compile time. However, it is equally possible to support them at runtime. In the case of functional languages like Haskell, since programs using polymorphic functions can be huge, it would be costly for implementations to create separate copies of polymorphic functions (and types). Thus, implementations may choose to keep pointers to the polymorphic objects. However, in such an implementation, since the object pointed to might be of any type, the operations that can be applied to it are restricted (unless the runtime provides the necessary dynamic type information). Also, object allocation and deallocation on the heap incurs a cost.
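The two strategies can be seen in miniature in C++, offered here purely as an analogy (it is not how any particular Haskell implementation works). `std::qsort` is a single compiled routine that reaches elements through untyped pointers, while `std::sort` is a template instantiated per element type. The wrapper names `compare_ints`, `sort_uniform` and `sort_specialized` are hypothetical.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Uniform representation: the sorting routine sees only untyped pointers,
// so the one operation it can apply to elements is this callback.
int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// One copy of std::qsort serves every element type.
void sort_uniform(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

// Per-type instantiation: the compiler generates a separate sorting routine
// for each T this template is used with.
template <typename T>
void sort_specialized(std::vector<T>& v) {
    std::sort(v.begin(), v.end());
}
```

Both functions produce the same ordering. The uniform version pays for pointer indirection and an indirect call per comparison, and it can touch elements only through the callback it is given.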
However, the advantage of this approach is that the code size for a polymorphic type or function is no different from that of a monomorphic one, and hence the implementation is clean and uniform.

In Smalltalk, all methods are dynamically resolved, and the response from the receiver of a message depends on the dynamic type of the object. Type checking is also done at runtime. Though Smalltalk implementations typically optimize the common Smalltalk programming constructs and idioms to avoid unnecessary overhead, there is still significant overhead involved, as in other dynamically typed languages. Efficiency is one of the main reasons that inhibited the widespread use and acceptance of Smalltalk, which is otherwise an excellent language for writing reusable components.

[Czarnecki and Eisenecker, 2000] Krzysztof Czarnecki, Ulrich W. Eisenecker, “Generative Programming: Methods, Tools, and Applications”, Addison-Wesley, Reading, MA, 2000.

[Gamma et al, 1995] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, “Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley, Reading, MA, 1995.

[Meyer, 1986] Bertrand Meyer, “Genericity versus Inheritance”, Conference proceedings on Object-oriented programming systems, languages and applications, p. 391-405, September 29-October 02, 1986, Portland, Oregon, United States.

All rights reserved. Copyright Jan 2004.