Re-Engineering Databases using Meta-Programming Technology
G N Wikramanayake (1997) "Re-engineering Databases using Meta-Programming Technology" In:16th National Information Technology Conference on Information Technology for Better Quality of Life Edited by:R. Ganepola et al. pp. 1-14. Computer Society of Sri Lanka, Colombo: CSSL Jul 11-13, ISBN 955-9155-05-9

RE-ENGINEERING DATABASES USING META-PROGRAMMING TECHNOLOGY

G.N. Wikramanayake
Department of Statistics and Computer Science, University of Colombo.
gihanw@hadawatha.cmb.ac.lk

Key words: database re-engineering and migration, meta-programming and legacy systems.

ABSTRACT

A wealth of information is held in databases supporting the IT capabilities of organisations. Many of these databases are called legacy databases in that they and their associated applications were developed with software systems that are now technologically obsolete, particularly when compared with the more recent systems being used for new developments in the organisation. These legacy databases need to be evolved and migrated to modern computing environments, so that their existence remains beneficial to their community of users. The evolution path of these databases is based on a re-engineering process. The Conceptualised Constraint Visualisation and Enhancement System (CCVES) for relational legacy databases, developed at Cardiff, is a database software tool that assists with the migration process of legacy databases, and its re-engineering of databases using meta-programming technology is described here. This tool is initially used to create a graphical model of a relational legacy database which shows its current integrity constraints applicable to the elements of the model. CCVES was developed using meta-translation techniques and can accept input from a variety of relational systems (INGRES, Oracle and POSTGRES have been tested) to produce graphical models of a database's schema as either an ER model or an OMT model, accompanied by a display of the integrity constraints in force in the database. CCVES can also be used to enhance the legacy database by accepting input of explicit new constraints which the underlying database does not support, or which are incomplete in the database in that they should be enforced. This enables further constraints to appear in the graphical model.
These constraints are used to enhance the original database's meta-data model, and to assist legacy databases to be evolved and viewed in new ways. These constraints can also be used to detect inconsistent legacy data prior to its migration from its current database, and help in the transparent migration of legacy databases, which permits users to continue using them while they migrate.

Meta-programming technology has been successfully used in several recent research projects to address heterogeneity issues. A key to this approach is the transformation of the source meta-data or query into a common internal representation, which is then separately transformed into a chosen target representation. Thus components of a schema, referred to as meta-data, are classified as entity and attribute on input, and are stored in a database language independent fashion in the internal representation. This meta-data is then processed to derive the appropriate schema information of a particular DBMS. In this way it is possible to use a single representation and yet deal with issues related to most types of DBMSs. A similar approach is used for query transformation between source and target representations.

16th National IT Conference, Sri Lanka, 11-13 July 1997
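The database-language-independent internal representation described above can be pictured with a small sketch. The class names and the `to_sql_ddl` mapping below are illustrative assumptions for this article, not the actual CCVES formalism (which is Prolog-based); the point is only that one canonical form can be mapped to any target DDL.

```python
# Sketch: a DBMS-independent internal ("canonical") form for schema
# meta-data, plus one mapping from it to a concrete target formalism.
# All names here are illustrative, not part of CCVES.

from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    domain: str              # canonical type, e.g. "integer", "char(30)"
    primary_key: bool = False

@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)

def to_sql_ddl(entity):
    """Map the canonical form to one target representation (here: SQL DDL)."""
    cols, pk = [], []
    for a in entity.attributes:
        cols.append(f"  {a.name} {a.domain.upper()}")
        if a.primary_key:
            pk.append(a.name)
    if pk:
        cols.append(f"  PRIMARY KEY ({', '.join(pk)})")
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(cols) + "\n);"

staff = Entity("staff", [Attribute("staff_no", "integer", True),
                         Attribute("name", "char(30)")])
print(to_sql_ddl(staff))
```

A second emitter (for another DBMS's dialect) could be added without touching the canonical classes, which is the essence of the meta-translation argument.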
1. INTRODUCTION

Over the years rapid technological changes have taken place in all fields of computing. Most of these changes have been due to the advances in data communications, computer hardware and software, which together have provided a reliable and powerful networking environment (i.e. standard local and wide area networks) that allows the management of data stored in computing facilities at many nodes of the network. These changes have turned round the hardware technology from centralised mainframes to networked file-server and client-server architectures which support various ways to use and share data.

Simultaneous developments in the software industry have produced techniques (e.g. for system design and development) and products capable of utilising the new hardware resources (e.g. multi-user environments with GUIs). These new developments are being used for a wide variety of applications, including modern distributed information processing applications, such as office automation, where users can create and use databases with forms and reports with minimal effort compared to the development efforts using 3GLs. Such applications are being developed with the aid of database technology, as this field too has advanced by allowing users to represent and manipulate advanced forms of data and their functionalities. Due to the program data independence feature of DBMSs, the maintenance of database application programs has become easier, as functionalities that were traditionally performed by procedural application routines are now supported declaratively using database concepts such as constraints and rules.

In the field of databases, the recent advances resulting from technological transformation include many areas such as the use of distributed database technology, object-oriented technology, constraints, knowledge-based systems, 4GLs and CASE tools. Meanwhile, the older technology was dealing with files and primitive database systems which now appear inflexible, as the technology itself limits them from being adapted to meet the changing business needs catalysed by new technology. The older systems, which have been developed using 3GLs and in operation for many years, often suffer from failures, inappropriate functionality, lack of documentation and poor performance, and are referred to as legacy information systems.

The current technology is much more flexible as it supports methods to evolve (e.g. 4GLs, CASE tools, GUI toolkits and reusable software libraries), and can share resources through software that allows interoperability (e.g. ODBC). This evolution reflects the changing business needs. However, modern systems need to be properly designed and implemented to benefit from this technology, which may still be unable to prevent such systems themselves being considered to be legacy information systems in the near future due to the advent of the next generation of technology with its own special features. The only salvation would appear to be building evolution paths into the current systems. This will ensure that any attempts to incorporate the modern technology will not adversely affect the ongoing functionality of existing systems.

Re-engineering of legacy databases using meta-programming technology, in such a way that the process is transparent to the current users, has proved to be successful. This paper describes the benefits of this technology to the very important application areas of enhancing and evolving heterogeneous distributed legacy databases to assist the legacy database migration process. The role of meta-programming in this context is described here by considering the complementary roles of the schema and query meta-translation systems, the schema meta-visualisation system and the schema meta-integration system.

The rest of the paper is organised as follows. Section 2 identifies the re-engineering of databases with special emphasis on the relational model. This is followed by an overview of the meta-programming technology. Three main stages in the
application of our system are described next. The role of our system in the context of meta-programming technology is then described. Finally, we complete the paper by identifying some of our experiences and drawing conclusions.

2. RE-ENGINEERING DATABASES

Software such as programming code and databases is re-engineered for a number of reasons: for example, to allow reuse of past development efforts, reduce maintenance expense and improve software flexibility 11. This re-engineering process consists of two stages, namely: a reverse-engineering and a forward-engineering process. In database migration the reverse-engineering process may be applied to help migrate databases between different vendor implementations of a particular database paradigm (e.g. from Informix to Oracle), between different versions of a particular DBMS (e.g. Oracle version 3 to Oracle version 7) and between database types (e.g. hierarchical to modern relational database systems). The forward-engineering process, which is the second stage of re-engineering, is performed on the conceptual model derived from the original reverse-engineering process. At this stage, the objective is to redesign and/or enhance an existing database system with missing and/or new information.

The application of reverse-engineering to relational databases has been widely described and applied 2-4, 8-11, 18. The latest approaches have been extended to construct a higher level of abstraction than the original E-R model. This includes the representation of object-oriented concepts such as generalisation/specialisation hierarchies in a reverse-engineered conceptual model.

The techniques used in the reverse-engineering process consist of identifying common characteristics, as identified below:

• Identify the database's contents, such as relations and attributes of relations.
• Determine keys, e.g. primary keys, candidate keys and foreign keys.
• Determine entity and relationship types.
• Construct suitable data abstractions, such as generalisation and aggregation structures.

2.1 Contents of a relational database

Diverse sources provide information that leads to the identification of a database's contents. These include the database's schema, observed patterns of data, semantic understanding of the application, and user manuals. Among these the most informative source is the database's schema, which can be extracted from the data dictionary of a DBMS. The observed patterns of data usually provide information such as possible key fields, domain ranges and the related data elements. This source of information is usually not reliable, as invalid, inconsistent and incomplete data exists in most legacy applications. The reliability can be increased by using the semantics of an application. The availability of user manuals for a legacy IS is rare and they are usually out of date, which means they provide little or no useful information to this search.

Data dictionaries of relational databases store information about relations, attributes of relations, and rapid data access paths of an application. Modern relational databases record additional information, such as primary and foreign keys (e.g. Oracle), rules/constraints on relations (e.g. INGRES, POSTGRES, Oracle) and generalisation hierarchies (e.g. POSTGRES). Hence, analysis of the data dictionaries of relational databases provides the basic elements of a database schema, i.e. entities, their attributes, and sometimes the keys and constraints, which are then used to discover the entity and relationship types that represent the basic components of a conceptual model for the application. The trend is for each new product release to support more sophisticated facilities for representing knowledge about the data.
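As an illustration of extracting these basic schema elements from a data dictionary, the sketch below queries a catalog programmatically. The paper's tool reads the INGRES, Oracle and POSTGRES dictionaries; SQLite is used here purely as a readily available stand-in, and the table names are invented for the example.

```python
# Sketch: harvesting relations, attributes and foreign keys from a data
# dictionary. SQLite's catalog (sqlite_master and the PRAGMA functions)
# stands in for the system tables of INGRES / Oracle / POSTGRES.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee   (emp_no  INTEGER PRIMARY KEY, name TEXT,
                         dept_no INTEGER REFERENCES department(dept_no));
""")

schema = {}
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"):
    # PRAGMA table_info plays the role of the relation/attribute catalog.
    attrs = [(row[1], row[2], bool(row[5]))        # (name, type, is_pk)
             for row in conn.execute(f"PRAGMA table_info({table})")]
    # PRAGMA foreign_key_list plays the role of a foreign-key catalog.
    fks = [(row[3], row[2], row[4])                # (attr, ref_table, ref_attr)
           for row in conn.execute(f"PRAGMA foreign_key_list({table})")]
    schema[table] = {"attributes": attrs, "foreign_keys": fks}

print(schema["employee"]["foreign_keys"])   # → [('dept_no', 'department', 'dept_no')]
```

The resulting dictionary of relations, attributes, keys and foreign keys is exactly the raw material the next sections use to discover entity and relationship types.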
2.2 Keys of the relational data model

Theoretically, three types of key are specified in a relational data model: primary, candidate and foreign keys. Early relational DBMSs were not capable of explicitly representing these. However, sometimes indexes, which are used for rapid data access, can be used as a clue to determine some keys of an application database. For instance, the analysis of the unique index keys of a relational database provides sufficient information to determine possible primary or candidate keys of an application. The observed attribute names and data patterns may also be used to assist this process. This includes attribute names ending with '#' or 'no' as possible candidate keys, and attributes in different relations having the same name as possible foreign key attributes. In the latter case, we need to consider homonyms to eliminate incorrect detections, and synonyms to prevent any omissions due to the use of different names for the same purpose. Such attributes may need to be further verified using the data elements of the database. This includes explicit checks on data for validity of uniqueness and referential integrity properties. However, the reverse of this process, i.e. determining a uniqueness property from the data values in the extensional database, is not a reliable source of information, as the data itself is usually not complete (i.e. it may not contain all possible values) and may not be fully accurate. Hence we do not use this process, although it has been used in 2, 11.

The lack of information on keys in some existing database specifications has led to the use of data instances to derive possible keys. However, it is not practicable to automate this process, as some entities have keys consisting of multiple attributes. This means many permutations would have to be considered to test for all possibilities. This is an expensive operation when the volume of data and/or the number of attributes is large.

In 2, a consistent naming convention is applied to key attributes. Here attributes used to represent the same information must have the same name, and as a result the referencing and referenced attributes of a binary relationship between two entities will have the same attribute names in the entities involved. This naming convention was used in 2 to determine relationship types, as foreign key specifications are not supported by all databases. An important contribution of our work is to support the identification of foreign key specifications for any database, and hence the detection of relationships, without performing any name conversions. We note that some reverse-engineering methods rely on candidate keys (e.g. 8, 10), while others rely on primary keys (e.g. 2). These approaches insist on their users meeting their pre-requisites (e.g. specification of missing keys) to enable the user to successfully apply their reverse-engineering process. This means it is not possible to produce a suitable conceptual model until the pre-requisites are supplied. For a large legacy database application the number of these could exceed a hundred and hence it is not appropriate to rely on such pre-requisites being met to derive an initial conceptual model. Therefore, we concentrate on providing an initial conceptual model using only the available information. This will ensure that the reverse-engineering process will not fail due to the absence of any vital information (e.g. the key specification for an entity).

2.3 Entity and Relationship Types of a data model

In the context of an E-R model an entity is classified as strong (regular) or weak depending on an existence-dependent property of the entity. A weak entity cannot exist without the entity it is dependent on. The enhanced E-R model (EER) 5 identifies more entity types, namely: composite, generalised and specialised entities. Different classifications of entities are due to their associative properties with other entities. The identification of an appropriate entity type for each entity will assist in constructing a graphically informative conceptual model for its users. The extraction of information from legacy systems to classify the appropriate entity type is a difficult task, as such information is usually lost during an implementation. This is because
implementations take different forms even within a particular data model 5. Hence, an information extraction process may need to interact with a user to determine some of the entity and relationship types. The type of interaction required depends on the information available for processing and will take different forms. For this reason we focus only on our approach, i.e. determining entity and relationship types using enhanced knowledge such as primary and foreign key information.

2.4 Suitable Data Abstractions for a data model

Entities and relationships form the basic components of a conceptual data model. These components describe specific structures of a data model. A collection of entities may be used to represent more than one data structure. For example, the entities Person and Student may be represented as a 1:1 relationship or as an is-a relationship. Each representation has its own view, and hence the user's understanding of the data model will differ with the choice of data structure. Hence it is important to be able to introduce any data structure for a conceptual model and view it using the most suitable data abstraction.

Data structures such as generalisation and aggregation have inherent behavioural properties which give additional information about their participating entities (e.g. an instance of a specialised entity of a generalisation hierarchy is made up from an instance of its generalised entity). These structures are specialised relationships, and their representation in a conceptual model provides a higher level of data abstraction and a better user understanding than the basic E-R data model gives. These data abstractions originated in the object-oriented data model and they are not implicitly represented in existing relational DBMSs. Extended-relational DBMSs support the O-O paradigm (e.g. POSTGRES), with generalisation structures being created using inheritance definitions on entities. However, in the context of legacy DBMSs such information is not normally available, and as a result such data abstractions can only be introduced either by introducing them without affecting the existing data structures or by transforming existing entities and relationships to support their representation. For example, the entities Staff and Student may be transformed to represent a generalisation structure by introducing a Person entity.

Other forms of transformation can also be performed. These include decomposing all n-ary relationships for n > 3 into their constituent relationships of order 2, to remove such relationships and hence simplify the association among their entities. At this stage double buried relationships are identified and merged, and relationships formed with subclasses are eliminated. Transitive closure relationships are also identified and changed to form simplified hierarchies. We use constraints to determine relationships and hierarchies. By controlling these constraints (i.e. modifying or deleting them) it is possible to transform or eliminate the necessary relationships and hierarchies.

3. META-PROGRAMMING TECHNOLOGY

Meta-programming technology allows the meta-data (schema information) of a database to be held and processed independently of its source specification language. This allows us to work in a database language independent environment and hence overcome many logical heterogeneity issues. Prolog-based meta-programming technology has been used in previous research at Cardiff in the area of logical heterogeneity 6, 14. Using this technology the meta-translation of database query languages 7 and database schemas 15 has been performed. This work has shown how the heterogeneity issues of different DBMSs can be addressed without having to reprogram the same functionality for each and every DBMS. We use meta-programming technology for our legacy database migration approach, as we need to be able to start with a legacy source database and end with a modern target database, where the respective database schema and query languages may be different
from each other. In this approach the source database schema or query language is mapped on input into an internal canonical form. All the required processing is then done using the information held in this internal form. This information is finally mapped to the target schema or query language to produce the desired output. The advantage of this approach is that processing is not affected by heterogeneity, as it is always performed on data held in the canonical form. This canonical form is an enriched collection of semantic data modelling features.

4. APPLICATION

We view our re-engineering approach as consisting of 3 stages. At stage 1, the data definition of the selected database is reverse-engineered to produce a graphical display (cf. paths A-1 and A-2 of figure i). However, in legacy systems much of the information needed to present the database schema in this way is not available as part of the database meta-data, and hence these links which are present in the database cannot be shown in this conceptual model. In modern systems such links can be identified using constraint specifications. Thus, if the database does not have any explicit constraints, or it does but these are incomplete, new knowledge about the database needs to be entered at stage 2 (cf. path B-1 of figure i), which will then be reflected in the enhanced schema appearing in the graphical display (cf. path B-2 of figure i). This enhancement will identify new links that should be present for the database concerned. These new database constraints can next be applied experimentally to the legacy database to determine the extent to which it conforms to them. This process is done at stage 3 (cf. paths C-1 and C-2 of figure i). The user can then decide whether these constraints should be enforced to improve the quality of the legacy database prior to its migration. At this point the three preparatory stages in the application of our approach are complete. The actual migration process is then performed. All stages are further described below to enable us to identify the main processing components of our proposed system as well as to explain how we deal with different levels of heterogeneity.

4.1 Stage 1: Reverse Engineering

In stage 1, the data definition of the selected database is reverse-engineered to produce a graphical display of the database. To perform this task, the database's meta-data must be extracted (cf. path A-1 of figure i). This is achieved by connecting directly to the heterogeneous database. The accessed meta-data needs to be represented using our internal form. This is achieved through a schema mapping process, as used in the SMTS (Schema Meta-Translation System) of Ramfos 15. The meta-data in our internal formalism then needs to be processed to derive the graphical constructs present for the database concerned (cf. path A-2 of figure i). These constructs are in the form of entity types and relationships, and their derivation process is the main processing component in stage 1. The identified graphical constructs are mapped to a display description language to produce a graphical display of the database.
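The derivation of entity and relationship constructs from key meta-data might be sketched as follows. The dictionary layout and the weak-entity heuristic used here (a primary key that embeds a foreign-key attribute) are simplifying assumptions for illustration, not the tool's actual derivation rules.

```python
# Sketch: stage-1 processing — deriving entity and relationship constructs
# from primary/foreign-key meta-data held in an internal form.
# The dict layout and the weak-entity rule are illustrative assumptions.

def derive_constructs(schema):
    """schema: {table: {"pk": [attrs], "fks": [(attr, ref_table, ref_attr)]}}"""
    entities, relationships = [], []
    for table, info in schema.items():
        # Heuristic: a table whose primary key embeds a foreign-key attribute
        # is modelled as a weak (existence-dependent) entity.
        fk_attrs = {fk[0] for fk in info["fks"]}
        weak = any(k in fk_attrs for k in info["pk"])
        entities.append((table, "weak" if weak else "strong"))
        # Each foreign key yields a relationship construct between entities.
        for attr, ref_table, ref_attr in info["fks"]:
            relationships.append((table, ref_table, f"{attr}->{ref_attr}"))
    return entities, relationships

meta = {
    "department": {"pk": ["dept_no"], "fks": []},
    "employee":   {"pk": ["emp_no"],
                   "fks": [("dept_no", "department", "dept_no")]},
}
ents, rels = derive_constructs(meta)
print(ents)   # → [('department', 'strong'), ('employee', 'strong')]
print(rels)   # → [('employee', 'department', 'dept_no->dept_no')]
```

The pairs returned here are what a display description language would then render as boxes and connecting edges in the ER or OMT diagram.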
[Figure i: Information flow in the 3 stages of our approach — heterogeneous databases feed Stage 1 (Reverse Engineering) through internal processing (paths A-1, A-2) into a schema visualisation (EER or OMT) with constraints; Stage 2 (Knowledge Augmentation) adds enhanced constraints (paths B-1, B-2, B-3); Stage 3 (Constraint Enforcement) yields enforced constraints (paths C-1, C-2).]

a) Database connectivity for heterogeneous database access

Unlike the previous Cardiff meta-translation systems 7, 12, 15, which addressed heterogeneity at the logical and data management levels, our system looks at the physical level as well. While these previous systems processed schemas in textual form and did not access actual databases to extract their DDL specification, our system addresses physical heterogeneity by accessing databases running on different hardware/software platforms (e.g. computer systems, operating systems, DBMSs and network protocols). Our aim is to directly access the meta-data of a given database application by specifying its name, the name and version of the host DBMS, and the address of the host machine (we assume that access privileges for this host machine and DBMS have been granted). If this database access process can produce a description of the database in DDL formalism, then this textual file is used as the starting point for the meta-translation process, as in previous Cardiff systems 12, 15. We found that it is not essential to produce such a textual file, as the required intermediate representation can be directly produced by the database access process. This means that we could also by-pass the meta-translation process that performs the analysis of the DDL text to translate it into the intermediate representation. However, the DDL formalism of the schema can be used for optional textual viewing and could also serve as the starting point for other tools (e.g. the Schema Meta-Integration System (SMIS) of Qutaishat 12, developed at Cardiff for meta-programming database applications).

The initial functionality of the stage 1 database connectivity process is to access a heterogeneous database and supply the accessed meta-data as input to our schema meta-translator (SMTS). This module needs to deal with heterogeneity at the physical and data management levels. We achieve this by
using DML commands of the specific DBMS to extract the required meta-data held in database data dictionaries, treated like user-defined tables.

Relatively recently, the functionalities of a heterogeneous database access process have been provided by means of drivers such as ODBC 16. Use of such drivers will allow access to any database supported by them and hence obviate the need to develop specialised tools for each database type, as happened in our case. These driver products were not available when we undertook this stage of our work.

b) Schema meta-translation

The schema meta-translation process 15 accepts input of any database schema irrespective of its DDL and features. The information captured during this process is represented internally, to enable it to be mapped from one database schema to another or to be further processed to supply information to other modules such as the schema meta-visualisation system (SMVS) 13 and the query meta-translation system (QMTS) 7. Thus, the use of an internal canonical form for meta representation has successfully accommodated heterogeneity at the data management and logical levels.

c) Schema meta-visualisation

Schema visualisation using graphical notation and diagrams has proved to be an important step in a number of applications, e.g. during the initial stages of the database design process; for database maintenance; for database re-design; for database enhancement; for database integration; or for database migration; as it gives users a sound understanding of an existing database's structure in an easily assimilated format 1, 5. Database users need to see a visual picture of their database structure instead of textual descriptions of the defining schema, as it is easier for them to comprehend a picture. This has led to the production of graphical representations of schema information, effected by a reverse engineering process.

Graphical data models of schemas employ a set of data modelling concepts and a language-independent graphical notation (e.g. the Entity Relationship (E-R) model, the Extended/Enhanced Entity Relationship (EER) model 5 or the Object Modelling Technique (OMT) 17). In a heterogeneous environment different users may prefer different graphical models, and an understanding of the database structure and architecture beyond that given by the traditional entities and their properties. Therefore, there is a need to produce graphical models of a database's schema using different graphical notations, such as either E-R/EER or OMT, and to accompany them with additional information such as a display of the integrity constraints in force in the database 18. The display of integrity constraints allows users to look at intra- and inter-object constraints and gain a better understanding of the domain restrictions applicable to particular entities. Current reverse engineering tools do not support this type of display.

The generated graphical constructs are held internally in a similar form to the meta-data of the database schema. Hence, using a schema meta-visualisation process (SMVS), it is possible to map the internally held graphical constructs into appropriate graphical symbols and coordinates for the graphical display of the schema. This approach has a similarity to the SMTS, the main difference being that the output is graphical rather than textual.

4.2 Stage 2: Knowledge Augmentation

In a heterogeneous distributed database environment, evolution is expected, especially in legacy databases. This evolution can affect the schema description, and in particular schema constraints that are not reflected in the stage 1 (path A-2) graphical display as they may be implicit in applications. Thus our system is designed to accept new constraint specifications (cf. path B-1 of figure i) and add them to the graphical display (cf. path B-2 of figure i) so that these hidden constraints become explicit.
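The interplay between stage 2 and stage 3 can be sketched as follows: a foreign-key constraint that the legacy schema never declared is specified explicitly, and then applied experimentally to the data to find the rows that would not conform before any enforcement or migration is attempted. The function name and the data are hypothetical, not the CCVES interface.

```python
# Sketch: checking legacy data against a newly specified (previously hidden)
# foreign-key constraint before enforcing it or migrating the database.
# All names and rows here are illustrative assumptions.

def check_foreign_key(rows, ref_rows, attr, ref_attr):
    """Return the rows that would violate the proposed foreign-key constraint."""
    valid = {r[ref_attr] for r in ref_rows}
    return [r for r in rows if r[attr] is not None and r[attr] not in valid]

departments = [{"dept_no": 10}, {"dept_no": 20}]
employees = [
    {"emp_no": 1, "dept_no": 10},
    {"emp_no": 2, "dept_no": 99},   # inconsistent legacy data
    {"emp_no": 3, "dept_no": None}, # a null does not violate the constraint
]

violations = check_foreign_key(employees, departments, "dept_no", "dept_no")
print(violations)   # → [{'emp_no': 2, 'dept_no': 99}]
```

Reporting such violations to the user, rather than silently rejecting the rows, is what lets the quality of the legacy data be improved before migration.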
The new knowledge accepted at this point is used to enhance the schema and is retained in the database using a database augmentation process (cf. path B-3 of figure i). The new information is stored in a form that conforms with the enhanced target DBMS's methods of storing such information. This assists the subsequent migration stage.

a) Schema enhancement

Our system needs to permit a database schema to be enhanced by specifying new constraints applicable to the database. This process is performed via the graphical display. These constraints, which are in the form of integrity constraints (e.g. primary key, foreign key, check constraints) and structural components (e.g. inheritance hierarchies, entity modifications), are specified using a GUI. When they are entered they will appear in the graphical display.

b) Database augmentation

The input data to enhance a schema provides new knowledge about a database. It is essential to retain this knowledge within the database itself, if it is to be readily available for any further processing. Typically, this information is retained in the knowledge base of the tool used to capture the input data, so that it can be reused by the same tool. This approach restricts the use of this knowledge by other tools and hence it must be re-entered every time the re-engineering process is applied to that database. This makes it harder for the user to gain a consistent understanding of an application, as different constraints may be specified during two separate re-engineering processes. To overcome this problem, we augment the database itself using the techniques proposed in SQL-3, wherever possible. When it is not possible to use SQL-3 structures, we store the information in our own augmented table format, which is a natural extension of the SQL-3 approach.

When a database is augmented using this method, the new knowledge is available in the database itself. Hence, any further re-engineering processes need not make requests for the same additional knowledge. The augmented tables are created and maintained in a similar way to user-defined tables, but have a special identification to distinguish them. Their structure is in line with the international standards and the newer versions of commercial DBMSs, so that the enhanced database can be easily migrated to either a newer version of the host DBMS or to a different DBMS supporting the latest SQL standards. Migration should then mean that the newer system can enforce the constraints. Our approach should also mean that it is easy to map our tables for holding this information into the representation used by the target DBMS even if it is different, as we are mapping from a well-defined structure.

Legacy databases that do not support explicit constraints can be enhanced by using the above knowledge augmentation method. This requirement is less likely to occur for databases managed by more recent DBMSs, as they already hold some constraint specification information in their system tables. The direction taken by Oracle version 6 was a step towards our augmentation approach, as it allowed the database administrator to specify integrity constraints such as primary and foreign keys, but did not yet enforce them. The next release of Oracle, i.e. version 7, implemented this constraint enforcement process.

4.3 Stage 3: Constraint Enforcement

The enhanced schema can be held in the database, but the DBMS can only enforce these constraints if it has the capability to do so. This will not normally be the case in legacy systems. In this situation, the new constraints may be enforced via a newer version of the DBMS or by migrating the database to another DBMS supporting constraint enforcement. However, the data being held in the database may not conform to the new constraints, and hence existing data may be rejected by the target DBMS in the migration, thus losing data and/or delaying the migration process. To address this problem and to assist the migration process, we provide an optional constraint enforcement process
module which can be applied to a database before it is migrated. The objective of this process is to give users the facility to ensure that the database conforms to all the enhanced constraints before migration occurs. This process is optional so that the user can decide whether these constraints should be enforced to improve the quality of the legacy data prior to its migration, whether it is best left as it stands, or whether the new constraints are too severe.

The constraint definitions in the augmented schema are employed to perform this task. As all constraints held have already been internally represented in the form of logical expressions, these can be used to produce data manipulation statements suitable for the host DBMS. Once these statements are produced, they are executed against the current database to identify the existence of data violating a constraint.

4.4 Stage 4: Migration Process

The migration process itself is incrementally performed by initially creating the target database and then copying the legacy data over to it. The schema meta-translation (SMTS) technique of Ramfos [15] is used to produce the target database schema. The legacy data can be copied using the import/export tools of the source and target DBMSs or DML statements of the respective DBMSs. During this process, the legacy applications must continue to function until they too are migrated. To achieve this, an interface can be used to capture and process all database queries of the legacy applications during migration. This interface can decide how to process database queries against the current state of the migration and re-direct those newly related to the target database. The query meta-translation (QMTS) technique of Howells [7] can be used to convert these queries to the target DML. This approach will facilitate transparent migration for legacy databases. Our work does not involve the development of an interface to capture and process all database queries, as interaction with the query interface of the legacy IS is embedded in the legacy application code. However, we demonstrate how to create and populate a legacy database schema in the desired target environment while showing the role of SMTS and QMTS in such a process.

5. THE ROLE IN CONTEXT OF META-PROGRAMMING TECHNOLOGY

Our approach described in section 4 is based on preparing a legacy database schema for graceful migration. This involves visualisation of database schemas with constraints and enhancing them with constraints to capture more knowledge. Hence we call our system the Conceptualised Constraint Visualisation and Enhancement System (CCVES).

CCVES has been developed to fit in with the previously developed schema (SMTS) [15] and query (QMTS) [7] meta-translation systems, and the schema meta-visualisation system (SMVS) [13]. This allows us to consider the complementary roles of CCVES, SMTS, QMTS and SMVS during Heterogeneous Distributed Database access in a uniform way [6, 14]. The combined set of tools achieves semantic coordination and promotes interoperability in a heterogeneous environment at logical, physical and data management levels.

Figure ii illustrates the architecture of CCVES in the context of meta-data processing modules. It outlines in general terms the process of accessing a remote (legacy) database to perform various database tasks, such as querying, visualisation, enhancement, migration and integration. All these processes use the meta-data for their internal processing.

There are seven sub-processes: the schema mapping process [15], query mapping process [7], schema integration process [12], schema visualisation process [13], database connectivity process, database enhancement process and database migration process. The first two processes together have been called the Integrated Translation Support Environment [6], and the first four processes together have been called the Meta-Integration/Translation
Support Environment [12]. The last three processes were introduced as CCVES to perform database enhancement and migration in such an environment.

The schema mapping process, referred to as SMTS, translates the definition of a source schema to a target schema definition (e.g. an INGRES schema to a POSTGRES schema). The query mapping process, referred to as QMTS, translates a source query to a target query (e.g. an SQL query to a QUEL query). The meta-integration process, referred to as SMIS, tackles heterogeneity at the logical level in a distributed environment containing multiple database schemas (e.g. Ontos and Exodus local schemas with a POSTGRES global schema) - it integrates the local schemas to create the global schema. The meta-visualisation process, referred to as SMVS, generates a graphical representation of a schema. The remaining three processes, namely database connectivity, enhancement and migration, with their associated processes, namely SMVS, SMTS and QMTS, are the subject of the present thesis, as they together form CCVES (centre section of figure ii).

The database connectivity process (DBC) queries meta-data from a remote database (route A-1 in figure ii) to supply meta-knowledge (route A-2 in figure ii) to the schema mapping process, referred to as SMTS. SMTS translates this meta-knowledge to an internal representation which is based on SQL schema constructs. These SQL constructs are supplied to SMVS for further processing (route A-3 in figure ii), which results in the production of a graphical view of the schema (route A-4 in figure ii). Our reverse-engineering techniques [18] are applied to identify entity and relationship types to be used in the graphical model. Meta-knowledge enhancements are solicited at this point by the database enhancement process (DBE) (route B-1 in figure ii), which allows the definition of new constraints and changes to the existing schema. These enhancements are reflected in the graphical view (routes B-2 and B-3 in figure ii) and may be used to augment the database (routes B-4 to B-8 in figure ii). This approach to augmentation makes use of the query mapping process, referred to as QMTS, to generate the required queries to update the database via the DBC process. At this stage any existing or enhanced constraints may be applied to the database to determine the extent to which it conforms to the new enhancements. Carrying out this process will also ensure that legacy data will not be rejected by the target DBMS due to possible violations. Finally, the database migration process, referred to as DBMI, assists migration by incrementally migrating the database to the target environment (routes C-1 to C-6 in figure ii). Target schema constructs for each migratable component are produced via SMTS, and DDL statements are issued to the target DBMS to create the new database schema. The data for these migrated tables are extracted by instructing the source DBMS to export the source data to the target database via QMTS. Here too, the queries which implement this export are issued to the DBMS via the DBC process.

6. EXPERIENCES AND CONCLUSIONS

CCVES, although it has been tested for only three types of DBMS, namely INGRES, POSTGRES and Oracle, could be easily adapted for other relational DBMSs, as they represent their meta-data similarly - i.e. in the form of system tables, with minor differences such as table and attribute names and some table structures. Non-relational database models accessible via ODBC or other tools (e.g. Data Extract for DB2, which permits movement of data from IMS/VS, DL/1, VSAM and SAM to SQL/DS or DB2) could also be easily accommodated, as the meta-data required by CCVES could be extracted from them. Previous work related to meta-translation [7] has investigated the translation of dBase code to INGRES/QUEL, demonstrating the applicability of this technique in general, not only to the relational data model but also to others such as CODASYL and hierarchical data models. This means CCVES is capable in principle of being extended to cope with other data models.
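The constraint-enforcement idea of section 4.3 — turning a constraint held as data into a query that exposes violating rows before migration — can be sketched as follows. This is a minimal illustration, not CCVES code: the `dept`/`emp` tables and the dictionary format for the augmented constraint record are assumptions, and SQLite stands in for the legacy host DBMS.

```python
import sqlite3

# A legacy database whose DBMS never enforced referential integrity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dno INTEGER, dname TEXT);
    CREATE TABLE emp  (eno INTEGER, ename TEXT, dno INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO emp  VALUES (1, 'Amal', 10), (2, 'Nimal', 99);
""")

# An augmented-table-style record of an enhanced constraint:
# emp.dno should reference dept.dno.
constraint = {"table": "emp", "column": "dno",
              "ref_table": "dept", "ref_column": "dno"}

def violations(conn, c):
    """Generate and run a DML statement that returns the rows
    breaking a foreign-key-style constraint (orphan references),
    so they can be repaired before migration."""
    sql = (f"SELECT * FROM {c['table']} "
           f"WHERE {c['column']} IS NOT NULL AND {c['column']} NOT IN "
           f"(SELECT {c['ref_column']} FROM {c['ref_table']})")
    return conn.execute(sql).fetchall()

print(violations(conn, constraint))   # [(2, 'Nimal', 99)]
```

Employee 2 references the non-existent department 99; surfacing such rows up front is what prevents the target DBMS from rejecting the data mid-migration.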
[Figure ii: architecture of CCVES in the context of meta-data processing modules - full-page diagram not reproduced in this transcript]
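The meta-translation principle behind the architecture of figure ii can be reduced to a toy example: schemas are first mapped into one dialect-neutral, SQL-based internal representation, after which emitting any particular target DDL is a separate, pluggable step. Everything below (the `TYPE_MAP`, the dialect names, the type choices) is an assumed illustration, not the real SMTS.

```python
# Per-dialect knowledge is isolated in a table, so adding a new
# target DBMS means adding data, not rewriting the translator.
TYPE_MAP = {
    "postgres": {"int": "integer", "str": "text"},
    "ingres":   {"int": "integer", "str": "varchar(50)"},
}

def emit_ddl(schema, dialect):
    """Generate CREATE TABLE statements for the chosen target DBMS
    from the dialect-neutral internal representation."""
    types = TYPE_MAP[dialect]
    stmts = []
    for table, cols in schema.items():
        body = ", ".join(f"{name} {types[t]}" for name, t in cols)
        stmts.append(f"CREATE TABLE {table} ({body})")
    return stmts

# Internal representation of a one-table schema.
schema = {"emp": [("eno", "int"), ("ename", "str")]}
print(emit_ddl(schema, "postgres"))
# ['CREATE TABLE emp (eno integer, ename text)']
```

The same `schema` value fed through `emit_ddl(schema, "ingres")` yields the INGRES-flavoured DDL, mirroring how one internal form serves many source and target systems.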
The meta-programming approach enabled us to implement many other features, such as the ability to easily customise our system for different data models, e.g. relational and object-oriented; the ability to easily enhance or customise it for different display models, e.g. E-R, EER and OMT; and the ability to deal with heterogeneity due to differences in local databases (e.g. at the global level the user views all local databases as if they come from the same DBMS, and is also able to view databases using a preferred DDL syntax).

We were able to successfully reverse-engineer a leading telecommunication database extract consisting of over 50 entities. This enabled us to test our tool on a scale greater than that of our test databases. Besides this, all or parts of our system have been successfully used for other research work [18].

7. ACKNOWLEDGEMENTS

This work was supervised by Prof. W.A. Gray and Dr. N.J. Fiddian, and was carried out at University of Wales, College of Cardiff. This work was partially funded by the Association of Commonwealth.

8. REFERENCES

1. Batini C., Ceri S. and Navathe S.B., 'Conceptual database design: an entity-relationship approach', Benjamin/Cummings, 1992.

2. Chiang R.H.L., Barron T.M. and Storey V.C., 'Reverse engineering of relational database: extraction of an EER model from a relational database', Data and Knowledge Engineering, Vol. 12, No. 2, 1994, pp. 107-142.

3. Davis A., 'Converting a relational database model into an entity relationship model', Proceedings of 6th International Conference on Entity-Relationship Approach, March S.T. (Ed.), New York, 1987, pp. 271-285.

4. Dumpala S.R. and Arora S.K., 'Schema translation using the entity-relationship approach', Proceedings of 2nd International Conference on Entity-Relationship Approach, Chen P.P.-C. (Ed.), Washington, 1981, pp. 339-360.

5. Elmasri R. and Navathe S.B., 'Fundamentals of database systems', 2nd edition, Benjamin/Cummings, 1994.

6. Fiddian N.J., Gray W.A., Ramfos A. and Cooke A., 'Database meta-translation technology: integration, status and application', Database Technology, Vol. 4, 1992, pp. 259-263.

7. Howells D.I., Fiddian N.J. and Gray W.A., 'A source-to-source meta-translation system for relational query languages', Proceedings of 13th International Conference on Very Large Data Bases, Stocker P., Kent W. and Hammersley P. (Eds.), Brighton, 1987, pp. 227-234.

8. Johannesson P. and Kalman K., 'A methodology for translating relational schemas into conceptual schemas', Proceedings of 8th International Conference on Entity-Relationship Approach, 1989, pp. 279-294.

9. Markowitz V.M. and Makowsky J.A., 'Identifying extended entity-relationship object structures in relational schemas', IEEE Transactions on Software Engineering, Vol. 16, No. 8, 1990, pp. 777-790.

10. Navathe S.B. and Awong A.M., 'Abstracting relational and hierarchical data with a semantic data model', Proceedings of the 6th International Conference on Entity-Relationship Approach, March S.T. (Ed.), New York, 1987, pp. 305-333.

11. Premerlani W.J. and Blaha M.R., 'An approach for reverse engineering of relational databases', Communications of the ACM, Vol. 37, No. 5, 1994, pp. 42-49, 134.

12. Qutaishat M.A., 'A schema meta-integration system for a heterogeneous object-oriented database environment', Proceedings of NordDATA'92 Conference, Tampere, 1992, pp. 74-92.

13. Qutaishat M.A., Gray W.A. and Fiddian N.J., 'A highly-customisable schema meta-visualisation system for object-oriented database schemas: overview', Proceedings of 4th International Conference on Database and Expert Systems Applications, Springer-Verlag, 1993, pp. 756-759.

14. Qutaishat M.A., Fiddian N.J. and Gray W.A., 'Review and potential of meta-programmed expert systems in a heterogeneous distributed database environment', Proceedings of Symposium on Database and Expert Systems, International Institute for Advanced Studies, 1994, pp. 148-154.

15. Ramfos A., Fiddian N.J. and Gray W.A., 'A meta-translation system for object-oriented to relational schema translations', Proceedings of 9th British National Conference on Databases, Wolverhampton, 1991, pp. 245-268.

16. Richter J., 'ODBC 2.0 further establishes cross-product data sharing standard', BYTE, November 1994, p. 52.

17. Rumbaugh J., Blaha M., Premerlani W., Eddy F. and Lorensen W., 'Object-oriented modeling and design', Prentice-Hall, 1991.

18. Wikramanayake G.N., 'Assisting Migration and Evolution of Relational Legacy Databases', PhD Thesis, University of Wales Cardiff, 1996.