Data Base Migration White Paper


White Paper on DB Migration from Sybase to Oracle

September 12 and 13, 2002 Satyam Technology Center GDU, Surface Transport, Bhubaneswar

Database Migration – Approach & Planning

Keshav Tripathy, Pragjnyajeet Mohanty and Biraja Prasad Nath
GDU Surface Transport, Bhubaneswar.

Summary

Database migration is the process of moving the schema, data and applications associated with the current system to a different technology or platform. It is one of the most important tasks in taking a system live when the shift from one platform to another involves the database and its associated applications. This paper examines some of the critical issues that arise in the analysis and implementation of a database migration project. The discussion is kept independent of any particular platform but assumes an RDBMS architecture as the target. The paper is not a guide to implementing data migration; it is a discussion of the various aspects associated with it.

Key Words

Database migration, Data Migration, Schema Migration, Extraction, Loading

Introduction

With the growing dominance of business reengineering efforts and enterprise-wide application integration, organizations reach a stage where they have to move their databases from multiple platforms to a single one, or from one platform to another, driven by the technology and requirements best suited to the particular application. System evolution challenges organizations to keep pace with rapidly growing technology and to capitalize on the features it offers. Organizations also migrate when they realize that their existing systems have performance and scalability limitations that cannot cater to their ever-expanding business needs.

Database migration is the process of moving the data, schema and applications associated with the current system to a different technology or platform.
Database migration is one of the most common, and one of the most significant, tasks in any application migration or porting effort, or in a move towards an ERP or EAI environment.

It may be thought that when two systems must maintain similar data, one would map to the other with ease, but that is hardly ever the case. Owing to differences in system, design, technology and implementation, many issues creep into the process of database migration, which makes mapping the older system to the newer one a job of real importance. For example, moving from a hierarchical database system based on de-normalization and redundant storage to an RDBMS based on normalization can be a really arduous job rather than a straightforward transformation. Migration scenarios vary with the type of source system, for example:

  • Moving from a hierarchical database to an RDBMS
  • Moving from a network database system to an RDBMS
  • Moving from one RDBMS to another RDBMS

The actual implementation of a database migration project differs with the technology used and the customer requirements, but the authors have drawn on their experience to arrive at a methodical approach to database migration projects that should lessen last-minute surprises. Though this paper is not a step-by-step guide to database migration, it should help in enhancing one's understanding of it and of the related issues. The paper aims to be database and platform independent but assumes that the final database is of the RDBMS type.

Data Migration Vs Database Migration

Data migration and database migration are different, though database migration encompasses data migration. Data migration is simply the movement of data from one database (or file system) or platform to another. This may include extracting the data, cleansing it and loading it into the target database. Database migration, by contrast, means the movement of the data together with the conversion of the various other structures and objects associated with the database, viz.

  • Business Logic - Stored Procedures, Triggers, Packages, Functions
  • Schema - Tables, Views, Synonyms, Sequences, Indexes
  • Physical Data - Security, Users, Roles, Privileges
  • Database dependency of the applications associated with the database

Hence data migration is a subset of the activities carried out in a database migration, though data migration may also be taken up independently.

For example, when a newly developed application requires as its source data that already exists in another database (RDBMS, network or hierarchical), that data must be obtained for the new application to operate. In this case only the data is moved from the existing database to the database used by the new application. This is data migration. A database migration, on the other hand, is a shift from one type of database system to an entirely new type of database system, or to a database system with entirely new features and functionality. Here much more than the data is affected, including the schema and the associated applications. The data still needs to be moved, but there are also changes to the database programs and to the database dependency of all the associated applications.

Why Database Migration

It is natural to ask why, when the existing systems are running on the current database, there is any need to move to another database.
Organizations move because they perceive better value in the newer system to which they are moving. The motivations for database migration are:

1. Technology changes - With the rapid change in technology, organizations wish to adopt the latest offerings with more benefits and features. Another scenario is that, as technology advances, older systems become obsolete and may be left without vendor support. In this case too, organizations wish to move to the latest technology that suits their current needs and future plans. Business needs may also outgrow the current system and technology, giving impetus to move to a new system. For example, a leading insurance company had a character-based application driven by an eight-year-old Informix database. The business needs changed and the company wanted e-business and ERP integration, with one vendor providing all the solutions. To take advantage of current technology and its capabilities, the company shifted to an Oracle 8i database and a corporate intranet driven by Oracle 9iAS [5].

2. Database Consolidation - It may be asked why multiple databases would come up in a single organization. One of the major reasons is that applications are developed on an ad hoc basis, so the most immediate solution is sought rather than one considered from an overall perspective. Organizations therefore usually end up with different applications running on different databases. Having multiple databases is a logistical nightmare in terms of maintaining and tracking the system. Sometimes different sources feed the multiple databases and these need to be synchronized. All this involves a lot of effort and cost. Multiple systems also mean multiple licensing issues and support from various vendors. It therefore makes sense to consolidate the data into a single sink if the system and requirements permit. A leading airline company had applications running on nine different database platforms on diverse operating systems. Under pressure for cost effectiveness and simplicity, it decided to migrate all its databases and associated applications to only two database platforms [5].

3. Lower cost of ownership - Multiple databases mean as many skilled DBAs and application developers for the different platforms to develop and maintain the applications and databases. In an effort to cut this cost and leverage the strength of a particular database, organizations are moving to a single system wherever feasible.

4. System Optimization - During re-engineering of an organization's business processes, the organization may have to change its data storage strategy, making it necessary to shift, migrate or consolidate databases running in different areas of the business. Similarly, when businesses merge or organizations make acquisitions, they find themselves with multiple databases and applications running on them, and here too they would like to migrate the multiple databases to a single database. A global company dealing mainly in defense, commercial electronics and aviation technology merged with another electronics and aviation giant. The two companies were working with different database platforms, but after the merger, in order to consolidate their business processes, they decided to migrate their databases to a single platform [5].

5. Upgrading from legacy systems - With large volumes of data stored in legacy systems, organizations are thinking of moving to current systems and RDBMSs to capitalize on the rich features and capabilities they offer.
   Moreover, support for legacy systems is on the decline, so moving to contemporary systems makes business sense in terms of support, service, upgrades and use of the latest technology.

Components of Database Migration

Database migration consists of three major components:

  • Schema Migration - This consists of mapping the source schema to the target schema and migrating it. The schema needs to be extracted from the source system and its equivalent replicated in the target system.
  • Data Migration - The data is extracted from the source database, checked for consistency and accuracy, cleansed if necessary and finally loaded into the target system.
  • Application Migration - This consists of changing the database-dependent areas of the application (function calls, data access methods etc) so that the input/output behavior of the converted application with the target database is identical to that of the original application with the source database.

Network database to RDBMS Migration

When a network database is migrated to an RDBMS [4], the changes occur at three major levels:

  • Migration of database design and structure - Each record type in the network database needs to be converted to a table in the RDBMS, and each set relationship has to be converted to foreign key definitions in the RDBMS.
  • Migration of data
  • Migration of associated programs and JCLs

The migration methods from a network database to an RDBMS may vary according to the extent to which the data and the application process flow are modified.

Hierarchical database to RDBMS Migration [2]

For migration from a hierarchical database to an RDBMS, there are three major software solutions:

  • Language Interfaces - In this method either an SQL interface is provided to the hierarchical database, or a procedural, record-by-record interface is provided to the relational database. When a procedural, record-by-record interface to the relational database can be provided, only the data needs to be moved and the existing applications need not change. The change of database is transparent to the application in this case.
  • Source Code Conversion - In this solution the data is moved from the hierarchical database to the RDBMS, and the source code of all the associated programs is converted to work with the RDBMS.
  • Data Propagation - When an RDBMS and a hierarchical system are run concurrently, data propagators are used to keep the two database systems synchronized.

The database migration in this case is done at three levels:

  • Mapping and migration of the keys from the hierarchical database system to the RDBMS
  • Data migration
  • Migration of hierarchical database calls to SQL calls

For example, for their database migration, Swiss Bank and IBM designed and developed the IBM Data Propagator MVS/ESA, which supports interactive and batch data propagation. This software migrates data from the hierarchical IMS to the relational DB2 without affecting existing applications. It supports forward and reverse data propagation, which lets heterogeneous databases coexist [2].

RDBMS to RDBMS Migration

In an RDBMS to RDBMS migration, the applications have usually evolved over time, and in many cases the database schema will change. Three levels are involved:

  • Schema Migration
  • Data Migration
  • Query Transformation

It is important to develop methods and tools that support the encoding, elicitation, enrichment and editing of the schema mapping.
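
To make the idea of an encoded, editable schema mapping more concrete, here is a minimal Python sketch. It is an illustration only, not a method given in this paper: the `ColumnMapping` structure, the table and column names and the target types are all hypothetical, and a real project would take them from its own mapping document.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ColumnMapping:
    """One row of a hypothetical source-to-target schema mapping."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    target_type: str  # assumed target type, decided during mapping


# Illustrative mapping only; every name and type here is an assumption.
MAPPING: List[ColumnMapping] = [
    ColumnMapping("emp", "emp_id", "EMPLOYEE", "EMPLOYEE_ID", "NUMBER(10)"),
    ColumnMapping("emp", "emp_name", "EMPLOYEE", "FULL_NAME", "VARCHAR2(100)"),
    ColumnMapping("emp", "joined", "EMPLOYEE", "JOIN_DATE", "DATE"),
]


def target_ddl(mapping: List[ColumnMapping], target_table: str) -> str:
    """Build a CREATE TABLE statement for one target table from the mapping."""
    cols = [m for m in mapping if m.target_table == target_table]
    if not cols:
        raise ValueError(f"no columns mapped to {target_table}")
    body = ",\n".join(f"    {m.target_column} {m.target_type}" for m in cols)
    return f"CREATE TABLE {target_table} (\n{body}\n)"


def extract_query(mapping: List[ColumnMapping], source_table: str) -> str:
    """Build the source-side SELECT, aliasing each column to its target name."""
    cols = [m for m in mapping if m.source_table == source_table]
    if not cols:
        raise ValueError(f"no columns mapped from {source_table}")
    select_list = ", ".join(
        f"{m.source_column} AS {m.target_column}" for m in cols
    )
    return f"SELECT {select_list} FROM {source_table}"


print(target_ddl(MAPPING, "EMPLOYEE"))
print(extract_query(MAPPING, "emp"))
```

Because the mapping is held as plain data, the same structure can drive both schema generation and the extraction query, which is one possible way of keeping the schema and data levels consistent with each other.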
Based on the formulation of the schema mapping, one needs to develop theories and tools for migrating data from one database to another and for converting SQL written for one schema to another schema.

Tasks involved in a database migration project

The major tasks of a database migration can be classified as:

1. Source to Target Mapping - The first need is to map the various parameters of the source database to the target database. Mapping includes the following:

     • Data Structure Mapping
     • Data Type Mapping
     • Internal Storage Mapping
     • Physical Storage Mapping
     • Column Mapping
     • Semantic Mapping
     • Index Mapping

   A strategy has to be finalized for cases where the source attributes do not map exactly to the target attributes. For example, if a data type in the source database has no exact counterpart in the target database, the nature of the data held has to be examined before deciding which data type in the target environment is close enough to hold it.

2. Database Constraints Study - A study of the source database constraints must be undertaken to find out the relationships between different tables and the associated constraints. The study helps to decide whether to implement the constraints at the database level or at the application level. It also helps in the data loading process, where it has to be ensured that none of the constraints are violated.

3. Database Sizing - Sizing is the process of estimating the size parameters of the target database, taking into consideration the various parameters of the source database. Sizing is done for the database, tables and indexes. It helps in allocating enough space and setting the proper parameters in the target database.

4. Data Cleansing - Data often needs to be cleaned when moving from an existing system to a newer one. Data on the older system may prove to be inconsistent when it is moved to the newer system. For example, an employee id in the older system may be present in different tables or files with different lengths (say 10 and 15). Before this employee id is uploaded to the new system, a consistent field length has to be decided on, along with which data to move. The older system may also simply hold bad data. In this case too, the data needs to be fixed and its authenticity decided before it is moved to the newer system. Inconsistent data may take the form of the same data being stored in different representations in different tables, files or records. For example, the data “Satyam Computer Services Limited” may be stored as “Satyam”, “scsl” or “Satyam Computer Services Limited” at various places in the source system.
   When this data is moved into the target database, its consistency has to be taken care of: a single form among those above has to be chosen and moved to the target database. Inconsistent, incorrect or “bad” data is problematic for the business process too. For example:

     • Inaccurate data caused an insurance company to raise its risk exposure too high and suffer very expensive losses on many of the policies it wrote.
     • A manufacturer sold off what it thought was excess stock because of invalid data. The company was actually short of stock, leading to thousands of unfilled orders, unhappy customers and lost revenue.

   Data cleansing also includes cases where existing data needs to be reformatted to fit the target environment. For example, Sybase stores dates as date and time including milliseconds, whereas Oracle stores dates as date and time down to seconds. This has to be taken care of in a migration to Oracle, so the Sybase data needs to be cleansed when moving to the Oracle platform.

5. Data Feed - Care has to be taken of the various data sources that are going to feed the database. It has to be considered whether the data is coming from file systems, legacy systems or some other data source. The data formats, the volume of data etc also have to be taken into account.

6. Conversion of database programs - All the database programs, such as stored procedures, triggers, packages and functions, need to be converted from the existing database to the target database to support the required business process.

7. Conversion of the Application - The application, with its database-specific dependencies, now needs to be converted, improved or enhanced, keeping in view the desired capabilities and features of the target database.

8. Data Extraction and Loading - When the database is ready with proper sizing and the nature of the data to be loaded is understood, the next stage is to extract the data from the source database and load it into the target database. Data is first extracted from the source database. This can be done with utilities provided with the database (ex. BCP of Sybase with the “OUT” parameter) or by spooling the data from within the database to flat files. Data loading can be done primarily in two ways: either by using the data loading utilities that come with all the major databases (SQL*Loader from Oracle, BCP from Sybase with the “IN” parameter etc), or by writing database programs for the target database (ex. stored procedures) that read the extracted data and load it into the target database. A parallel run of both systems is done and the data on the new system is synchronized with the existing system.

9. Testing - A thorough testing methodology, based mainly on the before image and the after image of the system, needs to be followed. The testing should be repetitive and as exhaustive as possible, with the critical conditions taken into consideration.

10. Go Live - This is the final step, where the database with its data and associated applications is live and in production.

Factors affecting the Database Migration

The general approach and planning for a database migration is by and large the same for different sources and different targets, but it depends to some extent on the following:

  • Whether the migration is from a legacy system to an RDBMS
  • The coupling of the applications with the database - loosely coupled applications are simpler to migrate than closely coupled ones
  • Whether the associated applications are off the shelf or custom built
  • Whether the change of database is accompanied by a change of the underlying operating system

Phases of Database Migration

Database migration projects can be divided into distinct phases [1]. Broadly, the phases can be defined as:

  • Strategy Definition Phase
  • Analysis Phase
  • Design Phase
  • Conversion/Migration Phase
  • Testing Phase
  • Implementation Phase

Strategy Definition Phase

During this stage the exact goal of the database migration project is defined and finalized. Objectives and deliverables are clearly defined. At this stage there is a macro-level view of the entire system, and it is decided which portions of the system will be affected and touched during the conversion. A definitive plan is needed of what has to be changed and which systems need to be converted.
When database migration is not a stand-alone job but part of a bigger project such as system integration or re-engineering work, the strategy phase of the database migration project should be carried out concurrently with the strategy phase of the associated project. This gives a better and firmer idea of how the overhauled or new system will finally work. However, the importance of database migration is generally overlooked when it is part of a bigger project, because within an overall project database migration sounds like an innocuous part, though this is hardly ever the case.

The milestone and deliverable of this phase is a strategy document in which the goals of the overall migration effort and the reasons for the conclusions are presented.

Analysis Phase

The analysis stage takes its input from the strategy phase and expands on the strategy document. By this time the goals of the project are known, so it is now determined what needs to be done in which area. The “quality” of the data in the older system is examined here, and it is determined whether that data will go on to the newer system. The most important aspect of the analysis phase is to examine all the database objects of the source database and their equivalents in the target database. Schema extraction and analysis tools may be used for this. A schema extraction tool queries the meta-data of the database, and the logical structure of the database is found in this way. Inspection and analysis of the database programs and the data then yields the relationships (cardinality), aggregations, computed data etc of the data and data structures in the schema. There may be some inconsistencies with respect to data types and their internal representation; these have to be resolved with respect to the target database.

In most cases the target system will be replacing or adding to the existing capabilities of the current system. Moreover, when the newer system is not an exact mapping of the older system, the gaps between the two systems have to be noted.
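
As a rough illustration of noting such gaps, the Python sketch below compares two schema descriptions and lists where they differ. It is a sketch under stated assumptions, not a tool described in this paper: it assumes both schemas have already been extracted into dictionaries that use the same table names, column names and type vocabulary, and the `employee` example is entirely hypothetical.

```python
from typing import Dict, List, Tuple

# table name -> {column name -> data type}; an assumed, simplified representation
Schema = Dict[str, Dict[str, str]]


def schema_gaps(source: Schema, target: Schema) -> List[Tuple[str, str, str]]:
    """Return (kind, table, column) entries describing schema differences.

    kind is one of "missing_in_target", "new_in_target" or "type_differs".
    """
    gaps = []
    for table in sorted(set(source) | set(target)):
        src_cols = source.get(table, {})
        tgt_cols = target.get(table, {})
        for col in sorted(set(src_cols) | set(tgt_cols)):
            if col not in tgt_cols:
                gaps.append(("missing_in_target", table, col))
            elif col not in src_cols:
                gaps.append(("new_in_target", table, col))
            elif src_cols[col] != tgt_cols[col]:
                gaps.append(("type_differs", table, col))
    return gaps


# Hypothetical schemas used only to exercise the function.
source = {"employee": {"emp_id": "int", "name": "varchar(40)", "fax": "varchar(20)"}}
target = {"employee": {"emp_id": "int", "name": "varchar(100)", "email": "varchar(80)"}}

for gap in schema_gaps(source, target):
    print(gap)
```

Entries reported as `new_in_target` are the columns for which data would have to be acquired or generated, while `missing_in_target` entries point at source data that needs an explicit decision to drop or relocate.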
If the new system requires new data in some areas, it has to be decided how that data can be acquired or generated from the older data.

During the analysis phase the following information has to be gathered about the current system:

  • Hardware and operating system specifications
  • Table structure, constraints, table size etc
  • Indexes and index size
  • Stored procedures, packages, functions and triggers, with their complexity
  • Data types used in the columns
  • Database maintenance schedules and associated scripts
  • Associated applications and their processing profile (online, batch etc)
  • Any ERP/CRM applications involved (SAP, Siebel etc)
  • Any custom code and its profile (language, development and testing)
  • Primary development languages and tools (C, Java, VB, Powerbuilder etc) for the developed applications
  • Any middleware used (like Tuxedo or any application server)

Once the above information has been gathered, the stage is set for mapping these systems onto the target database.

Design Phase

This is the phase where the findings of the analysis phase are validated. The mapping document is prepared based on the inputs obtained from the analysis phase. This phase should ideally involve a business analyst who has intimate knowledge of the system and of what it is expected to do. What needs to be done on the ground is finalized based on the analyst's inputs. The database schema for the target system is designed here, taking inputs from the analysis phase. The changes to the database programs and to the database dependency of the associated applications are also finalized here. Based on the earlier database and other parameters, this is also where the target database sizing is done, before the database is actually created physically. Appropriate space is specified for the different database objects. Sizing correctly at the beginning saves a lot of trouble during the final implementation and Go Live stage. Based on the information and knowledge of the system gathered up to this phase, the following activities are done:

  • Choose a migration method
  • Build the migration plan

With the migration method chosen and the specifics of the migration plan settled, the migration/conversion is started in the required areas.

Conversion/Migration Phase

This is where the design document is taken as input and the actual conversion/migration of the database is started. The actual conversion can be represented as shown in Figure 2.

The implementation can be broadly divided into three areas:

  • Database schema creation - The target database schema is created according to the inputs from the design phase. The target database and the required database objects, such as tables, views, synonyms, indexes, sequences, users and roles, are created according to the required schema and sizing of the database.
  • Data extraction and loading - The data is first extracted from the source database in preparation for loading into the target database. Database programs may be written to load the data, or the data loader utilities that come with the databases (ex. SQL*Loader in Oracle and BCP in Sybase) may be used. The actual data cleansing and manipulation may be done here before the data is loaded into the target database.
  • Moving of the associated applications and database programs to the new system - Databases come with associated database objects and programs (stored procedures, triggers, functions, packages etc) and applications.
    In the migration work, these applications and database objects have to be changed to fit the target database. Wherever third-party interfaces like ODBC/JDBC are used to connect the applications with the database, the migration is fairly straightforward. But where database-specific or native code is used for the database connectivity of the application, more effort is required.

Unit testing is also done in this phase. This phase also includes enhancement of the testing strategy that will be implemented later. Taking inputs from the business analysts and end users, and from examining the programs themselves, a comprehensive testing strategy and methodology should be arrived at to validate the migration work.

System and Integration Testing Phase

This is a very crucial part of the entire project. In this phase the programmers, business analysts and end users test in tandem. The testing strategy prepared during the conversion/migration phase comes in very handy here. The aim at this stage is to capture any logical error that might have crept in during the migration of the system. The key points to keep in mind while testing are:

  • Has data loading moved all relevant data to the target database?
  • Is the proper data residing in the intended tables and fields?
  • Are the associated applications and database programs doing what they were intended to do and manipulating the data properly?

The help of the business analysts and the end users is taken here. Though the end users may not be able to help at the previous stages, once they see the system physically they can help in validating whether the converted system behaves like the previous system. The greater the involvement of the end users in this phase, the better the chances of warding off possible errors and aberrations in the system. Users are well suited to testing the database programs and associated applications, but they would find it very difficult to test the authenticity of the loaded data in the target database. For testing proper data loading, automated tools or utilities may be used.
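
The kind of check such a utility performs can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than any specific tool: two in-memory `sqlite3` databases stand in for the source and target systems, the `employee` table and its rows are hypothetical, and the table and key names are assumed to come from trusted configuration because they are interpolated into the SQL text.

```python
import hashlib
import sqlite3
from typing import Dict, List, Tuple


def row_digests(conn: sqlite3.Connection, table: str, key: str) -> Dict[object, str]:
    """Map each key value to a SHA-256 digest of the whole row."""
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cur.description]
    key_pos = names.index(key)
    digests = {}
    for row in cur:
        # NULLs are rendered as empty strings, a simplification for this sketch.
        text = "|".join("" if v is None else str(v) for v in row)
        digests[row[key_pos]] = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digests


def compare_tables(source: sqlite3.Connection, target: sqlite3.Connection,
                   table: str, key: str) -> List[Tuple[object, str]]:
    """List keys whose rows are missing, unexpected or different in the target."""
    src = row_digests(source, table, key)
    tgt = row_digests(target, table, key)
    report = []
    for k in sorted(set(src) | set(tgt)):
        if k not in tgt:
            report.append((k, "missing in target"))
        elif k not in src:
            report.append((k, "unexpected in target"))
        elif src[k] != tgt[k]:
            report.append((k, "differs"))
    return report


def demo_database(rows: List[Tuple[int, str]]) -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical employee table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    return conn


source_db = demo_database([(1, "Asha"), (2, "Ravi"), (3, "Meena")])
target_db = demo_database([(1, "Asha"), (2, "Ravi K"), (4, "Sunil")])
discrepancies = compare_tables(source_db, target_db, "employee", "emp_id")
print(discrepancies)
```

In a real migration the two connections would go through each database's own driver, and values would probably need to be normalized before hashing (for example, date precision) so that intended cleansing changes are not reported as loading errors.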
Such utilities compare the data in the source and target databases and report any discrepancies between them. It is then for the developers and the business analysts to go through the discrepancies and decide whether each falls within the data cleansing scope or is an error in data loading. Variations and inconsistencies between the data in the target database and the source database may lead to problems in the business process. It should not come as a surprise if, at this stage, the data discrepancies observed during testing reveal that some activities that needed to be taken care of earlier have been missed. A small iteration of the previous stages is then done to rectify this.

Implementation Phase

Implementation is the stage where the target system goes live. It depends primarily on the type of system and differs from case to case. The general approach is to have a parallel run of the older and the new system. When the reliability of the new system is assured, the older system is stopped. Before this, there must be a comprehensive backup of the data and a formalized recovery strategy to face any unforeseen scenario in which reverting to the existing system is needed.

Migration Tools/Utilities

With many similar tasks to be performed during the course of a database migration, there are many tools and utilities available to assist in the process. Tools are generally used to capture the meta-data from the source database and store it in a repository, from which they generate the schema for the target database. They also help to capture all the database programs, such as stored procedures, packages, triggers and functions, from the source database. These are also stored in a central repository and converted into target database programs with minimal human intervention. The tools and utilities also help in the extraction and loading of data. All this is done with scope for customization according to the requirements at every stage of the migration. [3]

All major database vendors have their own database migration tools (like OMWB – Oracle Migration Workbench, Ispirer Chyfo, SQLPorter, CRYSWARE for migration from mainframes to RDBMS). Vendors also have their own consultancy services to help in the process of migration. Tools that assist and support the phases mentioned above will prove helpful, depending on the end requirement.

Case Study

A case study of a database migration project is provided in Annexure 1.

Conclusion

Database migration is seldom achieved in a single effort, as there are a host of unique factors associated with database migration itself and with each customer's requirements. It is therefore not a cut and dried affair but is different every time. The authors nevertheless believe this approach would minimize the iteration and rework involved in a database migration project. The secret of a hassle-free database migration project is to invest more time and energy in the analysis and design stages and to monitor each stage very carefully from the first day.

References

[1] Data Migration Methodology, Web Qualify, Satyam Computers Services Limited.
[2] Hierarchical to Relational Database Migration, Andreas Meier, Rolf Dippolod, Jacky Mercerat, Alex Muriset, Jean-Claude Untersinger, Robert Eckerlin and Flavio Ferrara, IEEE Software 1994, vol-IV, pp. 21-27.
[3]
[4] Migrating from CA-IDMS® to a Relational Database, Prince Software Inc.
[5] The Great Migration, Oracle Magazine, May/June 2002, Volume XVI, Issue 3, pp. 57-65.

About the Authors:

Keshav Tripathy - Has been with Satyam since February 2001. Currently working with GDU Surface Transport as Project Manager.
He has been responsible for migrating legacy applications to ERP as well as for database migrations, where he has handled migration issues. His areas of interest are database design, data modeling and semantic query optimization. He can be reached at Keshav_Tripathy@satyam.com

Biraja Prasad Nath - Has been with Satyam since June 1997. Currently working with GDU Surface Transport as a Project Leader. He has worked extensively with Oracle, Sybase and SQL Server databases. He has handled a number of database migration projects. His areas of interest are database design, data modeling and database tuning. He can be reached at Biraja_Nath@satyam.com

Pragjnyajeet Mohanty - Has been working with Satyam since September 2000. Currently working with GDU Surface Transport as a team member. He has worked on various modules of database migration projects. His interest lies in both curricular and extra-curricular activities. He can be reached at Prag_Mohanty@satyam.com