netezza-pdf

netezza document for beginners

Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch

Neeraj Singh (sneeraj@in.ibm.com), Advisory Software Engineer, IBM
Yongli An (yongli@ca.ibm.com), MDM Performance Manager, IBM
14 August 2009
© Copyright IBM Corporation 2009. developerWorks®, ibm.com/developerWorks/

The maintenance services for the IBM InfoSphere™ Master Data Management Server solution address the needs of clients in the first phase of implementing initial load solutions. Using MDM, clients need to perform initial and delta loads, typically as a batch. This article focuses on the maintenance transaction approach to performing initial loads, including an introduction, installation, and setup. It also covers performance tuning tips and best practices. You can use the recommendations in this article as guidance in your own MDM Server initial load solutions using maintenance services.

Introduction

IBM InfoSphere Master Data Management Server (MDM Server) is an enterprise application that helps companies gain control of business information by enabling them to manage and maintain a complete and accurate view of their master data. MDM Server provides a unified operational view of customers, accounts, and products, and it provides an environment that processes updates to and from multiple channels. It aligns these front office systems with multiple back office systems in real time, providing a single source of truth for master data.

The maintenance services for the IBM InfoSphere Master Data Management (MDM) Server solution are built to address the needs of clients in the first phase of implementing initial load solutions. At this stage, clients deploy InfoSphere MDM Server for master data management: data is loaded into the MDM Server repository, but most data changes still come from existing legacy systems. With MDM Server, the client performs initial and delta loads, typically in a batch. Initial load is the original movement of data from source systems into the MDM Server repository when the repository is empty. Delta loads are regular (such as daily) data updates from source systems into InfoSphere MDM Server.

There are two different approaches to loading data into InfoSphere MDM Server in batch. The maintenance service batch approach loads data into InfoSphere MDM Server using the maintenance services invoked by the Batch Processor. Alternatively, data can be loaded directly into the database using DataStage jobs.

This article shares an IBM team's experience performing case studies that focus on the maintenance transaction approach using InfoSphere MDM Server version 8.0.1. The article starts with an introduction to MDM Server maintenance transactions. It then covers the basic installation and setup steps of the MDM Server environment, including DB2® database server, WebSphere® Application Server, InfoSphere MDM Server, MDM Server maintenance transactions, and the batch processor. The article gives a high-level summary of key performance results based on internal case studies. It concludes with a list of performance tuning tips and best practices for getting optimal performance during an initial data load. You can use the IBM team's experience and recommendations as guidance in your own InfoSphere MDM Server initial load solutions.

Introducing the MDM Server service batch approach

The MDM Server service batch approach loads data into MDM Server using maintenance transactions invoked by the Batch Processor or by any other batch framework. Because MDM Server services process the data during the load, this approach provides the best level of business data validation. You can use the same set of maintenance transactions for both initial and delta loads. To create a setup that uses this option, you need to install an InfoSphere MDM Server capable of running maintenance transactions.
You also need to prepare the input data in a format that the Batch Processor can consume.

What are maintenance transactions?

InfoSphere MDM Server creates a unique internal identifier for each record or business entity that serves as its internal key. The regular InfoSphere MDM Server services expect the internal key to be provided as part of the update service request, to ensure that services can identify the correct business entity in the database. However, when data flows into InfoSphere MDM Server directly from external applications such as legacy systems, the internal key is not known, and often the nature of the data change is also not known.

Maintenance transactions address this problem. These transactions do not require the internal key as part of the input. They also do not require the external system to specify whether the entity needs to be added or updated in InfoSphere MDM Server. Instead of the internal key, maintenance transactions expect the business key as part of the input, which is the unique identifier of the business entity in external applications. Maintenance transactions use the business key provided in the load operation to locate the correct instance of the business entity in the database. If an existing entity is found, it is updated using the appropriate transaction, such as updateParty. If no existing entity is found, a new entity is created in InfoSphere MDM Server using the appropriate transaction, such as addParty.

There are many types of maintenance transactions, including maintainParty, maintainPersonName, and maintainContractPlus. For a complete list of the transactions and more details about them, refer to the MDMRapidDeploymentPackage_CompositeMaintenanceServices.pdf document, available as part of the EntryLevelMDM patch.

Maintenance transactions are not part of the default InfoSphere MDM Server 8.0.1 distribution and installation. You need to obtain and install the EntryLevelMDM patch to use these transactions.

Note: Maintenance transactions are part of the default InfoSphere MDM Server 8.5 distribution. They are provided with source code as part of the MDM Server Samples distribution archive. You need to install them on top of an existing InfoSphere MDM Server 8.5 instance. See Resources for a link to instructions. It is recommended that you get the assets from the FTP site mentioned in the Get the installer step in this article to ensure you have the latest version.

Batch transaction processing

You can use maintenance transactions to load data using MDM Server Batch, or you can invoke them like any other service exposed by MDM Server, using the RMI or JMS messaging mechanisms. This article focuses on the batch invocation method.

InfoSphere MDM Server provides two ways to perform batch transaction processing. You can use either the J2SE Batch Processor framework or the WebSphere Application Server eXtended Deployment batch framework. This article focuses on the first option: the J2SE Batch Processor framework.

The J2SE Batch Processor framework is a J2SE client application, and it is part of a default InfoSphere MDM Server installation. The batch processor is a multi-threaded application that can process large volumes of batch data.
It can process multiple records from the same batch input simultaneously, increasing the throughput. Additionally, you can run multiple instances of the batch processor simultaneously, each one processing a separate batch input and pointing to the same server or to different servers.

Each batch record in the batch input flows through the batch processor in the following sequence:

1. The reader consumer reads the record from the batch input. The submitter consumer sends it to the request/response framework for parsing and processing.
2. The parser transforms the input request into one or more business objects.
3. After the business objects pass through the business proxy, business processing and persistence logic are applied to them.
4. The application responses are sent to the constructor in order to construct the desired batch output response.
5. The constructed response is returned to the batch processor.
6. The writer records the transaction outcome in the writer log, if necessary. For example, FailedWriter logs any failed messages.
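
The flow above, combined with the add-or-update behavior of maintenance transactions described earlier, can be sketched in a few lines of Python. This is a minimal illustration and not MDM Server code: ToyRepository, maintain_party, and run_batch are hypothetical names, the repository is an in-memory dictionary, and a thread pool stands in for the multi-threaded batch processor. Failed records are collected in the way a writer such as FailedWriter might log them.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyRepository:
    """In-memory stand-in for the MDM database (illustration only)."""

    def __init__(self):
        self._by_business_key = {}
        self._next_internal_key = 1
        self._lock = threading.Lock()

    def maintain_party(self, business_key, attributes):
        """Add the party if the business key is new, otherwise update it."""
        if not business_key:
            raise ValueError("business key is required")
        with self._lock:
            party = self._by_business_key.get(business_key)
            if party is None:
                # No existing entity: assign an internal key and add it.
                party = {"internal_key": self._next_internal_key, **attributes}
                self._next_internal_key += 1
                self._by_business_key[business_key] = party
                return "added"
            # Existing entity located by business key: update it in place.
            party.update(attributes)
            return "updated"


def run_batch(records, repository, workers=4):
    """Process records concurrently; return (outcomes, failed_records)."""

    def process(record):
        try:
            return repository.maintain_party(
                record.get("business_key"), record.get("attributes", {})
            )
        except ValueError as exc:
            return f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the outcomes in the same order as the input records.
        outcomes = list(pool.map(process, records))
    failed = [rec for rec, out in zip(records, outcomes) if out.startswith("failed")]
    return outcomes, failed
```

Note that the caller never states whether a record is an add or an update; the business key alone decides. Records that share a business key within one concurrent batch may be applied in either order, so a real load would need to account for ordering.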

The batch processor is shipped with pre-built readers and writers that can be used as is. The default reader expects the batch input to be in an XML format in which each line contains one XML request. The default writer writes the response in XML format. You can also use the InfoSphere MDM Server batch processor to process batch files containing messages in SIF format. If your input data is not in one of these formats, you need to convert it to the required format, or use a customized reader and parser. It is possible to customize many of the components of the Batch Processor, but customization is not within the scope of this article.

Understanding software and hardware requirements

The following is a typical system topology for an InfoSphere MDM Server deployment that uses QualityStage from Information Server for standardization and matching:

• Application Server and InfoSphere MDM Server are installed on one physical box or LPAR with the correct CPU capacity (Server1). The number of CPUs depends on the overall throughput requirements.
• The database server is installed on another physical box or LPAR (Server2) with well-equipped IO capacity.
• IIS Server should be installed either on the database server or on a third physical box or LPAR (Server3) with adequate IO bandwidth.
• IIS Client is used to configure QS jobs, and it is installed on a Windows® computer.

To get the most performance from a given configuration, use the following general guidelines:

• The ratio of the number of CPUs on InfoSphere MDM Server and the DB server can range from 2:1 to 3:1. For example, if you have a database server with 4 CPUs, the recommended number of CPUs on the MDM Server box is at least 8 in order to make good use of the CPU capacity on the database server.
• You should have 5 to 10 physical disk spindles available for each CPU on the database server.
• The ratio of the number of CPUs on InfoSphere MDM Server and the IIS server can range from 2:1 to 1:1. For example, if you have an MDM Server with 8 CPUs, the recommended number of CPUs on the IIS server box is between 4 and 8.

Note: You only need the IIS server if you plan to use QualityStage for standardization and matching (such as suspect processing). The InfoSphere MDM Server default configuration does not use QualityStage.

Exploring the example environment

This section briefly describes the example environment, including hardware and software information for each layer in the stack. It also describes the system topology used in the tests.

Software and hardware stack

• Server 1 (AppServer and InfoSphere MDM Server)
  • Hardware
    • Machine type: IBM 9116-561, PowerPC® POWER5™
    • CPUs: 8-core POWER5 with 16 threads, 1.5 GHz, 64-bit
    • Memory/IO: 32 GB RAM, 6 internal disks
  • Software
    • OS: AIX® Version 5300-06 (64-bit)
    • WebSphere® Application Server ND 6.1.0.11 (32-bit)
    • InfoSphere MDM Server 8.0.1 + EntryLevelMDM patch
• Server 2 (DB2® database server)
  • Hardware
    • Machine type: IBM 9116-561, PowerPC POWER5
    • CPUs: 8-core POWER5 with 16 threads, 1.5 GHz, 64-bit
    • Memory/IO: 32 GB RAM, 6 internal disks + 40 external disks
  • Software
    • OS: AIX Version 5300-06 (64-bit)
    • DB2® database server v9.5 (64-bit)
• Server 3 (Information Server)
  • Hardware
    • Machine type: IBM 9116-561, PowerPC POWER5
    • CPUs: 8-core POWER5 with 16 threads, 1.5 GHz, 64-bit
    • Memory/IO: 32 GB RAM, 6 internal disks
  • Software
    • OS: AIX Version 5300-06 (64-bit)
    • IIS v8.0.1
• Server 4 (IIS Client - to configure QualityStage jobs, not needed while running the test)
  • Hardware
    • 32-bit x86 machine
  • Software
    • OS: Windows 2003 Server
    • IIS client version 8.0.1 for Windows

System topology

For InfoSphere MDM Server to use QualityStage jobs for standardization and matching, you need Server3 and Server4, as shown in Figure 1. For the default standardization and matching algorithms from InfoSphere MDM Server, Server1 and Server2 are sufficient.
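
The sizing guidelines given in the requirements section reduce to simple arithmetic. The sketch below only illustrates those ratios; the function names are hypothetical, and real capacity planning depends on the workload and throughput targets.

```python
import math


def recommended_mdm_cpus(db_cpus):
    """MDM Server to DB server CPU ratio of 2:1 to 3:1, as (low, high)."""
    return (2 * db_cpus, 3 * db_cpus)


def recommended_disk_spindles(db_cpus):
    """5 to 10 physical disk spindles per database server CPU, as (low, high)."""
    return (5 * db_cpus, 10 * db_cpus)


def recommended_iis_cpus(mdm_cpus):
    """MDM Server to IIS server CPU ratio of 2:1 to 1:1, as (low, high)."""
    return (math.ceil(mdm_cpus / 2), mdm_cpus)
```

For a database server with 4 CPUs, recommended_mdm_cpus(4) gives the range 8 to 12, which matches the "at least 8 CPUs" example in the guidelines.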

Figure 1. System topology

Installing the components

The purpose of this section is to show the high-level steps required to install the needed software in the test environment. The steps focus on the maintenance services, while briefly mentioning the prerequisite software installation, including WebSphere® Application Server, DB2 database server, InfoSphere MDM Server, and InfoSphere Information Server.

Installation prerequisites

The prerequisite installations include WebSphere Application Server, DB2 database server, and InfoSphere Information Server. For installation instructions, see each product's Information Center in Resources.

1. On Server1, install IBM WebSphere Application Server Network Deployment, Version 6.1, and upgrade it with Fixpack 11.
2. On Server2, install DB2 Database Server, Version 9.5.
3. On Server3, install IIS Server, Version 8.0.1.
4. On Server4 (Windows machine), install the IIS client.

InfoSphere MDM Server installation

For InfoSphere MDM Server installation, see Resources for a link to the information center. You can install it on a standalone WebSphere Application Server or on a WebSphere Application Server cluster.

Installation of Entry Level MDM Server patch for maintenance services

Follow the steps in this section to apply the Entry Level MDM (ELMDM) Server patch, which enables you to use maintenance transactions.
These instructions assume that you have already installed InfoSphere MDM Server and have applied all the required fixpacks. These instructions are based on the software stack mentioned in the Test Environment section.

Step 1. Get the installer.

Maintenance transactions are not part of the default installation of MDM Server, and they need to be installed separately. If you have a service agreement with IBM, you can get the installer for maintenance transactions by logging into the Secure File Transfer site and finding https://testcase.boulder.ibm.com/www/prot/MDM_RDP/?T. At the time of writing, the latest installable package is https://testcase.boulder.ibm.com/www/prot/MDM_RDP/MDMServer801_RDP801/ELMDM-20090407.tar.gz. Contact your IBM service representative if you need help getting this package.

For more instructions, see the chapter titled Installing Rapid Deployment Package for MDM Server Maintenance transactions and MDM Customizations in the document MDMRapidDeploymentPackage_InstallGuide.pdf. You can find this document under the directory Docs when you uncompress the installer.

Step 2. Make required backups before installing.

The installer makes changes to the InfoSphere MDM Server database. As a precaution, you might want to make a backup of this database before running the installer.

The installer creates backup copies of the files that it changes. These files are named *.beforeELMDM. However, they are overwritten during subsequent installer runs, so before you invoke the installer again for any reason, ensure you have moved the previous set of files to a safe place. The files modified by the installer are:

• The installable .ear file in the MDM Server home directory. For example, /usr/IBM/MDM_801/installableApps/MDM.ear
• A set of files in the <MDM_Instance>.ear directory under WebSphere Application Server. For example, /opt/IBM/WebSphere/AppServer/profiles/AppSrv1/installedApps/myHostCell01/MDM_801.ear/

Step 3. Prepare the installer.

Complete the following steps to prepare the installer.

a. Create a new base directory named setup.
b. Extract the installer (.tar.gz file) in this directory. It creates several directories, including one named install.
c. Go to directory setup/install/DB2 database server.
d. Give execute permissions to all the scripts using the command chmod 755 *.sh
e. Connect to the InfoSphere MDM Server database and execute the SQL below. The schema name is assumed to be mySchema.

Listing 1. SQL to execute

    db2 "insert into mySchema.DataAssociation values (25083715210700005,'a_name',current_timestamp,'a_description',null)"

Step 4. Customize a clustered environment.

This step is not required if your MDM Server is a standalone server. If you are installing ELMDM on a clustered MDM Server installation (MDM Server running on a cluster of WebSphere Application Servers), make the following modifications in the scripts.

a. In setVariables.sh, add the line in Listing 2 at the beginning of the script. NAME_OF_SERVER refers to the name of the WebSphere Application Server instance that is a member of the cluster.

Listing 2. Added line

    #add the line below
    export SRV_NAME=NAME_OF_SERVER

b. In the scripts install_DisableHVL.sh, install_EnableHVL.sh, and install_ELPCustom.sh, make the changes shown in Listing 3.

Listing 3. Changes to script files

    #comment out the line below and replace with the new line as shown below
    #$CURRENT/restartServer.sh $WAS_HOME $NODE_NAME $APP_NAME $ADMIN_USER $ADMIN_PASSWORD
    #add the line below
    $CURRENT/restartServer.sh $WAS_HOME $NODE_NAME $SRV_NAME $ADMIN_USER $ADMIN_PASSWORD

c. In the install_ELPTx.sh script, make the changes in Listing 4.

Listing 4. The install_ELPTx.sh script

    #comment out the line below and replace with the new line as shown below
    #$LOC/restartServer.sh $WAS_HOME $NODE_NAME $APP_NAME $ADMIN_USER $ADMIN_PASSWORD
    #add the line below
    $LOC/restartServer.sh $WAS_HOME $NODE_NAME $SRV_NAME $ADMIN_USER $ADMIN_PASSWORD

Step 5. Optionally modify the installer to help in debugging.

Complete the following steps to make the installer easier to debug.

a. At the beginning of each script, add set -x
b. Add the verbose option to db2 calls by replacing all occurrences of db2 -tf with db2 -tvf in the scripts below:
  • runsql.sh
  • install_ELPCustom.sh
  • install_EnableHVL.sh
  • install_DisableHVL.sh

Step 6. Set your environment variables.

Modify the setVariables.sh script according to your environment. The values given in Listing 5 are examples. Read the comments and instructions embedded within the example.

Listing 5. Extract from the setVariables.sh script

    export WAS_HOME=/opt/IBM/WebSphere/AppServer
    export CELL_NAME=myhostCell01
    #set the profile name used by WAS running MDM Server, such as AppSrv01 and Custom01
    export NODE_NAME=Custom01
    export APP_NAME=MDM_801
    #The name of the WebSphere Application Server running MDM Server.
    #You will have this only if you followed Step 4 above
    export SRV_NAME=Cluster_member1
    export INSTALL_HOME=/usr/IBM/MDM_801
    # IIS Server Version: Could be 801 or 81
    export IIS_SRV_VERSION=801
    export DB_NAME=MDMDB
    export DB_USER=myDBuser
    export DB_PASSWORD=myDBpassword
    export TABLE_SPACE=TABLESPACE1
    export INDEX_SPACE=INDEXSPACE1
    export LONG_SPACE=LONGSPACE1
    export TRIG=COMPOUND
    export DEL_TRIG=TRUE
    export APPLICATION_NAME='WebSphere Customer Center'
    export APPLICATION_VERSION=8.0.1.0
    export DEPLOY_NAME=MDM_801
    #You need to set this only if you are integrating QualityStage with MDM Server.
    #Please note the back slashes. The number 2809 here refers to the
    #bootstrap port of WebSphere Application Server instance running IIS server.
    export ISP_URL='iiop://myIISserver.mylab.ibm.com:2809'

Step 7. Execute the scripts.

a. Execute install_ELPTx.sh.
b. If you are integrating InfoSphere MDM Server with QualityStage, run the install_ELPCustom.sh script as well.

Step 8. Check for errors.

Go through all the log files to ensure there are no errors.

Step 9. Repeat steps for a clustered environment.

If you are installing in a clustered environment, complete the steps below for each cluster member.

a. Reconfigure setVariables.sh to point to another cluster member.
b. Run the additionalClusterInstall.sh script.
c. If you are integrating InfoSphere MDM Server with QualityStage, run the install_ELPCustom.sh script.
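
Before running the scripts in Step 7, it can help to confirm that setVariables.sh defines the variables the installer relies on. The following is a minimal sketch, assuming the script uses plain export NAME=value lines as in Listing 5. The REQUIRED list is an assumption chosen for illustration, not the installer's documented list, and parse_exports and missing_variables are hypothetical helper names.

```python
import re

# Assumed set of variables to check; adjust it to match your installer version.
REQUIRED = [
    "WAS_HOME", "CELL_NAME", "NODE_NAME", "APP_NAME",
    "INSTALL_HOME", "DB_NAME", "DB_USER", "DB_PASSWORD",
]

# Matches lines such as: export DB_NAME=MDMDB
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_exports(script_text):
    """Return a dict of NAME -> value for every uncommented export line."""
    values = {}
    for line in script_text.splitlines():
        match = EXPORT_RE.match(line)
        if match:
            name, raw = match.groups()
            # Drop surrounding whitespace and simple single or double quotes.
            values[name] = raw.strip().strip("'\"")
    return values


def missing_variables(script_text, required=REQUIRED):
    """List required variables that are absent or set to an empty value."""
    values = parse_exports(script_text)
    return [name for name in required if not values.get(name)]
```

A typical use would be to read the file with open("setVariables.sh").read() and stop if missing_variables returns a non-empty list. The parser does not evaluate shell expansions, so values that reference other variables are reported as written.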

Note: The install_ELPCustom.sh script makes changes to the InfoSphere MDM Server database. Some of these changes cannot be executed more than once (such as a DB insert). Either ignore the resulting errors during repeated execution of this script, or alter the script so that it does not attempt to repeat the database operations.

Step 10. Configure the SIF parser.

Complete this step only if you want to use a SIF parser. Otherwise, skip to Step 11. The example uses the default XML parser. To configure the batch processor to use the SIF parser, modify the following:

a. In the DWLCommon_extention.property file, which is in properties.jar in the server runtime environment, set sif_compatibility_mode = on.
b. In the batch extension property file, set ParserAndExecConfiguration.Parser = SIF.

For more details, see the section SIF Parser in MDMRapidDeploymentPackage_CompositeMaintenanceServices.pdf.

Step 11. Restart the InfoSphere MDM Server.

Restart the InfoSphere MDM Server, including all the servers in a cluster.

Integration of InfoSphere MDM Server with QualityStage

If you want to use the default standardization and matching algorithms from InfoSphere MDM Server, these steps are not needed, and you can continue to Optimizing performance with key configuration parameters. However, if you want InfoSphere MDM Server to use QualityStage for standardization and matching, this section describes how to configure the integration.

These instructions assume the following:

• InfoSphere MDM Server is installed and all the required fixpacks are applied.
• EntryLevelMDM is installed.
• The IIS server and IIS client are installed. The version of the IIS client must be the same as that of the IIS server.
• The software stack is similar to that described in the Software and hardware stack section of the example environment.

See Resources to access the documentation for InfoSphere MDM Server and QualityStage integration (MDM Server Developers Guide, chapter titled Integrating IBM Information Server QualityStage with IBM InfoSphere Master Data Management Server). The instructions in this article complement those in the developer's guide and add a few configuration changes that are helpful during the installation.

Step 1. Change security settings.

If global security is enabled on the WebSphere Application Server running IIS, the transaction protocol security on that server must be disabled. To disable protocol security on a server, complete the following steps in the administrative console:

a. In the administrative console, click Servers > Application Servers > server_name. The properties of the application server are displayed in the content pane.
b. Under Container Settings, expand Container Services and click Transaction Service to display the properties page for the transaction service.
c. Under Additional Properties, click Custom Properties.
d. On the Custom Properties page, click New.
e. Type DISABLE_PROTOCOL_SECURITY in the Name field, and type TRUE in the Value field.
f. Click Apply or OK.
g. Click Save to save your changes to the master configuration.
h. Restart the server.

Optionally, if WebSphere Application Server application security is turned on for InfoSphere MDM Server, the LTPA keys need to be shared between the MDM WebSphere Application Server cell and the IIS WebSphere Application Server cell. For detailed instructions, refer to the WebSphere Application Server Information Center (see Resources).

Step 2. Get the installer.

The installable components are part of the same bundle that you used while installing maintenance services. You will find them in the QualityStage folder.

Step 3. Create the IIS project.

Use the IIS Administrator Client to connect to the IIS server. Create a new project called ELMDMQS.

Step 4. Import the IIS project.

1. Log into the ELMDMQS project through the DataStage and QualityStage Designer.
2. Click Import > Datastage Components.
3. Browse to the ELMDMQS.dsx file under the EntryLevelMDMQualityStage folder you extracted above.
4. Import the file.

Step 5. Provision imported rule sets.

You need to provision imported rule sets to the designer client before a job that uses them can be compiled.
Complete the following steps to provision imported rule sets.

a. In the Designer client, find the rule set within the repository tree ELMDMQS > ELMDMRT > Standardization Rules > MDMQS.
b. Select the rule set by right-clicking and selecting Provision All from the menu, as shown in Figure 2.

Figure 2. Provisioning rule sets

c. Repeat the steps for all the rule sets listed below.
  • MDMQSStandardization RulesMDMCanadaCAADDRMDMCAADDR
  • MDMQSStandardization RulesMDMCanadaCAAREAMDMCAAREA
  • MDMQSStandardization RulesMDMUSAUSADDRMDMUSADDR
  • MDMQSStandardization RulesMDMUSAUSAREAMDMUSAREA
  • MDMQSStandardization RulesMNADKEYSMNADKEYS
  • MDMQSStandardization RulesMNNAMEMNNAME
  • MDMQSStandardization RulesMNNMKEYS
  • MDMQSStandardization RulesMNPHONEMNPHONE
  • MDMQSStandardization RulesMNSPOSTMNSPOST

Step 6. Prepare test data and configure parameters.

a. Copy the provided test data (*.csv and *.txt files) into a directory called /data01/ELMDMQS on your IIS server (not the IIS client).
b. Open the parameter set ELMDMQS_Data_Directory under ELMDMQS\ELMDMRT\Parameter Sets (in the Repository view of the designer).
c. Double-click the parameter set.
d. Go to the Values tab and set the value of the parameter DATADIR to the directory path into which you just copied the test data (/data01/ELMDMQS/ in this example), as shown in Figure 3. Note the slash (/) at the end of the parameter value.

Figure 3. Parameter set

e. Under the ELMDMQS\ELMDMRT\Shared Containers folder, double-click to open the shared container MDMQSPartySuspectReferenceMatchOrganization.
f. Set the file paths of the data set stages Data_Frequency and Reference_Frequency to the same path that you provided for ELMDMQS_Data_Directory.DATADIR in the previous step, as shown in Figure 4.

Figure 4. Edit input file path

g. Click OK to save the changes.
h. Close the stage, clicking Yes when it prompts you to save the changes in the stage.
i. Repeat the above steps for MDMQSPartySuspectReferenceMatchPerson.

Step 7. Compile the jobs.

a. Compile all the jobs inside the ELMDMQS\ELMDMRT\Jobs folder and its subfolders using Tool > Multiple Job compile from the designer client's menu.
b. Follow the instructions in the wizard, and start compiling.

Note: Batch versions of the jobs can be found in the ELMDMQS\ELMDMRT\Jobs folder. Information Service Director (ISD) versions of these jobs can be found in the ELMDMQS\ELMDMRT\Jobs\ISD folder.

Step 8. Generate match frequency data.

a. Use the director client to run the job ELMDMQS\ELMDMRT\Jobs\MDMQS_Person_Match_Frequency_Generation to generate the match frequency data. When completed, it generates the files PersonRefMatchTransFreq.txt and PersonRefMatchCandFreq.txt, as shown in Figure 5.

Figure 5. Generating match frequency data

b. Similarly, run ELMDMQS\ELMDMRT\Jobs\MDMQS_Org_Match_Frequency_Generation to generate the files OrgRefMatchTransFreq.txt and OrgRefMatchCandFreq.txt.

Step 9. Run the test jobs.

a. Use the director client to run the following batch jobs to test that they execute successfully on your system before you use the ISD jobs:
  • All jobs in ELMDMQS\ELMDMRT\Standardization Testing
  • All jobs in ELMDMQS\ELMDMRT\Match Testing
b. After running the jobs, view the output in the Sequential file to check the result.

Step 10. Deploy services using ISD.

a. Log on to the IBM Information Server (IIS) console.
b. Click File > Import Information Services Project > Browse for the file ELMDMQS_ISDProject.xml in the EntryLevelMDMQualityStage directory.
c. Keep all the default settings, and click Import.
d. Open the Information Service Application (ELMDMQS) contained in the imported project.
e. Click Develop, as shown in Figure 6.

Figure 6. Selecting the Develop icon

f. Click Information Services Application.
g. On the resulting screen, double-click the ELMDMQS application to open it.
h. Go into Edit mode.
i. In the Select a View window, click Services > ELMDMQSService, as shown in Figure 7.

Figure 7. Configuring jobs using ISD

j. In the expanded tree, select Operations, and double-click the operations one at a time to edit each of them.
  18. 18. developerWorks® ibm.com/developerWorks/ Figure 8. Checking the project name k. Edit each of the operations as follows: i. Ensure that the project name is correct, as shown in Box 1 in Figure 8. When you created the new project using the administration client, if you chose ELMDMQS as the name of the project, you can keep the defaults. If you specified another name, ensure that the project name and the job names are correct. To check the project and job names, click the Edit button, and browse to the project and job in the ISD folder. ii. Ensure that the Group Arguments into Structure option is enabled for inputs, as shown in Box 2 in Figure 8. iii. Change the input data type according to Table 1 below, as shown in Box 3 in Figure 8. iv. Check or uncheck the Accept array checkboxes according to Table 1, as shown in Box 4 in Figure 8 (the checkbox should show a checkmark if the table entry indicates Yes). v. Check or uncheck the output data type and Accept array checkboxes on the output tab according to Table 1. Table 1. ISD job configuration Operation name standardizeAddress Operation job name Inputs accept array ISD_MDMQS_Address_Standardization No Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Input data type AddressInput Outputs return array No Output data type AddressOutput Page 18 of 34
Table 1. ISD job configuration (continued)
| Operation name | Operation job name | Inputs accept array | Input data type | Outputs return array | Output data type |
|---|---|---|---|---|---|
| elPersonMatch | ISD_MDMQS_Party_Suspect_Reference_Match_Person | Yes | ELPersonMatchInput | Yes | ELPersonMatchOutput |
| elOrgMatch | ISD_MDMQS_Party_Suspect_Reference_Match_Org | Yes | ELOrgMatchInput | Yes | ELOrgMatchOutput |
| standardizePhoneNumber | ISD_MDMQS_Phone_Standardization | No | PhoneNumberInput | No | PhoneNumberOutput |
| standardizeOrgName | ISD_MDMQS_Organization_Standardization | No | OrgNameInput | No | OrgNameOutput |
| standardizePersonName | ISD_MDMQS_Person_Standardization | No | PersonNameInput | No | PersonNameOutput |

l. On the Provider Properties tab, modify the credentials according to your setup, as shown in Figure 9.
Figure 9. Modifying your credentials
m. Save and close the application.
n. Deploy the application by clicking on the Develop menu. Figure 10 shows an example. Note the highlighted box that shows Select the Application ELMDMQS.
Figure 10. Deploying the application
o. Click Deploy, as shown in Figure 10.
p. Leave the defaults, and click Deploy to start the deployment.

Step 11: Set configuration values for QualityStage.
Note: This example integration is being done for an InfoSphere MDM Server installation on which maintenance services are installed. If you ran install_ELPCustom.sh during the installation of maintenance services, you can skip to Optimizing performance with key configuration parameters.
Set the configuration values according to Table 2 in order to properly communicate with the IIS-QS server.

Table 2. Configuration modifications
| Configuration name | Default value |
|---|---|
| /IBM/ThirdPartyAdapters/IIS/defaultCountry | 185 |
| /IBM/ThirdPartyAdapters/IIS/initialContextFactory | This configuration element is used in conjunction with the provider URL to use JNDI registry initial context. A typical value for this element is com.ibm.websphere.naming.WsnInitialContextFactory. |
| /IBM/ThirdPartyAdapters/IIS/providerURL | iiop://<yourQSServer>:<QSServerBootstrapPort>. For example: iiop://myIIS.torolab.ibm.com:2809. |
Table 2. Configuration modifications (continued)
| Configuration name | Default value |
|---|---|
| /IBM/Party/Standardizer/Name/className | com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter |
| /IBM/Party/Standardizer/Address/className | com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter |

Step 12: Use QualityStage (QS) name and address standardization.
Use QS to standardize names and addresses that are entered into InfoSphere MDM Server. See Standardizing name, address and phone number information in the MDM developer's guide (see Resources) for more information.

Step 13: Use QualityStage in suspect duplicate processing.
QualityStage can be used with the InfoSphere MDM Server Suspect Duplicate Processing (SDP) feature. See Configuring IBM Information Server QualityStage integration for SDP in the MDM developer's guide (see Resources) for more information on using QualityStage with SDP.

Optimizing performance with key configuration parameters
After you install InfoSphere MDM Server, tune the key configuration parameters for optimal performance.

InfoSphere MDM Server and batch processor configuration
1. Increase the number of submitters to increase parallelism. Do this by editing the file <MDM_installation_Folder>/BatchProcessor/properties/Batch.properties. In the example setup, 24 submitters were optimal on an 8-way MDM Server machine.
2. Increase the JVM heap settings for the batch processor. Do this by editing the file <MDM_installation_Folder>/BatchProcessor/bin/runbatch.sh. For example, 512MB of heap is sufficient for 24 submitters.
3. Reduce BatchProcessor logging by setting the threshold to ERROR. Do this by editing <MDM_installation_Folder>/BatchProcessor/Log4J.properties and setting the logging threshold to ERROR, if it is not set already. For example: log4j.appender.file.Threshold=ERROR.
4. Reduce MDM Server logging by setting the threshold to ERROR.
Do this by editing Log4J.properties inside the properties.jar file at <WebSphere_Location>/profiles/<ServerName>/installedApps/<CellName>/<InstanceName>/properties.jar.

WebSphere Application Server configuration
1. Increase the JDBC connection pool size to support the parallelism.
   a. From the WebSphere Administration Console, go to Resources > JDBC > Data sources > DWLCustomer > Connection pool properties.
   b. Increase the value for Maximum connections. The example setup uses 50.
2. Increase the prepared statement cache size.
   a. The size of the prepared statement cache depends on the number of unique SQL statements used in your application. For InfoSphere MDM Server, set it to 300 and monitor the application to determine if the cache size needs to be increased.
   b. It can be changed from the WebSphere Administration Console. Go to Resources > JDBC > Data sources > DWLCustomer > Connection pools > WebSphere Application Server data source properties.
3. Increase the EJB cache size. Do this by using the WebSphere Administration Console to go to Servers > Application servers > [ServerName] > EJB Container Settings > EJB cache settings. The example uses 4000.
4. Change the JVM heap size and GC policy.
   a. From the WebSphere Administration Console, go to Servers > Application servers > [ServerName] > Java and Process Management > Process Definition > Java Virtual Machine.
   b. Set the initial heap size to 512 MB and the maximum heap size to 1024 MB.
   c. Use the gencon GC policy for better performance. To use this GC policy, specify -Xgcpolicy:gencon under Generic JVM arguments. While testing the example using the gencon GC policy, WebSphere Application Server sometimes generated unnecessary heap dumps. To disable this behavior, do the following after the server is started:
      i. From the WebSphere Administration Console, go to Servers > Application servers > [ServerName] > Performance > Performance and Diagnostic Advisor Configuration > Runtime (tab).
      ii. Clear the Enable automatic heap dump collection check box (ensure the check box is empty).

Database tuning (DB2)
Follow established best practices and recommendations when you set up a database server. Monitor your database performance closely, and tune the database as needed for optimal performance and productive resource usage. This section briefly describes several recommendations for configuring and tuning a DB2 database. The basic concepts also apply to other types of databases.
• Typically it is recommended that you use one set of dedicated disks for DB2 transaction logs and you use another set of dedicated disks for DB2 table spaces.
If possible, it is even better to use different disk controllers for DB2 transaction logs and DB2 table spaces, because this gives you the flexibility to configure the disk controllers independently for different I/O patterns, for example to favor writes instead of a mix of writes and reads.
• Ensure read and write cache is enabled on the storage system. Monitor the cache effectiveness, and configure the cache size properly.
• Properly plan the table spaces to ensure balanced I/O operations across all of the available disks. This avoids hot spots in your database and avoids limiting your overall database performance to the bandwidth of a few of the busiest disks. It also maximizes the utilization of the I/O bandwidth available from all the physical disks.
• In addition to a well-planned table space layout over the I/O system, one of the configuration parameters that affects performance most dramatically is the database buffer pool size. Pay close attention to the overall buffer pool hit ratio, which tells how often the needed data is found in the database buffer pools instead of requiring a read from the physical disks (which is very expensive).
• Strive for a buffer pool hit ratio of 80% or higher for data, and 90% or higher for indexes. Typically in MDM Server implementations, start with one big buffer pool for both data and indexes. If necessary, separate data and indexes into two different buffer pools to help ensure a good index buffer pool hit ratio.
• Because MDM Server allows a good amount of customization and extension, analyze the most expensive SQL statements from the database snapshot or other tools. Ensure that those SQL statements have optimal access plans with the best indexes in place.
Consider these recommendations together to achieve the performance you need, because the behavior of one area might be just a symptom of another incorrectly configured or misbehaving area.

Understanding performance test methodology used in the example

Input data preparation
The maintainContractPlus transaction was used for testing the example. Because the default parser from the BatchProcessor was used, the input data had to be formatted as line-feed-delimited XML transactions.
The first step toward getting the input data set was to create seed data. The seed data was generated using a home-grown, Java-based tool with key distributions based on U.S. Census data (2000). Some realistic data was added to make the overall parties closely match a typical MDM business scenario. The seed data contained details such as name, gender, date of birth, and addresses.
As a second step, a template for the maintainContractPlus transaction was created. This template had variables for key party details that needed to be filled in with generated seed data. Another home-grown, Java-based tool was used to generate the XML transactions. One such transaction yielded one person with one name, one address, one contract, and one contact method. Table 4 shows the detailed profile of the database tables populated by these transactions. The example run used a total of one million such records as one input data set, with each record representing one party and its associated attributes.

Suspect duplicate data preparation
The data generated in the example so far was primarily clean. A similar approach was used to generate dirty data, which included 40% duplicates. This data set was used when Suspect Duplicate Processing was turned on.
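The data preparation described above (seed records poured into a transaction template, one XML transaction per line, with a share of duplicates for the dirty data set) could be sketched roughly as follows. This is only an illustration in Python, not the Java tooling the authors used. The XML element names, the record fields, and the helper names `render` and `build_input` are hypothetical, and the real maintainContractPlus request is far larger. The sketch also produces exact copies only, whereas the example test used several kinds of near-duplicates.

```python
import random
from string import Template
from xml.sax.saxutils import escape

# Hypothetical single-line template; NOT the real maintainContractPlus schema.
TXN_TEMPLATE = Template(
    "<Txn><Name>$first $last</Name><Gender>$gender</Gender>"
    "<DOB>$dob</DOB><Address>$address</Address></Txn>"
)


def render(record):
    """Fill the template with one seed record.

    Values are XML-escaped and whitespace is collapsed so that every
    transaction stays on a single line (line-feed-delimited input).
    """
    safe = {key: escape(" ".join(str(value).split())) for key, value in record.items()}
    return TXN_TEMPLATE.substitute(safe)


def build_input(seed_records, duplicate_fraction, rng):
    """Return shuffled transaction lines, one per record.

    Roughly duplicate_fraction of the lines are exact copies of records
    that also appear once as clean lines; the rest are unique.
    """
    if not 0 <= duplicate_fraction < 1:
        raise ValueError("duplicate_fraction must be in [0, 1)")
    total = len(seed_records)
    n_dup = int(round(total * duplicate_fraction))
    clean = list(seed_records[: total - n_dup])
    duplicates = [dict(rng.choice(clean)) for _ in range(n_dup)]
    records = clean + duplicates
    rng.shuffle(records)
    return [render(record) for record in records]


if __name__ == "__main__":
    seeds = [
        {"first": "Ann", "last": "Lee", "gender": "F", "dob": "1980-02-03", "address": "1 Elm St"},
        {"first": "Bob", "last": "Ray", "gender": "M", "dob": "1975-07-09", "address": "2 Oak Ave"},
    ]
    # Write one transaction per line, as the default batch parser expects.
    print("\n".join(build_input(seeds, 0.0, random.Random(1))))
```

Passing a seeded `random.Random` keeps the generated data set reproducible between test runs, which matters when results from different configurations are compared.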
During the initial load, the input data might have duplicate entries, where details from one record closely resemble those from another one. Such records are termed suspect duplicates. Depending on how closely two records match, suspect duplicates are assigned a match category. To determine the match category, some critical data fields are used while comparing two records. The critical data fields include first name, last name, address, date of birth, gender, and social security number. Based on the comparison results, the suspect duplicates are assigned a match score and a non-match score, and then the match category is derived. Depending on the match category, InfoSphere MDM Server takes appropriate actions for the suspect duplicates.
When testing the example, two sets of data were used:
• 100% clean data with no suspect duplicates in the input data set
• 60% clean data with 40% of the records as suspect duplicates
The example test included four types of suspect duplicates in the 60% clean data set. The population of each type of suspect duplicate was kept equal, and the duplicates were randomly distributed in the data using home-grown, Java-based tools. The details of this data set are shown in Table 3.

Table 3. Details of input data with suspects
| sr# | Matching critical data details | Non-matching critical data details | Population | Weight (match/non-match score) | Match category |
|---|---|---|---|---|---|
| 1 | Gender, FirstName, LastName, Address, DOB, SSN | None | 10% | 63/0 | A1 |
| 2 | Gender, FirstName, LastName, DOB, SSN | Address | 10% | 60/3 | A2 |
| 3 | Gender, Address, DOB, SSN | FirstName, LastName | 10% | 55/4 | A2 |
| 4 | Gender, Address, LastName, DOB | FirstName (and SSN field is empty) | 10% | 46/1 | B |

The scores and categories in Table 3 are calculated by InfoSphere MDM Server's deterministic matching approach, which is the default implementation for party matching. In contrast, QualityStage matching offers a probabilistic matching approach, and it calculates only one composite weight.

Data profile
Table 4 shows the population of InfoSphere MDM Server database tables when the two sets of input data are loaded.

Table 4. Database population
| Table name | 100% clean data | 60% clean data |
|---|---|---|
| ADDRESS | 1,000,000 | 700,000 |
| ADDRESSGROUP | 1,000,000 | 900,000 |
| CONTACT | 1,000,000 | 900,000 |
| CONTACTMETHOD | 1,000,000 | 900,000 |
| CONTACTMETHODGROUP | 1,000,000 | 900,000 |
| CONTEQUIV | 1,000,000 | 1,000,000 |
| CONTRACT | 1,000,000 | 1,000,000 |
| CONTRACTCOMPONENT | 1,000,000 | 1,000,000 |
| CONTRACTROLE | 1,000,000 | 1,000,000 |
| IDENTIFIER | 1,000,000 | 900,000 |
Table 4. Database population (continued)
| Table name | 100% clean data | 60% clean data |
|---|---|---|
| LOBREL | 1,000,000 | 900,000 |
| LOCATIONGROUP | 2,000,000 | 1,800,000 |
| MISCVALUE | 1,000,000 | 1,000,000 |
| PERSON | 1,000,000 | 900,000 |
| PERSONNAME | 1,000,000 | 900,000 |
| PERSONSEARCH | 1,000,000 | 900,000 |
| SUSPECT | 0 | 300,000 |

Test methodology
Different tests were performed to check stability and scalability and to measure the overhead associated with several commonly used features. All the tests were conducted in two solution configurations:
• The MDM Server only solution, where InfoSphere MDM Server uses its own algorithm for standardization and matching. In this case, IBM Information Server is not required.
• The MDM Server + QS solution, where InfoSphere MDM Server uses QualityStage to do the standardization and matching.
The methodology for all these tests was similar:
1. Set up the systems. Configure and tune the various components as described in the previous sections.
2. Prepare a set of input data with 10,000 records using the approach described above.
3. Load the input data with 10,000 records using 1 submitter in the batch processor. This is done to avoid deadlocks while working with an empty database.
4. Perform DB2 reorgchk on all the tables to update statistics.
5. Create a backup of the MDM Server database at this stage, and use it as the starting point for all the tests.
The following steps were used to run the example test:
1. Restore the database using the backup copy.
2. Change the database configuration if required for the test. For example, you may want to switch OFF Suspect Duplicate Processing.
3. Restart WebSphere Application Server running InfoSphere MDM Server.
4. Run data collection scripts in the background, which collect CPU statistics, I/O statistics, and database snapshots.
5. Start the test to load the selected input data set.
6. Collect the logs from InfoSphere MDM Server, WebSphere Application Server, and the DB2 database server.
7.
Derive response time and throughput from transactiondata.log as generated by InfoSphere MDM Server.

Measuring performance results
This section describes the performance measurements including the following:
• Results showing very stable performance throughput and response time
• Performance overhead of some commonly used features in the context of initial data loading
• Scalability of throughput

Test 1: Stability of throughput and response time
The purpose of this test is to show whether the throughput and response times remain stable as the loading progresses and as the database size increases. This test also measures the system resource usage pattern over the course of the test. The data for throughput and response time is derived from transactiondata.log, as generated by InfoSphere MDM Server.
Various tests were conducted for both MDM Server only and MDM Server + QS scenarios, and all of them showed good stability. Table 5 shows the configuration settings for the first test.

Table 5. Test 1 configuration
| Parameter | Value |
|---|---|
| Hardware/Software stack | As described in example test environment |
| InfoSphere MDM Server heap size | Initial: 512MB; Max: 1024MB |
| InfoSphere MDM Server JVM GC policy | gencon |
| Number of submitters in batch processor | 24 |
| Batch processor JVM memory | 512MB |
| ISD job configurations (applicable to MDM Server + QS scenario only) | Default |
| Type of transaction used | MaintainContractPlus |
| Total volume | 1 million parties and their associated records |
| Input data quality | 60% clean, 40% suspected duplicates of various types |
| Name standardization | ON (default) |
| Address standardization | ON (StandardFormatingIndicator set to N in the request XML) |
| Suspect duplicate processing | ON |
| History triggers | Enabled |

Test 1 results: Stability results
Figure 11 shows the throughput and response times captured for the MDM Server only scenario. The chart shows that throughput and response time are stable during the whole run duration. The results for the MDM Server + QS scenario are similar.
Figure 11. Throughput and response time
Figure 12 shows that by configuring a sufficient number of submitters, almost all CPU resources on the WebSphere Application Server machine running InfoSphere MDM Server can be used, and the system does not have any other bottlenecks. Figure 12 also shows the resource usage on other systems.
Figure 12. Resource usage
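One simple way to put a number on "stable" is to bucket completed transactions into fixed time intervals and look at how much the per-interval throughput varies. The sketch below is a minimal Python illustration of that idea. It assumes you have already extracted transaction completion times (in seconds) into a plain list; the layout of transactiondata.log is not described in this article, so no parsing is attempted, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def interval_throughput(completion_times, interval_seconds):
    """Return transactions per second for each fixed-width time interval.

    completion_times: completion timestamps in seconds (any origin).
    The last interval may be only partially covered by the run.
    """
    if not completion_times:
        return []
    start = min(completion_times)
    n_intervals = int((max(completion_times) - start) // interval_seconds) + 1
    counts = [0] * n_intervals
    for t in completion_times:
        counts[int((t - start) // interval_seconds)] += 1
    return [count / interval_seconds for count in counts]


def coefficient_of_variation(samples):
    """Standard deviation divided by the mean; lower means more stable."""
    average = mean(samples)
    return pstdev(samples) / average if average else float("inf")


if __name__ == "__main__":
    # Illustrative timestamps only: four completions per second for ten seconds.
    times = [i / 4 for i in range(40)]
    tps = interval_throughput(times, 1)
    print(tps, coefficient_of_variation(tps))
```

A coefficient of variation close to zero over a long run, together with a flat response-time curve, is the kind of pattern that Figure 11 depicts.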
Test 2: Feature overheads
The purpose of these tests is to measure the overhead of four commonly used features of InfoSphere MDM Server. In this series of tests, the overhead of the following features was measured:
• Name standardization
• Address standardization
• Suspect duplicate processing
• History triggers
Overhead is expressed as the percentage reduction in throughput when the feature is enabled. For example, a 5% overhead associated with a particular feature means that if throughput was 100 transactions per second (TPS) without the feature, it becomes 95 TPS when the feature is enabled. Throughput is measured as total data volume loaded / total time taken.
Various tests were conducted for both MDM Server only and MDM Server + QS scenarios, enabling one or more features at a time. In the MDM Server + QS scenario, the overheads of standardization and suspect duplicate processing are expected to be higher because they involve extra processing by QualityStage. Table 6 shows the configuration settings for the second test.

Table 6. Test 2 configuration
| Parameter | Value |
|---|---|
| Hardware/Software stack | As described in example test environment |
| InfoSphere MDM Server heap size | Initial: 512MB; Max: 1024MB |
| InfoSphere MDM Server JVM GC policy | Default |
| Number of submitters in batch processor | 24 |
| Batch processor JVM memory | 512MB |
| ISD job configurations (applicable to MDM Server + QS scenario only) | Default |
| Type of transaction used | MaintainContractPlus |
| Total volume | 1 million parties and their associated records |
| Input data quality | a) 100% clean; b) 60% clean |

Following are some notes about the configuration:
• Name standardization was turned ON or OFF by setting /IBM/Party/ExcludePartyNameStandardization/enabled to FALSE or TRUE, respectively.
• Address standardization was effectively switched ON or OFF by setting StandardFormatingIndicator to N or Y, respectively, in the transaction request XMLs.
• Suspect duplicate processing was switched ON or OFF by setting the following to TRUE or FALSE, respectively, in the configuration table:
   • /IBM/Party/SuspectProcessing/enabled
   • /IBM/Party/SuspectProcessing/AddParty/returnSuspect
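The overhead definition used in this test (percentage reduction in throughput, with throughput taken as total volume divided by total time) reduces to a few lines of arithmetic. The following Python sketch shows that arithmetic only; the function names are made up for this illustration, and the numbers in the usage lines are the 100 TPS and 95 TPS values from the worked example in the text, not measured results.

```python
def throughput_tps(total_records, elapsed_seconds):
    """Throughput as total data volume loaded divided by total time taken."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_records / elapsed_seconds


def overhead_percent(baseline_tps, feature_tps):
    """Percentage reduction in throughput when a feature is enabled."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (baseline_tps - feature_tps) * 100.0 / baseline_tps


if __name__ == "__main__":
    # The example from the text: 100 TPS without the feature, 95 TPS with it.
    print(overhead_percent(100.0, 95.0))
    # Throughput for a run that loads 1,000 records in 10 seconds.
    print(throughput_tps(1000, 10))
```

Because overhead is relative to the baseline run, both runs need the same input data set, submitter count, and starting database state for the percentage to be meaningful.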
Test 2 results: Feature overheads

Standardization
Table 7 shows the overhead of standardization for the MDM Server only scenario. With suspect duplicate processing switched ON, tests were conducted with both data sets (100% clean and 60% clean). History triggers were enabled during these tests.

Table 7. Overhead of standardization
| Overhead | SDP OFF | SDP ON (100% clean) | SDP ON (60% clean) |
|---|---|---|---|
| Overhead of name standardization | 2% | 3% | 3% |
| Overhead of address standardization | 2% | 2% | 0% |
| Overhead of name and address standardization | 4% | 3% | 2% |

Note: With 60% clean data, there are fewer unique addresses. This can result in less overhead.

Suspect duplicate processing
Table 8 shows the overhead of suspect duplicate processing with and without standardization in the MDM Server only scenario. Tests were conducted with both data sets (100% clean and 60% clean). History triggers were enabled during these tests.

Table 8. Overhead of suspect duplicate processing
| Overhead | 100% clean data | 60% clean data |
|---|---|---|
| Overhead of suspect duplicate processing | 3% | 20% |
| Overhead of suspect duplicate processing along with name and address standardization | 6% | 21% |

History triggers
If history triggers are enabled, the I/O requirement on the database server increases significantly (nearly doubles). With enough I/O bandwidth provided, the overhead is small (approximately 5%).

Test 3: Scalability tests
By definition, scalability is a measure of how well the throughput increases when more load is put on the system. However, for the example test, the number of processors did not actually vary. Instead, the number of parallel requests to InfoSphere MDM Server was changed by varying the number of submitters in the batch processor. Data points were collected between 1 submitter and 24 submitters, at which point the system was clearly saturated.
The test was conducted for both the MDM Server only and the MDM Server + QS scenarios.
Tests were conducted in different configurations, and all of them showed near-linear scalability. Table 9 shows the configuration settings for the third test.
Table 9. Test 3 configuration
| Parameter | Value |
|---|---|
| Hardware/Software stack | As described in example test environment |
| InfoSphere MDM Server heap size | Initial: 512MB; Max: 1024MB |
| InfoSphere MDM Server JVM GC policy | Default |
| Number of submitters in batch processor | Varied from 1 to 24 |
| Batch processor JVM memory | 512MB |
| ISD job configurations (applicable to the MDM Server + QS scenario only) | Default |
| Type of transaction used | MaintainContractPlus |
| Total volume | 15,000 to 100,000 records |
| Input data quality | 60% clean |
| Name standardization | ON (default) |
| Address standardization | ON (StandardFormatingIndicator set to N in the request XML) |
| Suspect duplicate processing | ON |
| History triggers | Enabled |

Test 3 results: Scalability results
Figure 13 shows the scalability for the MDM Server only scenario. As shown by the green line, the throughput increases almost linearly with an increase in the number of submitters. The example configuration utilized more than 90% of CPU capacity on the server running InfoSphere MDM Server. The results for MDM Server + QS are similar.
Figure 13. Scalability of InfoSphere MDM Server with SDP ON
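"Near-linear" can be checked numerically by comparing the measured speedup at each submitter count with the ideal speedup. The Python sketch below is one possible way to do that, under the assumption that you have a throughput figure per submitter count from your own runs. The function name and the sample numbers are purely illustrative and are not the measurements behind Figure 13.

```python
def scaling_efficiency(tps_by_submitters):
    """Return {submitters: efficiency} relative to the smallest submitter count.

    Efficiency is measured speedup divided by ideal (linear) speedup,
    so 1.0 means perfectly linear scaling and lower values mean the
    extra submitters are yielding diminishing returns.
    """
    if not tps_by_submitters:
        return {}
    base_n = min(tps_by_submitters)
    base_tps = tps_by_submitters[base_n]
    return {
        n: (tps / base_tps) / (n / base_n)
        for n, tps in sorted(tps_by_submitters.items())
    }


if __name__ == "__main__":
    # Illustrative numbers only, not results from the article.
    sample = {1: 10.0, 2: 20.0, 4: 36.0}
    for submitters, efficiency in scaling_efficiency(sample).items():
        print(submitters, round(efficiency, 2))
```

When efficiency starts to drop sharply as submitters are added, the server CPU or another resource is likely saturated, and adding more submitters beyond that point mainly increases response time.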
Conclusion
Designed to provide flexibility in its deployments, developed on leading technology, and offering unmatched performance and scalability, InfoSphere Master Data Management Server has been the leading choice for a large number of organizations across a range of industries when implementing their MDM solutions. As the leader, IBM has the largest number of successfully deployed MDM implementations in the market today.
This article explained what maintenance services are and how to set them up in an InfoSphere MDM Server environment. It provided configuration details and tuning tips so that you can get the maintenance services batch up and running with high performance. This article also covered the steps for setting up Information Server QualityStage for standardization and matching, if such a configuration is required.
Some key performance data points from various common scenarios were described, and they show that maintenance services, when used for initial load, provide sustainable high performance and excellent scalability. Finally, this article summarized performance overhead measurements of some key features commonly used in MDM Server implementations. You might find them useful for capacity planning of an MDM Server system based on the chosen features and for ensuring the required performance during initial load.

Acknowledgments
We would like to thank Lena Woolf, Berni Schiefer, and Karen Chouinard for their input and suggestions. We would also like to thank the other MDM Server team members for their support during this project.

Notices
©IBM Corporation 2009. All Rights Reserved.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. ALTHOUGH EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS DOCUMENT, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS DOCUMENT OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS DOCUMENT IS INTENDED TO, OR SHALL HAVE THE EFFECT OF CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
All performance data contained in this publication was obtained in the specific operating environment and under the conditions described above and is presented as an illustration only. Performance obtained in other operating environments may vary and customers should conduct their own testing.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
The information in this document concerning non-IBM products was obtained from the supplier(s) of those products. IBM has not tested such products and cannot confirm the accuracy of the performance, compatibility or any other claims related to non-IBM products. Questions about the capabilities of non-IBM products should be addressed to the supplier(s) of those products.
The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
References in this publication to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.
Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth, savings or other results.
Resources

Learn
• See the IBM Redbook™ Master Data Management: Rapid Deployment Package for MDM for more instructions.
• Refer to the IBM InfoSphere MDM Server Information Center for more instructions.
• Refer to the WebSphere Application Server, Version 6.1 Information Center to install IBM WebSphere Application Server Network Deployment, Version 6.1, and upgrade it with Fixpack 11.
• Refer to the IBM DB2 Database for Linux®, UNIX®, and Windows Information Center to install DB2 Database Server, Version 9.5.
• Refer to the IBM Information Server Information Center to install IIS Server, Version 8.0.1.
• Learn more from the IBM Redpaper WebSphere Customer Center: Understanding Performance.
• Discover DB2 Tuning Tips for OLTP Applications from this classic developerWorks article.
• Explore the Information Management Software for z/OS Solutions Information Center.
• Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
• Stay current with developerWorks technical events and webcasts.

Get products and technologies
• Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss
• Participate in the discussion forum for this content.
• Check out the developerWorks blogs and get involved in the developerWorks community.
About the authors

Neeraj Singh
Neeraj R Singh is currently a senior performance engineer working on Master Data Management Server performance. He has prior experience leading the Java technologies test team for functional, system, and performance tests as technical lead and test project leader. He joined IBM in 2000 and holds a bachelor's degree in Electronics and Communications Engineering.

Yongli An
Yongli An is an experienced performance engineer focusing on Master Data Management products and solutions. He is also experienced in DB2 database server and WebSphere performance tuning and benchmarking. He is an IBM Certified Application Developer and Database Administrator - DB2 for Linux, UNIX, and Windows. He joined IBM in 1998. He holds a bachelor's degree in Computer Science and Engineering and a master's degree in Computer Science. Currently Yongli is the manager of the MDM performance and benchmarks team, focusing on Master Data Management Server performance and benchmarks, and helping customers achieve optimal performance for their MDM systems.

© Copyright IBM Corporation 2009 (www.ibm.com/legal/copytrade.shtml)
Trademarks (www.ibm.com/developerworks/ibm/trademarks/)
