Administration and Programming for DB2


  • 1. IBM DB2 Intelligent Miner Scoring Administration and Programming for DB2 Version 8.1 SH12-6745-00
  • 3. Note Before using this information and the product it supports, be sure to read the information in Appendix H, “Notices” on page 207. First Edition, October 2002 This edition applies to Version 8.1 of IBM DB2 Intelligent Miner Scoring, program number 5765–F36, and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright International Business Machines Corporation 2001, 2002. All rights reserved. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
  • 4. Contents

    Figures  vii
    Tables  ix

    About this book  xi
      Who should use this book  xi
      Conventions and terminology used in this book  xi
      How this book is structured  xiii
      How to read the syntax diagrams  xiii
      How to send your comments  xiv

    Part 1. Guide  1

    Chapter 1. Introducing the Intelligent Miner products  3
      IBM DB2 Intelligent Miner Scoring  3
      IBM DB2 Intelligent Miner Modeling  4
      IBM DB2 Intelligent Miner Visualization  5
      IBM DB2 Intelligent Miner for Data  5

    Chapter 2. Introducing IM Scoring  7
      IM Scoring  7
        Mining functions supported by IM Scoring  7
        Using IM for Data to produce models  8
        Using IM Modeling to produce models  9
      Scoring data types, methods, and functions  9
      PMML: A markup language for data mining  11
      Converting models  11
      Online scoring with IM Scoring Java Beans  12
      What is new in version 8.1  12
        Ease of use  13
        E-business enhancements  13
        Functional enhancements  13
        Standards conformance  13
        Platform support  14
        Shared infrastructure with IM Modeling  14
        Limitations  14

    Chapter 3. Data mining functions  17
      Classification  17
      Clustering  17
      Regression/Prediction  18

    Chapter 4. Getting started  19
      Quick start  19
        Installation  20
        Configuring IM for Data to export PMML models  20
        Configuring the database environment  21
        Creating database objects  22
        Verifying the installation and configuration  22
        Executing sample applications  22
        Generating SQL scripts from your own mining models  23
      Sample components  23
      Completing the practice exercises  25
        Creating a table and importing data  25
        Importing a mining model  26
        Applying a model and getting results values  27
        Extracting information from a model  31
        Applying models created with IM Modeling  31
        Using IM Scoring Java Beans to score records  33
        Using idmmkSQL to work with your own mining models  38

    Chapter 5. Using IM Scoring  41
      Creating database objects  41
        Enabling databases  41
        Disabling databases  42
        Checking databases  43
      Working with mining models  43
        Exporting models from IM for Data  43
        Converting exported models  44
        Generating SQL statements from models  44
        Importing mining models  45
        Providing models by means of IM Modeling  48
      Applying mining models  49
        Querying model field names  49
        Using the application functions  50
        Specifying data by means of REC2XML  51
        Specifying data by means of DM_applData  52
        Specifying data by means of CONCAT  53
        Results data  53
        Code sample for applying models  55
      Getting application results  56
      Handling missing values  57
      Using IM Scoring Java Beans  58
        Setting environment variables  59
        Specifying the mining model to be used  60
        Accessing model metadata  61
        Specifying a data record  62
        Applying scoring  62
        Accessing computed results  62
        Scoring example  63
        ScoringException classes  64

    Chapter 6. Administrative tasks  65
      Using IM Scoring in a multilanguage environment  65
      Getting error information  65
      Getting support  66
        Product README  66
        'Frequently asked questions' and 'Hints and tips'  67
        Problem identification worksheet  67
        Getting product information  68
        Getting trace information  69
        Getting DB2 diagnostic information  71

    Part 2. Reference  73

    Chapter 7. Overview of IM Scoring database objects  75
      Data types provided by IM Scoring  75
      Methods provided by IM Scoring  77
      Functions provided by IM Scoring  77
      Parameter sizes  81

    Chapter 8. IM Scoring methods reference  83
      DM_expDataSpec  84
      DM_getFldName  85
      DM_getFldType  86
      DM_getNumFields  87
      DM_impDataSpec  88
      DM_isCompatible  89

    Chapter 9. IM Scoring functions reference  91
      DM_applData  92
      DM_applyClasModel  94
      DM_applyClusModel  95
      DM_applyRegModel  96
      DM_expClasModel  97
      DM_expClusModel  98
      DM_expRegModel  99
      DM_getClasCostRate  100
      DM_getClasMdlName  101
      DM_getClasMdlSpec  102
      DM_getClasTarget  103
      DM_getClusConf  104
      DM_getClusMdlName  105
      DM_getClusMdlSpec  106
      DM_getClusScore  107
      DM_getClusterID  108
      DM_getClusterName  109
      DM_getConfidence  110
      DM_getNumClusters  111
      DM_getPredClass  112
      DM_getPredValue  113
      DM_getQuality  114
      DM_getQuality(clusterid)  115
      DM_getRBFRegionID  116
      DM_getRegMdlName  117
      DM_getRegMdlSpec  118
      DM_getRegTarget  119
      DM_impApplData  120
      DM_impClasFile  121
      DM_impClasFileE  122
      DM_impClasModel  123
      DM_impClusFile  124
      DM_impClusFileE  125
      DM_impClusModel  127
      DM_impRegFile  128
      DM_impRegFileE  129
      DM_impRegModel  130

    Chapter 10. IM Scoring command reference  131
      The idmcheckdb command  132
      The idmdisabledb command  132
      The idmenabledb command  133
      The idminstfunc command  135
      The idmlevel command  135
      The idmlicm command  135
      The idmmkSQL command  136
      The idmuninstfunc command  138
      The idmxmod command  139

    Chapter 11. IM Scoring Java Beans reference  141

    Part 3. Appendixes  143

    Appendix A. Installing IM Scoring  145
      Installing IM Scoring on AIX systems  145
      Installing IM Scoring on Linux systems  149
      Installing IM Scoring on Sun Solaris systems  150
      Installing IM Scoring on Windows systems  153
      Configuring the database management system on UNIX systems  157
      Configuring the database management system on Windows systems  158
      Enabling IM for Data to export PMML or XML models  158

    Appendix B. Installing IM Scoring Java Beans  161
      Installing IM Scoring Java Beans on AIX systems  161
      Installing IM Scoring Java Beans on Linux systems  162
      Installing IM Scoring Java Beans on Sun Solaris systems  164
      Installing IM Scoring Java Beans on Windows systems  165

    Appendix C. Migration from IM Scoring V7.1  167
      Working with IM Scoring V7.1 and V8.1 in parallel  167
      Exporting and importing models with the use of compression  168
      Exporting and importing models by means of DB2 Utilities  168
      Importing models in unfenced mode  169
      Applying Neural models  169
      Using the function DM_getClusterID  170

    Appendix D. Coexistence with IM Modeling  171
      Shared schema  171
      Shared data types  171
      Shared functions  171
      Shared methods  172
      Shared commands  172

    Appendix E. Error messages  173
      DB2 SQL states  173
      IM Scoring SQL states  174
      IM Scoring error events  174

    Appendix F. The DB2 REC2XML function  199

    Appendix G. IM Scoring conformance to PMML  203
      IM Scoring application  203
      IM Scoring conversion tools  204
      Radial-Basis Function prediction  205

    Appendix H. Notices  207
      Trademarks  209

    Bibliography and related information  211
      IBM DB2 Intelligent Miner publications  211
      IBM DB2 Universal Database (DB2 UDB) publications  212
      Related information  212

    Index  213
  • 8. Figures

    1. The IM Scoring process  9
    2. Architecture sample to realize a call-center scenario  12
    3. Model import processes  46
    4. Applying a model to data  55
  • 10. Tables

    1. Formatting conventions  xii
    2. Abbreviations  xii
    3. PMML model types  4
    4. Sample components for the Clustering mining function of IM Scoring  24
    5. Sample components for IM Scoring Java Beans  25
    6. Import functions and related data types and tables  46
    7. Import functions using a specific XML encoding  47
    8. Import functions using CLOB values  48
    9. Functions for applying models  50
    10. Application functions and their data types and results data  53
    11. Results functions and their purpose  56
    12. IM Scoring Java Beans methods for accessing model metadata  61
    13. IM Scoring Java Beans methods for accessing computed results  62
    14. Data types specific to IM Scoring  75
    15. Methods for type DM_LogicalDataSpec  77
    16. Functions for working with scoring data type DM_ApplicationData  78
    17. Functions for working with data mining model type DM_ClasModel  78
    18. Functions for working with scoring result type DM_ClasResult  78
    19. Functions for working with scoring result type DM_ClusResult  79
    20. Functions for working with data mining model type DM_ClusteringModel  79
    21. Functions for working with data mining model type DM_RegressionModel  80
    22. Functions for working with scoring result type DM_RegResult  80
    23. Mining field types  86
    24. The idmcheckdb messages  132
  • 12. About this book

IBM DB2® Intelligent Miner™ Scoring is an application that integrates the model application functionality of Intelligent Miner for Data Version 6.1 or higher with DB2 Universal Database™. Intelligent Miner Scoring enables you to import and apply mining models, and to access the results.

Throughout this book, the following abbreviations are used:
- IBM DB2 Intelligent Miner Scoring V8.1 is referred to as IM Scoring.
- IBM DB2 Intelligent Miner Scoring V7.1 is referred to as IM Scoring V7.1.
- IBM DB2 Intelligent Miner Modeling V8.1 is referred to as IM Modeling.
- IBM DB2 Intelligent Miner Visualization V8.1 is referred to as IM Visualization.
- IBM DB2 Intelligent Miner for Data is referred to as IM for Data.

This book describes how to install and use IM Scoring and IM Scoring Java Beans. It also provides a full reference resource to the database objects provided by IM Scoring. References in this book to DB2 refer to DB2 UDB Version 7.2 or higher.

Who should use this book

This book is intended for the following users:
- DB2 database administrators who are familiar with DB2 administration concepts, tools, and techniques
- Users of IM for Data who are familiar with the concepts underlying the different data mining functions that IM for Data provides
- DB2 application programmers who are familiar with SQL and with one or more programming languages that can be used for DB2 applications

Conventions and terminology used in this book

In DB2, the names of the scoring methods, functions, data types, tables, and table columns are created in capital letters, even if you used, for example, lowercase letters. In this book, these names are represented in mixed case for better readability.
  • 13. The following table shows the formatting conventions used in this book.

Table 1. Formatting conventions

    Convention used                                         How it is used
    Interface elements (menu bars, buttons, labels)         Click OK.
      are shown in boldface.
    Menu instructions are shown in boldface, and            Click File —> Export.
      sequential instructions are separated by arrows.
    Command syntax is shown in a monospaced font.           db2 -stf idmtab.db2
    The names of files and directories, database tables     The SQL INSERT command inserts the model
      and columns, and SQL methods, functions, and            into a column of the table ClusterModels,
      data types are shown in a monospaced font.              which is configured for the data type
                                                              DM_ClusteringModel.
    Variables within command syntax, which you should       idmdisabledb <db name>
      replace by a real value, are shown in italics
      between angle brackets.
    Italics are used to highlight the introduction of       These functions are also referred to as
      a new term.                                             user-defined functions.

The following table shows the abbreviations used in this book.

Table 2. Abbreviations

    Abbreviation    Full form
    CRM             Customer Relationship Management
    GUI             Graphical user interface
    ICU             International Classes for Unicode
    PMML            Predictive Model Markup Language
    RBF             Radial Basis Function
    RPM             Redhat Package Manager
    SQL             Structured Query Language
    UDF             User-defined function
    UDM             User-defined method
    UDT             User-defined data type
    XML             Extensible Markup Language
  • 14. How this book is structured

This book is divided into the following parts:

Part 1. Guide
    Contains the following:
    - An overview of the functionality available with IM Scoring
    - Instructions on how to get started with IM Scoring
    - Guidance on how to use IM Scoring and how to perform administrative tasks

Part 2. Reference
    Provides a reference resource to all the IM Scoring database objects and utilities.

Part 3. Appendixes
    Contains the following:
    - Instructions on how to install, configure, and uninstall IM Scoring and IM Scoring Java Beans
    - Information on migration issues from IM Scoring V7.1 and on conformance with PMML
    - Instructions on using the DB2 function REC2XML
    - Information about the error messages produced by IM Scoring

How to read the syntax diagrams

In the reference part of this book, the syntax for IM Scoring's functionality is described using the following structure:
- Read the syntax diagrams from left to right and top to bottom, following the path of the line. The >>─── symbol indicates the beginning of a statement. The ───> symbol indicates that the statement syntax is continued on the next line. The >─── symbol indicates that a statement is continued from the previous line. The ───>< symbol indicates the end of a statement.
- Required items appear on the horizontal line (the main path).
- Optional items appear below the main path.
- If you can choose from two or more items, they appear in a stack. If you must choose one of the items, one item of the stack appears on the main path. If choosing none of the items is an option, the entire stack appears below the main path. A repeat arrow above a stack indicates that you can make more than one choice from the stacked items.
- Keywords must be spelled exactly as shown. Variables appear in lowercase letters (for example, encoding name). They represent names or values that you must supply.
- If punctuation marks, parentheses, arithmetic operators, or other such symbols are shown, you must enter them as part of the syntax.

How to send your comments

Your feedback is important in helping us to provide you with the most accurate and high-quality information possible. If you have any comments about this book:
- Send your comments by e-mail to swsdid@de.ibm.com. Be sure to include the name and part number of the book, and to say which version of IM Scoring you are using. If applicable, include the specific location of the text you are commenting on, for example, a page number or table number.
- Fill out the Readers' Comments form at the back of this book. Return it by mail, by fax, or by giving it to an IBM representative. The mailing address is on the back of the form. The fax number is +49-(0)7031-16-4892.
  • 16. Part 1. Guide

This part introduces you to IM Scoring and gives you instructions for its use.
- For an overview of the Intelligent Miner family of products, see Chapter 1, "Introducing the Intelligent Miner products" on page 3.
- For an overview of IM Scoring, see Chapter 2, "Introducing IM Scoring" on page 7.
- For a quick overview of what you need to do to get up and running with IM Scoring, see Chapter 4, "Getting started" on page 19. This chapter also contains a tutorial in the form of sample exercises.
- For full instructions in the use of IM Scoring, see Chapter 5, "Using IM Scoring" on page 41.
- For instructions on doing a number of administrative tasks connected with IM Scoring, see Chapter 6, "Administrative tasks" on page 65.
  • 18. Chapter 1. Introducing the Intelligent Miner products

IBM DB2 Intelligent Miner Version 8.1 is a set of the following products:
- Intelligent Miner Scoring
- Intelligent Miner Modeling
- Intelligent Miner Visualization

These products support rapid enablement of Intelligent Miner analytics embedded in Business Intelligence (BI), eCommerce, or traditional OLTP application programs.
- You can use IM Scoring to deploy PMML models that were created by one of the Intelligent Miner products or by other applications and tools that support interoperability through the use of PMML models.
- You can use IM Modeling to build data mining models.
- You can use IM Visualization to browse PMML models that are created by one of the Intelligent Miner products or by other applications and tools that support interoperability through the use of PMML models.

PMML is a standard format for data mining models. Based on XML, PMML provides a standard that enables data mining models to be shared between the applications of different vendors. The intention is to provide a vendor-independent method of defining models. In this way, proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. You can find more information about PMML on the Web site of the Data Mining Group (DMG) at http://www.dmg.org.

IBM DB2 Intelligent Miner Scoring

IM Scoring provides scoring technology as database extenders: DB2 extenders and Oracle cartridges. It enables application programs to apply PMML models to large databases, subsets of databases, or single rows or cases. Application programs use the SQL API, which consists of user-defined functions (UDFs) and user-defined methods (UDMs), to perform the scoring operation. The PMML models might have been created by one of the Intelligent Miner products or by other applications and tools that support interoperability through the use of PMML models.

The following table shows the PMML model types that IM Scoring can apply and the mining algorithms that produce them.
  • 19. Table 3. PMML model types

    PMML model type                 Mining algorithm
    Center-based clustering         Neural Clustering algorithm
    Distribution-based clustering   Demographic Clustering algorithm
    Neural networks                 Neural Classification algorithm, Neural Prediction algorithm
    Decision tree                   Tree Classification algorithm
    Regression                      Logistic Regression algorithm, Polynomial Regression algorithm, Linear Regression algorithm

Additionally, IM Scoring supports models that are built by the RBF Prediction algorithm of the Intelligent Miner for Data. These models are not yet part of PMML. You can export these models in XML format from the Intelligent Miner for Data and use them with IM Scoring.

Mining models that are applied by the SQL API of IM Scoring must be contained in database tables. If the mining models are created by means of IM Modeling, they can be applied directly because IM Modeling writes the models into database tables. If the mining models are created by means of the Intelligent Miner for Data, they must be exported from the Intelligent Miner for Data and imported into database tables. IM Scoring provides UDFs to import the models. You can also apply PMML V1.1 or PMML V2.0 models that are created with tools from other vendors.

IM Scoring provides a feature called the Single Record Scorer, which consists of a Java API. You can use this feature to score single or multiple data records against a mining model that is contained in a flat file. The Single Record Scorer is designed for applications where the online scoring of data records is the main task.

IBM DB2 Intelligent Miner Modeling

IM Modeling provides IM modeling technology as DB2 extenders. It enables SQL application programs to call associations discovery, clustering, and classification operations to develop analytic models based on data accessed by DB2 Universal Database Version 7 or Version 8 SQL. The resulting models are in PMML V2.0 format. They can be processed by IM Scoring or IM Visualization.
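The import step described above can be sketched in SQL. The following is a hedged example, not taken from the manual: the table CLUSTERMODELS, its columns, and the file path are illustrative assumptions, and the exact signature of DM_impClusFile is given in the functions reference.

```sql
-- Illustrative only: import a clustering model exported from
-- IM for Data (PMML file) into a user-created DB2 table whose
-- MODEL column has the data type IDMMX.DM_ClusteringModel.
INSERT INTO ClusterModels (ModelName, Model)
  VALUES ('CustomerSegments',
          IDMMX.DM_impClusFile('/mydir/customer_segments.pmml'));
```

Corresponding import functions exist for classification models (DM_impClasFile) and regression models (DM_impRegFile); see the functions reference.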
  • 20. IM Modeling consists of an SQL API. By using this SQL API, you can build Associations, Demographic Clustering, and Tree Classification PMML models that are stored in DB2 tables. The data mining functions are based on the mining functions included in the Intelligent Miner for Data.

IBM DB2 Intelligent Miner Visualization

IM Visualization provides the following Java visualizers to present data modeling results for analysis:
- Associations Visualizer
- Classification Visualizer
- Clustering Visualizer

You can use the Intelligent Miner Visualizers to visualize PMML-conforming mining models. Applications can call these visualizers to present model results, or you can deploy the visualizers as applets in a Web browser for ready dissemination. The models might have been developed by using IM Modeling or other applications and tools that support interoperability through the use of PMML models, or models of the Intelligent Miner for Data might have been exported as PMML models. The Intelligent Miner Visualizers are included in Intelligent Miner for Data Version 8.1.

IBM DB2 Intelligent Miner for Data

Intelligent Miner for Data Version 8.1 is an independent product that provides the following mining functions to build and apply mining models based on database or flat-file data:
- Associations mining function
- Classification mining function, including the following algorithms:
  - Neural Classification
  - Tree Classification
- Clustering mining function, including the following algorithms:
  - Demographic Clustering
  - Neural Clustering
- Prediction mining functions, including the following algorithms:
  - Neural Prediction
  - Polynomial Regression
  - RBF Prediction
- Processing functions
  • 21. The Processing functions can be used only on database tables.
- Sequential Patterns mining function
- Similar Sequences mining function
- Statistics functions

Version 8 of the Intelligent Miner for Data includes the Intelligent Miner Visualizers. It also includes the PMML conversion component of IM Scoring, which allows you to export mining models in PMML format.
  • 22. Chapter 2. Introducing IM Scoring

This chapter introduces IM Scoring. It describes the functionality provided by IM Scoring, and provides information about PMML and model conversion. This chapter also describes what is new in IM Scoring V8.1.

IM Scoring

IM Scoring is an add-on service to DB2 that extends the capabilities of DB2 to include data mining functions. Mining models continue to be built through the use of the following tools:

IM for Data
    This produces models that can be exported as PMML models.
IM Modeling
    This provides mining models in PMML 2.0 format.
Other tools
    Any other tool that provides mining models in PMML 1.1 or PMML 2.0 format.

You can use the IM Scoring functionality to import certain types of mining models into a DB2 table, to apply the models to data within DB2, and to access the results. This functionality comprises the scoring functions of IM Scoring. The results of applying the model are referred to as scoring results. These results differ in content according to the type of model applied. IM Scoring includes functions to retrieve the values of scoring results.

IM Scoring is available on the following operating systems:
- AIX®
- Linux
- Sun Solaris
- Windows NT®, Windows® 2000, Windows XP

Mining functions supported by IM Scoring

IM Scoring supports the application mode for the following IM for Data mining and statistical functions:
- Demographic and Neural Clustering
- Tree and Neural Classification
  • 23. - RBF and Neural Prediction
- Polynomial Regression

For a short introduction to these mining functions, see Chapter 3, "Data mining functions" on page 17.

IM Scoring supports the application of the following models created by IM Modeling:
- Demographic Clustering
- Tree Classification

For descriptions of these mining models, see the IM Modeling documentation, IM Modeling Administration and Programming. In this guide, Chapter 3, "Data mining functions" on page 17 also contains brief introductory information about mining models. In addition, IM Scoring supports the application of Logistic Regression models.

Within IM Scoring, the mining functions are grouped into the mining types Clustering, Classification, and Regression as follows:
- Clustering includes Demographic and Neural Clustering
- Classification includes Tree and Neural Classification
- Regression includes RBF Prediction, Neural Prediction, Polynomial Regression, and Logistic Regression

Scoring functions are provided to work with each of these types. Each scoring function includes different algorithms to deal with the different mining functions included within a type. For example, the Clustering type includes Demographic and Neural Clustering; thus, scoring functions for Clustering include algorithms for demographic and neural clustering.

Using IM for Data to produce models

For all the mining functions that are supported, except Logistic Regression, you can build and store the models by using IM for Data, which supports PMML models. A model must then be exported to an external file. To use the IM Scoring mining functions:
- Import the mining model into a DB2 table, where it is stored as a large object
- Apply the model to data stored in DB2 tables
- Store scoring results in a DB2 table
- Extract information about the results, for example, the cluster ID and score
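The apply and extract steps listed above can be sketched as a single SQL statement. The example below is illustrative, not the manual's own code sample: the tables CUSTOMERS, CLUSTERMODELS, and CLUSTERRESULTS and the field names passed to DM_applData are assumptions, and the exact function signatures are documented in the functions reference (Chapter 9).

```sql
-- Illustrative only: apply a stored clustering model to each customer
-- row and save the resulting cluster ID and score.
INSERT INTO ClusterResults (CustomerID, ClusterID, Score)
  SELECT c.CustomerID,
         IDMMX.DM_getClusterID(
           IDMMX.DM_applyClusModel(m.Model,
             IDMMX.DM_applData(IDMMX.DM_applData('AGE', c.Age),
                               'INCOME', c.Income))),
         IDMMX.DM_getClusScore(
           IDMMX.DM_applyClusModel(m.Model,
             IDMMX.DM_applData(IDMMX.DM_applData('AGE', c.Age),
                               'INCOME', c.Income)))
  FROM Customers c, ClusterModels m
  WHERE m.ModelName = 'CustomerSegments';
```

In practice, a common table expression can avoid evaluating the model twice per row; the alternative ways of building the input record (REC2XML, CONCAT) are described in Chapter 5.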
  • 24. Figure 1 shows the process by which a mining model that was built with IM for Data is exported from IM for Data, imported into a DB2 database, and applied to selected data.

[Figure 1. The IM Scoring process]

Using IM Modeling to produce models

You can use IM Modeling to create models for the mining functions that it supports; these are Demographic Clustering and Tree Classification. The models that IM Modeling creates reside in a DB2 table. These models are in a format that enables IM Scoring to apply them directly.

Scoring data types, methods, and functions

The database objects supplied with IM Scoring consist of the following:
- User-defined data types (UDTs)
- User-defined functions (UDFs)
- User-defined methods (UDMs)
  • 25. These database objects are grouped together in the schema IDMMX. To access a UDT, UDF, or UDM, you must specify its fully qualified name, for example, the data type IDMMX.DM_ClusteringModel. Part 2, "Reference" on page 73 supplies overview lists and full descriptions of all the database objects supplied with IM Scoring.

User-defined data types

The user-defined data types are used for identifying and storing mining models and results in DB2 tables. User-defined data types are also referred to as user-defined types or UDTs. The user-defined data types provided by IM Scoring consist of distinct types and structured types.

Distinct types
    The following user-defined types are distinct types in IM Scoring:
    - DM_ApplicationData
    - DM_ClasModel, DM_ClusteringModel, DM_RegressionModel
    - DM_ClasResult, DM_RegResult, DM_ClusResult

Structured type
    The following user-defined type is a structured type in IM Scoring: DM_LogicalDataSpec

User-defined methods

Use user-defined methods to create or modify user-defined structured types. You can call the methods that are defined for a type by using either a method syntax or a function syntax.

Method syntax
    To call, or invoke, a method using the method syntax:
    - In an appropriate context, specify the method name preceded by both a reference to a structured type instance and the double dot operator.
    - Follow this with the list of arguments enclosed in parentheses.
    Example:
        select IDMMX.DM_getClusMdlSpec(modelcolumn)..DM_getNumFields()...

Function syntax
    To call, or invoke, a method using the function syntax, specify, in an appropriate context, the method name followed by, in parentheses, the structured type instance and the list of arguments.
    Example:
        select IDMMX.DM_getNumFields( IDMMX.DM_getClusMdlSpec(modelcolumn) )...
  • 26. If the structured type instance is NULL, the method is not called, and NULL is returned. User-defined functions IM Scoring provides scoring functions, also referred to as user-defined functions (UDFs), which enable you to: v Import and export mining models, and access the properties of the models. v Apply these models to data held in DB2 tables. v Retrieve the results. Function syntax The function syntax is described in ’Function syntax’ in “User-defined methods” on page 10. PMML: A markup language for data mining PMML is a standard format for data mining models. Based on XML, the PMML format provides a standard that enables data mining models to be shared between the applications of different vendors. The intention is to provide a vendor-independent method for defining models. In this way, proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. You can find more information on PMML on the Web site of the Data Mining Group (DMG) at http://www.dmg.org. Converting models IM Scoring provides a model conversion facility, which converts mining models from IM for Data format to PMML 2.0 format. The model conversion facility respects the current server locale and writes the appropriate XML encoding into the PMML model. Additionally, IM Scoring provides the features that are required to register the model conversion facility with IM for Data by using the client tool registration facility of IM for Data. You can use the model conversion facility by selecting the PMML format in the Export dialog of the IM for Data GUI when you export the model. If you import models created by IM for Data into DB2, you do not need to convert the models to PMML 2.0. The model import functions read models in PMML 1.1, PMML 2.0, or Intelligent Miner format. Importing V6 models works only in fenced mode; for further details, see “Importing models in unfenced mode” on page 169. Chapter 2. Introducing IM Scoring 11
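To give a concrete feel for the PMML format discussed above, the following is a minimal sketch of what a PMML document looks like. The field names, copyright text, and comment are illustrative only, not taken from the shipped samples; the exact content of the model element depends on the mining function and is defined by the PMML specification at http://www.dmg.org.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<PMML version="2.0">
  <Header copyright="example" description="Illustrative sketch, not a shipped sample"/>
  <DataDictionary numberOfFields="2">
    <DataField name="AGE" optype="continuous"/>
    <DataField name="TYPE" optype="categorical"/>
  </DataDictionary>
  <!-- A model element follows here, for example ClusteringModel,
       TreeModel, or RegressionModel, depending on the mining function. -->
</PMML>
```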
  • 27. Online scoring with IM Scoring Java Beans IM Scoring Java Beans can be used to score single or multiple data records using a specified mining model. IM Scoring Java Beans is designed to be used for applications where the online scoring of data records is the main task. A possible application area of IM Scoring Java Beans might be the realization of an Internet-based call center scenario. In this scenario, the required business logic – in this case the scoring functions – runs on a Web or application server. Clients can connect to the server and send to it a data record that was specified by a call-center operator by means of a user interface on the client. The data record is scored on the server, and the result is passed back to the client in real time. Figure 2 shows a simplified design, illustrating how such a scenario could be realized using IM Scoring Java Beans. Here, IM Scoring Java Beans is integrated into a J2EE implementation using, for example, servlets or Enterprise Java™ Beans. Figure 2. Architecture sample to realize a call-center scenario Note: To get optimum performance throughput, you might decide to run each mining model in a separate process. In this case, you would pass only the new records to the appropriate scoring process. This results in a considerable performance improvement. The reason for the improvement is that the model-loading step, which is very time-consuming, is done only once. What is new in version 8.1 This section introduces you to the new features in IM Scoring. 12 Administration and Programming for DB2
  • 28. Ease of use idmmkSQL This new command enables you to generate a sample SQL script from a PMML model. You can then use this sample script as a template to invoke IM Scoring on the model. Improved samples The IM Scoring samples have been enhanced and reworked, and they demonstrate how to use the new DB2 built-in function REC2XML. This simplifies SQL statements and improves performance. E-business enhancements Java support for Realtime Scoring The new JAVA interface, IM Scoring Java Beans, enables you to integrate real-time scoring into e-business applications, for example, those used in CRM. Functional enhancements Model compression Models are now compressed when they are imported into the database. This results in reduced resource consumption (database size) and improved performance. Models that were imported by means of IM Scoring V7 can be compressed through the use of export and import functions. For details, see “Exporting and importing models with the use of compression” on page 168. New methods to work with mining fields The new structured type DM_LogicalDataSpec contains information about the mining fields that are part of the input data used to apply models. This information includes the field name and field type definitions of the mining fields. A number of new methods are supported for DM_LogicalDataSpec: for details, see “Methods provided by IM Scoring” on page 77. Additional functions Additional functions have been added to extract properties from a data mining model and from a scoring result. Standards conformance PMML 2.0 support The IM Scoring conversion utilities now generate PMML 2.0 models. For more information about PMML, see http://www.dmg.org. IM Scoring now accepts PMML 2.0 models in addition to the PMML 1.1 models generated by IM Scoring V7.1. For detailed information about how IM Scoring conforms to PMML, see Appendix G, “IM Scoring conformance to PMML” on page 203. Chapter 2. Introducing IM Scoring 13
  • 29. Platform support There is now support for Windows XP. Shared infrastructure with IM Modeling IM Scoring shares common infrastructure like XML parsing, error handling, tracing, licensing, and diagnostics with IM Modeling. This causes some changes in administrative interfaces. Installation directory IM Scoring uses a new default installation directory prefix, IMinerX, instead of IMinerSc as used in IM Scoring V7.1. The utilities idmenabledb, idmdisabledb, and idmcheckdb The commands to enable and disable a database for IM Scoring have been improved, and are shared between IM Scoring and IM Modeling. The idmcheckdb utility is a new tool that checks the enablement status of a database. Collecting diagnostic information The tracing infrastructure has been improved. New environment variables enable you to customize the degree of tracing information. A new tool, idmlevel, enables you to check which version of IM Scoring you are using. License use management IM Scoring uses nodelock keys to check whether a valid license is available. The ’Try and Buy’ version installs a temporary key. This key allows you to use the product for a limited period of time in accordance with the EULA valid for the ’Try and Buy’ version. The new command idmlicm lets you check the license status. Limitations The following limitations exist in version 8.1. v Importing models from IM for Data Models in Intelligent Miner format It is no longer possible to import these models in unfenced mode. To import models of this kind, do one of the following: v Enable the database in fenced mode. v Use the model conversion utility idmxmod to convert the model to PMML before importing it. v Neural PMML models generated by IM Scoring V7.1 You might have existing models that were generated using the neural kernels of IM for Data V6 or higher in an IM Scoring database. Models of this kind must be migrated by importing them again. 14 Administration and Programming for DB2
  • 30. The cluster position in the PMML file The function DM_getClusterID returns the position of the cluster in the PMML file. This is different from the behavior in IM Scoring V7.1. For details, see “Using the function DM_getClusterID” on page 170. Chapter 2. Introducing IM Scoring 15
  • 32. Chapter 3. Data mining functions

This chapter provides a general introduction to the data mining functions that can be used with IM Scoring. The generation and application of mining models are described. Note that IM Scoring supports only the application of these models. The mining functions are described in the following sections:
v “Classification”
v “Clustering”
v “Regression/Prediction” on page 18

Classification

Classification is the process of automatically creating a model of classes from a set of records that contain class labels. The classification technique analyzes records that are already known to belong to a certain class, and creates a profile for a member of that class from the common characteristics of the records. You can then use a data mining application tool to apply this model to new records, that is, records that have not yet been classified. This enables you to predict whether the new records belong to that particular class. When a model is applied, IM Scoring assigns a class label and a confidence value to each individual record being scored.

Clustering

The clustering technique consists of a range of algorithms that group data records on the basis of how similar they are. For example, a data record might consist of a description of a customer. In this case, clustering would group similar customers together, and at the same time it would maximize the differences between the different customer groups formed in this way. The groups that are found are known as clusters. Each cluster tells a specific story about customer identity or behavior, for example, about their demographic background, or about their preferred products or product combinations. In this way, customers can be grouped in homogeneous groups whose members are very similar to each other. When a model is applied, IM Scoring assigns a cluster ID, a cluster score, a quality value, and a confidence value to each individual record being scored. 
The cluster score, quality value, and confidence value are different measures that indicate how well the record fits into the assigned cluster. © Copyright IBM Corp. 2001, 2002 17
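The apply step itself is described in later chapters; as a preview, and using illustrative table and column names, the values assigned by a Clustering model can be read back from a stored result with the result-accessor functions of IM Scoring:

```sql
-- Sketch only: CLUSTER_RESULT is assumed to be a column of type
-- IDMMX.DM_ClusResult that was previously filled by IDMMX.DM_applyClusModel.
SELECT IDMMX.DM_getClusterID( CLUSTER_RESULT ) AS CLUSTER_ID,
       IDMMX.DM_getClusScore( CLUSTER_RESULT ) AS SCORE
FROM   BANKING_APPLY;
```

The sample scripts in Chapter 4 show this pattern in full, including how the result column is populated.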
  • 33. Regression/Prediction

The purpose of predicting values is to discover the dependency and the variation of one field’s value on the values of the other fields within the same record. A model is generated that can predict a value for that particular field in a new record of the same form, based on other field values. For example, a retailer wants to use historical data to estimate the sales revenue for a new customer. A mining run on this historical data creates a model. This model can be used to predict the expected sales revenue for a new customer, based on the new customer’s data. The model might also show that, for some customers, incentive campaigns improve sales. In addition, it might reveal that frequent visits by sales representatives lead to a lower revenue if the customer is young.

When a model is applied, IM Scoring assigns a predicted value and, for an RBF model, a region ID to each individual record being scored. 18 Administration and Programming for DB2
  • 34. Chapter 4. Getting started The aim of this chapter is to get you up and running quickly in using IM Scoring. v First, there is a quick-start guide. Here, you review the tasks that you need to complete to get started. See “Quick start”. v This is followed by sections that help you to gain confidence in using the IM Scoring mining functions. These sections guide you through a tutorial of practice exercises on sample data. By using the data and scripts provided with the IM Scoring package and the instructions given in these sections, you can do the following: – Import and store a sample mining model – Apply it to sample data – Obtain results – Extract information from a model – Apply models created with IM Modeling – Use IM Scoring Java Beans to score records All the tasks in the practice exercises are completed by means of sample scripts. The scripts include standard SQL commands, such as INSERT, and scoring functions such as DM_impClusFile. The contents of the scripts are given in this chapter so that you can see how the SQL statements are structured. You can use these sample scripts as a basis for your own scripts. See “Sample components” on page 23 and “Completing the practice exercises” on page 25. Quick start This chapter guides you through the steps necessary to install and configure IM Scoring successfully. It gives you brief hints on what to do, and points you to the appropriate sections in this guide that describe each step in detail. Some steps are mandatory, and some steps are optional. Mandatory steps: 1. Installation 2. Configuration 3. Creating database objects © Copyright IBM Corp. 2001, 2002 19
  • 35. Optional steps: 1. Verifying the installation and configuration 2. Executing sample applications If you have IM Scoring V7.1 installed and configured, first check any migration issues. For further information, see Appendix C, “Migration from IM Scoring V7.1” on page 167. Installation Install IM Scoring by using the usual installation tools. The IM Scoring CD-ROM contains subdirectories for each platform that is supported. To install IM Scoring, insert the CD-ROM into your CD-ROM drive, and change to the appropriate subdirectory. For each platform, different setup programs (Windows), installp images (AIX), or installation scripts (SUN and Linux) are provided. These enable you to install the various components of IM Scoring (Scoring, Conversion, IM Scoring Java Beans). For full instructions on installing all the components of IM Scoring, configuring the database management system, and uninstalling IM Scoring, see: v Appendix A, “Installing IM Scoring” on page 145 v “Installing IM Scoring Java Beans” on page 164 The conversion component and the Scoring component need additional configuration steps before they are ready to use. For information about the mandatory steps needed to configure the conversion component, see “Configuring IM for Data to export PMML models”. For information about the other mandatory steps that you must follow before you can use the Scoring component, see: v “Configuring the database environment” on page 21 v “Creating database objects” on page 22 For information about the optional steps that you can perform for the Scoring component, see: v “Verifying the installation and configuration” on page 22 v “Executing sample applications” on page 22 Configuring IM for Data to export PMML models After you have installed the conversion component on the AIX or SUN Solaris platform, you need to register the conversion utilities. To do this: 1. 
Add the contents of the file idmcsctr.add to the idmcsctr.dat file of the IM for Data client 20 Administration and Programming for DB2
  • 36. 2. Add the contents of the file idmcsstr.add to the idmcsstr.dat file of the IM for Data server

On the Windows platform, these steps are done automatically during installation. They must be done manually only if you install IM for Data after you have installed the conversion component. For more information, see “Enabling IM for Data to export PMML or XML models” on page 158 in Appendix A, “Installing IM Scoring”. The information there will also help you if you are running an IM for Data client in a language other than English.

Configuring the database environment

After you have installed the Scoring component, you need to configure your DB2 instance and the databases that you want to use with IM Scoring.

To configure the DB2 instance as a user with SYSADM authority:
v On UNIX® platforms, call the idminstfunc script. This is available in the bin directory of your IM Scoring installation.
v On all platforms, increase the database manager configuration parameter UDF_MEM_SZ. A recommended value is 60000, which is the highest possible.
Syntax
db2 update dbm cfg using UDF_MEM_SZ 60000
v On Windows platforms, increase the DB2 registry parameter DB2NTMEMSIZE to a value that matches your UDF_MEM_SZ value.
Syntax
db2set DB2NTMEMSIZE=APLD:240000000
v Restart the DB2 instance.
v For further information, see:
– For UNIX systems: “Enabling the DB2 instance on UNIX systems” on page 157
– For Windows systems: “Enabling the DB2 instance on Windows systems” on page 158
– “The idminstfunc command” on page 135

To configure the databases as a user with SYSADM or DBADM authority:
1. If you do not have an existing database, create a database by using the command DB2 CREATE DATABASE <DBNAME>.
2. Increase the database transaction log size LOGFILSIZ. A recommended value is 2000.
Syntax
db2 update db cfg for <database name> using logfilsiz 2000

Chapter 4. Getting started 21
  • 37. 3. Increase the database parameter APP_CTL_HEAP_SZ. A recommended value is 10000. Syntax db2 update db cfg for <database name> using APP_CTL_HEAP_SZ 10000 4. Increase the database parameter APPLHEAPSZ. A recommended value is 1000. Syntax db2 update db cfg for <database name> using APPLHEAPSZ 1000 Creating database objects The UDTs, UDFs, and UDMs provided with IM Scoring must be created in the databases that you want to use with IM Scoring. To do this, call the idmenabledb command, which is available in the bin directory of your IM Scoring installation. A mandatory parameter to the command is the database name. Some optional parameters are available. If you want to execute the sample applications provided with IM Scoring, call the command by means of the fenced and the tables options. Syntax idmenabledb <database name> fenced tables For more information and a detailed description of idmenabledb, see: v “Enabling databases” on page 41 v “The idmenabledb command” on page 133 Verifying the installation and configuration You can quickly verify your installation and configuration, and make sure that the appropriate database objects have been created. To do so, follow these steps: 1. Call the command idmcheckdb <database name>, which is available in the bin directory of your installation. The command returns the enablement status of the database. 2. Connect to a database that you have enabled. 3. Use the following command: db2 "values( IDMMX.DM_applData(’Test’,4))" 4. The command must return without error. If you get any error messages, check your installation and configuration for completeness. Executing sample applications IM Scoring provides a set of samples to help you to get familiar with the UDFs and UDTs. For descriptions of the samples and instructions on how to use them as practice exercises, see: 22 Administration and Programming for DB2
  • 38. v “Sample components”
v “Completing the practice exercises” on page 25

Generating SQL scripts from your own mining models

If you already have PMML models available as flat files, you can generate SQL scripts from them by using the idmmkSQL tool. These scripts will contain template SQL statements that import and apply the model. The SQL script contains placeholders that you replace with the names of concrete database objects in order to finally get the executable SQL script.

For more information, see:
v “Generating SQL statements from models” on page 44
v “The idmmkSQL command” on page 136

You can find a practice exercise in using the idmmkSQL tool at “Using idmmkSQL to work with your own mining models” on page 38.

Sample components

The IM Scoring package includes sample components consisting of a series of practice exercises in using IM Scoring. This tutorial material enables you to:
v Use the Clustering mining function of IM Scoring
For an introduction to the Clustering mining function, see Chapter 3, “Data mining functions” on page 17.
v Score records using IM Scoring Java Beans
For an introduction to IM Scoring Java Beans, see “Online scoring with IM Scoring Java Beans” on page 12.

The IM Scoring sample components reside in a samples directory. This directory contains the mining model, data, and scripts that you require to complete the exercises in this chapter.
v On the AIX platform, the samples directory is: /usr/lpp/IMinerX/samples/ScoringDB2
v On the Linux and Sun Solaris platforms, the samples directory is: /opt/IMinerX/samples/ScoringDB2
v On the Windows platform, the samples directory is: <install path>\samples\ScoringDB2
where <install path> is the directory where IM Scoring is installed. You can also use the shortcut IBM DB2 Intelligent Miner Scoring 8.1 —> Scoring - Samples in the program folder.

The IM Scoring Java Beans examples are available in samples/ScoringBean. Chapter 4. Getting started 23
  • 39. Table 4 and Table 5 on page 25 list the files that are included in the samples directory and explain the purpose of each.

Table 4. Sample components for the Clustering mining function of IM Scoring

clusDemoBanking.dat
An exported Demographic Clustering model. The model was built from data for a bank’s customers who have a particular type of account. Customers are grouped according to similarities of age, income, number of siblings, gender, and account type.

bankingScoring.data
A flat file containing records relating to the customers of a bank. This is the data to which you will apply the model.

bankingImport.db2
A script that creates the DB2 table BANKING_SCORING, imports the file bankingScoring.data, and inserts the data into the new table.

bankingInsert.db2
A script that imports the model, which is stored in the file clusDemoBanking.dat, and then inserts the model into the table IDMMX.ClusterModels.

bankingApplyTable1.db2
A script that:
1. Creates a results table
2. Applies the imported Clustering model to the specified data from the table banking
3. Stores the calculated results in the table
4. Obtains results values from the table

bankingApplyTable2.db2
A script that uses nested calls to DM_applData instead of calling the REC2XML function for the purpose of applying the imported Clustering model to the specified data from the table banking. The script:
1. Creates a results table
2. Applies the imported Clustering model to the specified data from the table banking
3. Stores the calculated results in the table
4. Obtains results values from the table

bankingApplyView.db2
A script that applies the imported Clustering model to the specified data from the table BANKING_SCORING. The script then obtains values from the calculated results using a common table expression.

24 Administration and Programming for DB2
  • 40. Table 5. Sample components for IM Scoring Java Beans

93er_cars.pmml
A Polynomial Regression model

Sample93erCars.java
A sample Java program

readme.txt
A README file

Note: The script bankingInsert.db2 uses the table IDMMX.ClusterModels. This is one of the sample tables delivered with IM Scoring. Before you perform the tasks described in this chapter, ensure that you have enabled the database by means of the tables option. For instructions on installing the sample tables, see “The idmenabledb command” on page 133.

Completing the practice exercises

Before you can complete the practice exercises, you must install IM Scoring and configure the system environment. For guidance on the procedure of installing and configuring IM Scoring, see “Quick start” on page 19.

The tutorial consists of the following tasks:
v Creating a DB2 table and importing data into it
v Importing a mining model into a DB2 table
v Applying the model to data and obtaining results values, without storing the results in a DB2 table
v Applying the model to data, storing the results in a DB2 table, and obtaining results values from the table
v Using IM Scoring Java Beans to score records

Note: Before starting the tasks, you must be connected to a database that is enabled for the use of IM Scoring. To run the scripts, you must have SELECT and INSERT privileges on the IDMMX.ClusterModels table.

Go to the directory where the samples are installed. See “Sample components” on page 23 for information on the directories where the sample files are stored.

Creating a table and importing data

In this exercise, you create a table and import the banking data. You will later apply the mining model to this data. Chapter 4. Getting started 25
  • 41. First, you must connect to the database. To do this, use the following command:

db2 connect to <dbname>

To create a table and import the sample data contained in the file bankingScoring.data, run the sample script bankingImport.db2 by using the following command:

db2 -stf bankingImport.db2

Contents of the script bankingImport.db2

CREATE TABLE BANKING_SCORING (
  TYPE CHAR(7),
  GENDER CHAR(6),
  AGE DOUBLE,
  PRODUCT CHAR(1),
  SIBLINGS DOUBLE,
  INCOME DOUBLE );
IMPORT FROM bankingScoring.data OF DEL
INSERT INTO BANKING_SCORING (
  TYPE, GENDER, AGE, PRODUCT, SIBLINGS, INCOME );

In the first part of the script, the DB2 table BANKING_SCORING is created, its columns are specified, and data types are defined for each column. In the second part, the flat file bankingScoring.data is imported and inserted into the new table. Data from the flat file populates the columns, which are specified by their names.

Importing a mining model

In this exercise, you import the sample mining model into the DB2 database and store it in a DB2 table, which has a column configured for mining models. To import the sample mining model, which is stored in the file clusDemoBanking.dat, run the script bankingInsert.db2 by using the following command:

db2 -stf bankingInsert.db2

Contents of the script bankingInsert.db2 for AIX 26 Administration and Programming for DB2
  • 42. insert into IDMMX.ClusterModels values ( ’DemoBanking’, IDMMX.DM_impClusFile (’/usr/lpp/IMinerX/samples/ScoringDB2/clusDemoBanking.dat’)); This script uses the function DM_impClusFile, which is specific to IM Scoring, to import the mining model contained in the file clusDemoBanking.dat. The SQL INSERT command inserts the mining model into a column in the table ClusterModels, and sets the name of the model to DemoBanking. The table IDMMX.ClusterModels is configured for the data type DM_ClusteringModel. Note: On Windows, the absolute path is automatically modified at installation time to be consistent with the chosen install path. Applying a model and getting results values You can use different scripts to apply mining models and obtain the results of applying the mining model. In the following exercises, a Demographic Clustering model is used. Using the script 'bankingApplyView.db2' In this exercise, you apply the Demographic Clustering model to the banking data and get the values of the calculated results. To apply the sample mining model and obtain the results of applying the model, run the script bankingApplyView.db2 by using the following command: db2 -stf bankingApplyView.db2 Contents of the script bankingApplyView.db2 WITH clusterView( clusterResult ) AS ( SELECT IDMMX.DM_applyClusModel( C.MODEL , IDMMX.DM_impApplData( rec2xml( 1.0, ’COLATTVAL’, ’’, B.TYPE, B.AGE, B.SIBLINGS, B.INCOME ) ) ) FROM BANKING_SCORING B, IDMMX.ClusterModels C WHERE C.MODELNAME=’DemoBanking’ ) SELECT IDMMX.DM_getClusterID( clusterResult ), IDMMX.DM_getClusScore( clusterResult ) FROM clusterView ; This script defines a common table expression, clusterView(clusterResult), to hold the results of applying a model. The script then applies the DemoBanking model to selected data from the banking table by using the DM_applyClusModel function. The data values are obtained by means of a call to the DB2 function REC2XML. Chapter 4. Getting started 27
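To see why this works, it helps to know roughly what REC2XML produces. With the COLATTVAL format string, each selected column is rendered as a column element whose name attribute carries the column name; for a record with TYPE 'savings' and AGE 42, the fragment handed to DM_impApplData would contain elements like the following (illustrative values; the surrounding markup depends on the row-tag argument passed to REC2XML):

```xml
<column name="TYPE">savings</column>
<column name="AGE">42</column>
```

Because the name attributes are taken directly from the column names, the mining fields in the model are matched by name, which is why the columns passed to REC2XML must correspond to the model's field names.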
  • 43. Note: The column names specified in the call to REC2XML must exactly match the names of fields that are used in the mining model. For information on how to retrieve the names of the fields in a mining model, see “Querying model field names” on page 49. Finally, the script obtains the cluster ID and the Clustering score from CLUSTER_RESULT by means of the functions DM_getClusterID and DM_getClusScore. Using the script 'bankingApplyTable1.db2' In this exercise, you: 1. Apply the Demographic Clustering model to the banking data. 2. Store the calculated results in a DB2 table. 3. Obtain results values for any customer who is older than 50. To apply the sample mining model, store results, and obtain results values, run the script bankingApplyTable1.db2 by using the following command: db2 -stf bankingApplyTable1.db2 Contents of the script bankingApplyTable1.db2 CREATE TABLE BANKING_APPLY ( TYPE CHAR(7), GENDER CHAR(6), AGE DOUBLE, PRODUCT CHAR(1), SIBLINGS DOUBLE, INCOME DOUBLE, CLUSTER_RESULT IDMMX.DM_ClusResult ); INSERT INTO BANKING_APPLY SELECT B.TYPE, B.GENDER, B.AGE, B.PRODUCT, B.SIBLINGS, B.INCOME, IDMMX.DM_applyClusModel( C.MODEL , IDMMX.DM_impApplData( rec2xml(1.0, ’COLATTVAL’,’’, B.TYPE, B.AGE, B.SIBLINGS, B.INCOME))) FROM BANKING_SCORING B, IDMMX.ClusterModels C WHERE C.MODELNAME=’DemoBanking’; SELECT AGE, IDMMX.DM_getClusterID( CLUSTER_RESULT ), IDMMX.DM_getClusScore( CLUSTER_RESULT ) FROM BANKING_APPLY WHERE AGE > 50; DROP TABLE BANKING_APPLY; This script creates a DB2 table for the mining results by defining the names and the data types of the columns. The last column, CLUSTER_RESULT, is designated for the results that are calculated. The column is configured for the 28 Administration and Programming for DB2
  • 44. data type DM_ClusResult. The script then applies the DemoBanking model to selected data from the banking table by using the DM_applyClusModel function. Finally, it obtains the cluster ID and the Clustering score from the CLUSTER_RESULT column of the new table by using the functions DM_getClusterID and DM_getClusScore.

You can also apply models and compute cluster IDs in a single SQL query. The following example shows an SQL query of this kind:

select b.type, b.gender, b.age, b.product, b.siblings, b.income,
  IDMMX.DM_getClusterID( IDMMX.DM_applyClusModel( c.model,
    IDMMX.DM_impApplData( REC2XML( 1, 'COLATTVAL', '',
      b.type, b.age, b.siblings, b.income ) ) ) )
from banking b, IDMMX.ClusterModels c
where c.modelname = 'DemoBanking';

Tip: You can use the application functions to define SQL VIEWS that are similar to the output tables created by IM for Data Version 6. The SQL statement would look similar to the template in the following example:

CREATE VIEW ApplyOut ( ID, NAME, AGE, ClusterID ) AS
SELECT I.ID, I.NAME, I.AGE,
  IDMMX.DM_getClusterID(IDMMX.DM_applyClusModel(c.model,
    IDMMX.DM_impApplData( REC2XML( 1, 'COLATTVAL', '', ... ) ) ) )
FROM InputTable I, IDMMX.ClusterModels C
WHERE C.modelName = .....

Afterwards, you can access the SQL VIEW by using any SELECT statement, such as the following:

SELECT ID, NAME, AGE, ClusterID FROM ApplyOut WHERE ClusterID = 3

Using the script 'bankingApplyTable2.db2'

The script bankingApplyTable2.db2 has the same functionality as the script bankingApplyTable1.db2, but it uses nested calls to DM_applData instead of a call to REC2XML. For information on the advantages and possible inconveniences of using DM_applData, see “Specifying data by means of DM_applData” on page 52. Alternatively, you can use a CONCAT expression. For further information about this possibility, see “Specifying data by means of CONCAT” on page 53. 
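As a rough illustration of the CONCAT alternative, the column-element fragment that REC2XML builds can also be assembled by hand. The exact markup that DM_impApplData expects is described in the section on specifying data by means of CONCAT, so treat this only as a sketch; the table and model names are the ones used in these exercises, and only a character column is shown because numeric columns must first be converted to character form:

```sql
-- Illustrative sketch: building the input record by string concatenation
-- instead of calling REC2XML. Numeric columns would need an explicit
-- conversion to character form before being concatenated.
SELECT IDMMX.DM_applyClusModel( c.model, IDMMX.DM_impApplData(
         '<column name="TYPE">' CONCAT b.type CONCAT '</column>' ) )
FROM BANKING_SCORING b, IDMMX.ClusterModels c
WHERE c.modelname = 'DemoBanking';
```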
To apply the sample mining model, store results, and obtain results values, run the script bankingApplyTable2.db2 by using the following command: Chapter 4. Getting started 29
  • 45. db2 -stf bankingApplyTable2.db2 Contents of the script bankingApplyTable2.db2 CREATE TABLE BANKING_APPLY ( TYPE CHAR(7), GENDER CHAR(6), AGE DOUBLE, PRODUCT CHAR(1), SIBLINGS DOUBLE, INCOME DOUBLE, CLUSTER_RESULT IDMMX.DM_ClusResult ); INSERT INTO BANKING_APPLY SELECT B.TYPE, B.GENDER, B.AGE, B.PRODUCT, B.SIBLINGS, B.INCOME, IDMMX.DM_applyClusModel( c.model , IDMMX.DM_applData( IDMMX.DM_applData( IDMMX.DM_applData( IDMMX.DM_applData( ’TYPE’, b.type ), ’AGE’, b.age), ’SIBLINGS’, b.siblings ), ’INCOME’, b.income )) FROM BANKING_SCORING B, IDMMX.ClusterModels C WHERE C.MODELNAME=’DemoBanking’; SELECT AGE, IDMMX.DM_getClusterID( CLUSTER_RESULT ), IDMMX.DM_getClusScore( CLUSTER_RESULT ) FROM BANKING_APPLY WHERE AGE > 50; DROP TABLE BANKING_APPLY; This script creates a DB2 table for the mining results by defining the names and the data types of the columns. The last column, CLUSTER_RESULT, is designated for the calculated results. It is configured for the data type DM_ClusResult. The script then applies the DemoBanking model to selected data from the banking table by using the function DM_applyClusModel. Finally, the script obtains the cluster ID and the Clustering score from the CLUSTER_RESULT column of the new table. To do this, it uses the functions DM_getClusterID and DM_getClusScore. You can also apply models and compute cluster IDs in a single SQL query. The following example shows an SQL query of this kind: select b.type, b.age, b.product, b.siblings, b.income, IDMMX.DM_getClusterID( IDMMX.DM_applyClusModel(c.model , IDMMX.DM_applData( IDMMX.DM_applData( 30 Administration and Programming for DB2
          IDMMX.DM_applData(
            IDMMX.DM_applData(
              IDMMX.DM_applData( 'TYPE', b.type ),
              'AGE', b.age ),
            'PRODUCT', b.product ),
          'SIBLINGS', b.siblings ),
        'INCOME', b.income ) )
from banking b, IDMMX.ClusterModels c
where c.modelname='DemoBanking';

Extracting information from a model

In this exercise, you extract information from a model. The model from which you extract information is the one that you inserted into the database as part of the exercise in the section “Importing a mining model” on page 26. The information that you extract is:
v The name of the model
v The number of clusters
v The names of the mining fields

To extract the information, run the script bankingExtract.db2 by using the following command:

db2 -tf bankingExtract.db2

Contents of the script bankingExtract.db2

WITH MODELCONTENT( CLUSMODELNAME, NOCLUSTERS, MODELFIELDS ) AS
  ( SELECT IDMMX.DM_getClusMdlName( MODEL ),
           IDMMX.DM_getNumClusters( MODEL ),
           IDMMX.DM_getClusMdlSpec( MODEL )
    FROM IDMMX.ClusterModels
    WHERE MODELNAME='DemoBanking' )
SELECT CLUSMODELNAME, NOCLUSTERS,
       MODELFIELDS..DM_getFldName(1) AS FIELDNAME1,
       MODELFIELDS..DM_getFldName(2) AS FIELDNAME2,
       MODELFIELDS..DM_getFldName(3) AS FIELDNAME3,
       MODELFIELDS..DM_getFldName(4) AS FIELDNAME4
FROM MODELCONTENT;

Applying models created with IM Modeling

In this exercise, you apply models created with IM Modeling. A prerequisite for executing these samples is that you have installed and configured IM Modeling and executed the banking samples provided with IM Modeling. Executing the IM Modeling samples before executing the sample
files provided with IM Scoring has a further advantage. It helps you to understand which UDFs and UDMs belong to IM Modeling, which belong to IM Scoring, and which belong to both.

To apply the models, run the scripts bankingApplyModeling1.db2 and bankingApplyModeling2.db2 by using the following commands:

db2 -tf bankingApplyModeling1.db2
db2 -tf bankingApplyModeling2.db2

In the first set of statements in the scripts, information is extracted from the model. The second set of statements in the scripts applies the models to new data. The difference between the two sets of statements is that the first one uses rec2xml to build the record and the second uses DM_applData.

Contents of the script bankingApplyModeling1.db2

WITH MODELCONTENT( CLUSMODELNAME, NOCLUSTERS, MODELFIELDS ) AS
  ( SELECT IDMMX.DM_getClusMdlName( MODEL ),
           IDMMX.DM_getNumClusters( MODEL ),
           IDMMX.DM_getClusMdlSpec( MODEL )
    FROM IDMMX.ClusterModels
    WHERE MODELNAME='BankingClusColumnModel' )
SELECT CLUSMODELNAME, NOCLUSTERS,
       MODELFIELDS..DM_getFldName(1) AS FIELDNAME1,
       MODELFIELDS..DM_getFldName(2) AS FIELDNAME2,
       MODELFIELDS..DM_getFldName(3) AS FIELDNAME3,
       MODELFIELDS..DM_getFldName(4) AS FIELDNAME4
FROM MODELCONTENT;

WITH clusterView( clusterResult ) AS
  ( SELECT IDMMX.DM_applyClusModel( C.MODEL,
      IDMMX.DM_impApplData(
        rec2xml( 1, 'COLATTVAL', '',
                 B.TYPE, B.AGE, B.SIBLINGS, B.INCOME ) ) )
    FROM BANKING_SCORING B, IDMMX.ClusterModels C
    WHERE C.MODELNAME='BankingClusColumnModel' )
SELECT IDMMX.DM_getClusterID( clusterResult ),
       IDMMX.DM_getClusScore( clusterResult )
FROM clusterView;

Contents of the script bankingApplyModeling2.db2

WITH MODELCONTENT( CLUSMODELNAME, NOCLUSTERS, MODELFIELDS ) AS
  ( SELECT IDMMX.DM_getClusMdlName( MODEL ),
           IDMMX.DM_getNumClusters( MODEL ),
           IDMMX.DM_getClusMdlSpec( MODEL )
    FROM IDMMX.ClusterModels
    WHERE MODELNAME='BankingClusAliasModel' )
SELECT CLUSMODELNAME, NOCLUSTERS,
       MODELFIELDS..DM_getFldName(1) AS FIELDNAME1,
       MODELFIELDS..DM_getFldName(2) AS FIELDNAME2,
       MODELFIELDS..DM_getFldName(3) AS FIELDNAME3,
       MODELFIELDS..DM_getFldName(4) AS FIELDNAME4
FROM MODELCONTENT;

WITH clusterView( clusterResult ) AS
  ( SELECT IDMMX.DM_applyClusModel( C.MODEL,
      IDMMX.DM_applData(
        IDMMX.DM_applData(
          IDMMX.DM_applData(
            IDMMX.DM_applData( 'N_TYPE', B.TYPE ),
            'N_AGE', B.AGE ),
          'N_SIB', B.SIBLINGS ),
        'N_INC', B.INCOME ) )
    FROM BANKING_SCORING B, IDMMX.ClusterModels C
    WHERE C.MODELNAME='BankingClusAliasModel' )
SELECT IDMMX.DM_getClusterID( clusterResult ),
       IDMMX.DM_getClusScore( clusterResult )
FROM clusterView;

Using IM Scoring Java Beans to score records

In this exercise, you use IM Scoring Java Beans to score records. To do this, you use the sample program Sample93erCars.java, which you can find in the DB2 IM Scoring installation directory under samples/ScoringBean. In the example, the minimum price of a car is predicted, given the basic characteristics for a car. The data used to train and generate the mining model contained a large number of fields, including:

Horsepower
Engine Size
City MPG
Highway MPG
Passenger capacity
Weight (pounds)

The training data also contained the actual Minimum Price (in $1000), which was used as the predicted field when the mining model was generated. When IM Scoring Java Beans is used with this model, the scorer predicts the minimum car price for new, previously unseen data.
First, IM Scoring Java Beans is set up to be used for scoring runs. The section of code that follows shows the source code that is used to do this. Here, a constructor is provided that takes as its input parameter the name of the file in which the mining model is stored. In the example, the model file 93er_cars.pmml is located in the same directory as the sample program. The constructor calls the method initModel(String modelFile), where the instance variable scorer is initialized with the new mining model. This initialization can be done by means of the constructor of the RecordScorer class, or by using the setModel(String modelFile) method. This operation can take some time, because the mining model is loaded into memory, parsed, and interpreted for scoring. When this is complete, the initialized scorer is prepared for scoring.

Source code: Setting up the RecordScorer

public Sample93erCars( String modelFileName )
{
  initModel( modelFileName );
}

/**
 * Initializes <code>scorer</code> with the specified
 * mining model, i.e. sets and loads the mining model.
 */
public void initModel( String modelFileName )
{
  try
  {
    // Sets the new model file and loads the model in
    // preparation for the scoring runs that will follow.
    if ( scorer == null )
    {
      scorer = new RecordScorer( modelFileName );
    }
    else
      scorer.setModelFile( modelFileName );
  }
  catch ( ModelException e )
  {
    e.printStackTrace();
  }
}

When the setup of RecordScorer is complete, it can be used immediately for scoring on a record-by-record basis and for the reading of results. The source code given below in 'Source code: Applying scoring on a record-by-record basis' shows how this works. The doScoring(Map record) method gets as its input the record that you want to score. The call to scorer.score(record) computes the scoring result. The result is stored as an instance variable in the scorer object, and can be accessed now by means of the getPredictedValue() method.
In order to keep the scoring API as simple and small as possible, there are no extra result objects provided for each of the model types. Instead, the RecordScorer provides a set of methods that are used to access the computed result fields. If a method is called that does not suit the actual model type, a ResultException
is thrown. A call to getClusterID(), for example, instead of to getPredictedValue() in the code given in 'Source code: Applying scoring on a record-by-record basis' would result in such an exception.

Source code: Applying scoring on a record-by-record basis

/**
 * Applies scoring: Gets a record as input, applies scoring
 * and displays the output.
 * @param record the record used for scoring
 */
public void doScoring( Map record )
{
  if ( scorer != null )
  {
    try
    {
      // do scoring now
      scorer.score( record );

      // Before reading the results, validate that the
      // model type is set to regression.
      if ( scorer.getModelType() == Scorer.REGRESSION_TYPE )
      {
        double predictedValue = scorer.getPredictedValue();
        displayPredictionResult( record, predictedValue,
                                 "minimum car price" );
      }
    }
    catch ( Exception e )
    {
      // an error occurred while scoring the record
      e.printStackTrace();
    }
  }
}

The code given in 'Source code: Accessing model metadata' shows how access is gained to the metadata of the mining model that is used. For a specified model, it is possible to retrieve the active mining fields that are used, as well as the mining types of these fields. A field that is used by the mining model for the computation of the scoring result is known as an active mining field. In the method displayActiveFields(), the active fields and their related mining types are displayed.

Source code: Accessing model metadata

/**
 * Displays the active fields of the mining model that was used.
 * The active fields are the ones that are used by the mining
 * model for scoring.
 */
public void displayActiveFields()
{
  if ( scorer != null )
  {
    String[] activeFields = scorer.getFieldNames();
    if ( activeFields != null )
    {
      System.out.println( "\nActive Fields: " );
      for ( int i=0; i<activeFields.length; i++ )
      {
        System.out.print( "Field Name: " + activeFields[i] );
        if ( scorer.isCategoricalField( activeFields[i] ) )
        {
          System.out.println( ", Mining Type: categorical" );
        }
        else
        {
          System.out.println( ", Mining Type: numerical" );
        }
      }
    }
  }
}

The code given here in 'Source code: The main program' shows the main program. Here, the following scenario is demonstrated:
v A customer is interested in buying a car, and asks a car vendor for an estimated minimum price for his or her dream car.
v The customer incrementally adds the characteristics that the car should have, and wants to know the estimated price for these additional characteristics.

The customer starts by asking for the basic features that the car should have. Other specifications follow: the outside dimensions of the car, the engine characteristics, and the driving behavior. The customer finally gives the specifications for the inside of the car. The output of the program, shown in 'Results output for Sample93erCars', demonstrates how the price varies with the new features that the car should have. As can be seen in this example, a record is simply realized as a java.util.Map, where field names are mapped to their actual values.

Source code: The main program

public static void main( String[] args )
{
  File file = new File( "93er_cars.pmml" );
  Sample93erCars carPricePredictor = new Sample93erCars( file.getAbsolutePath() );
  HashMap record = new HashMap();

  // A customer starts with the specification of some basic features
  // that the car should have. The engine should be powerful, with
  // 200 horsepower. The car should have good safety features,
  // so that 2 airbags should be available.
  // As well, it should use only a moderate amount of gas when
  // being driven in the city.
  // The estimated minimum price is predicted.
  record.put( "Horsepower", new Integer(200) );
  record.put( "Air Bags standard", new Integer(2) );
  record.put( "City MPG", new Integer(17) );
  carPricePredictor.doScoring( record );

  // After hearing the estimated price, the customer gets more
  // specific, and asks for some more features.
  // The car should not be longer than 185 inches, and should be
  // at least 68 inches wide.
  record.put( "Width (inches)", new Integer(68) );
  record.put( "Length (inches)", new Integer(185) );
  carPricePredictor.doScoring( record );

  // The customer now details the requirements for the engine and the
  // driving behavior, and wants to know the new estimated price.
  record.put( "Number of cylinders", new Integer(8) );
  record.put( "RPM", new Integer(5500) );
  record.put( "U-turn space (feet)", new Integer(40) );
  carPricePredictor.doScoring( record );

  // Finally, the customer adds some details about how the inside of
  // the car should look.
  // Note: If you look at the model 93er_cars.pmml, you will see that
  // "Passenger capacity" and "Luggage capacity (cu.feet)" do not
  // appear at all. This means that these features do not affect
  // the predicted price. Even if the model does not contain these
  // fields, they can be specified in the record.
  record.put( "Rear seat room (inches)", new Integer(28) );
  record.put( "Passenger capacity", new Integer(28) );
  record.put( "Luggage capacity (cu. feet)", new Integer(30) );
  carPricePredictor.doScoring( record );
}

To compile the Java program Sample93erCars.java:
1. Set the PATH and CLASSPATH variables as described in “Setting environment variables” on page 59.
2. Change to the sample directory.
3. Type javac Sample93erCars.java. This will generate the class file Sample93erCars.class.

To start the program, type java Sample93erCars.
This will result in the output shown as follows:

Results output for Sample93erCars

Active Fields:
Field Name: Width (inches), Mining Type: numerical
Field Name: U-turn space (feet), Mining Type: numerical
Field Name: City MPG, Mining Type: numerical
Field Name: Air Bags standard, Mining Type: numerical
Field Name: Rear seat room (inches), Mining Type: numerical
Field Name: Horsepower, Mining Type: numerical
Field Name: RPM, Mining Type: numerical
Field Name: Number of cylinders, Mining Type: numerical
Field Name: Length (inches), Mining Type: numerical

Prediction result:
Horsepower: 200
City MPG: 17
Air Bags standard: 2
=========================================
Minimum Car Price (in $1000): 30.45425764285364

Prediction result:
Length (inches): 185
Horsepower: 200
Width (inches): 68
City MPG: 17
Air Bags standard: 2
=========================================
Minimum Car Price (in $1000): 32.57913872719416

Prediction result:
Number of cylinders: 8
Length (inches): 185
U-turn space (feet): 40
Horsepower: 200
RPM: 5500
Width (inches): 68
City MPG: 17
Air Bags standard: 2
=========================================
Minimum Car Price (in $1000): 36.47348105441428

Prediction result:
Width (inches): 68
U-turn space (feet): 40
City MPG: 17
Passenger capacity: 28
Air Bags standard: 2
Rear seat room (inches): 28
Horsepower: 200
RPM: 5500
Number of cylinders: 8
Luggage capacity (cu. feet): 30
Length (inches): 185
=========================================
Minimum Car Price (in $1000): 36.41880910964151

Using idmmkSQL to work with your own mining models

In this exercise, you generate a template SQL script using the command line tool idmmkSQL. You then edit the template SQL script in order to create the final script and execute it. To do this exercise, you must first have successfully completed the steps described in “Creating a table and importing data” on page 25 and “Importing a mining model” on page 26.

To generate a template SQL script that contains the necessary SQL statements to perform a scoring run:
1. Execute the following command:
idmmkSQL /D DB2 clusDemoBanking.pmml clusDemoBanking.DB2

This generates the file clusDemoBanking.DB2.

2. Open the file clusDemoBanking.DB2 with an editor of your choice and perform the following modifications:
v Replace ###IDMMX.CLUSTERMODELS### with IDMMX.CLUSTERMODELS. This appears twice in the file.
v Replace ###ABSOLUTE_PATH### with the absolute path to the file clusDemoBanking.pmml. This depends on your operating system and your installation directory.
v Replace ###RECORDID### with AGE. There is no ID column in this table, and therefore the column AGE is used to identify the records.
v Replace ###MODEL### with MODEL.
v Replace ###TABLENAME### with BANKING_SCORING.
v Replace ###MODELNAME### with MODELNAME.
3. Save the modified file.

To run the script clusDemoBanking.DB2, use the following command:

db2 -stf clusDemoBanking.DB2

The script:
1. Imports the PMML 2.0 model clusDemoBanking.pmml into the table IDMMX.CLUSTERMODELS
2. Performs a scoring run with the input data from the table BANKING_SCORING
3. Writes the results into the table RESULTTABLE

You can display the results using the following SQL statement:

SELECT * FROM RESULTTABLE

The results are the same as with the script bankingApplyView.db2, which you executed in a previous exercise.
Chapter 5. Using IM Scoring

This chapter guides you through the use of IM Scoring to perform data mining tasks within a DB2 database. Use IM Scoring to:
v Create the database objects that you need for working with IM Scoring
v Work with mining models
v Apply mining models
v Get results values

Creating database objects

IM Scoring provides a set of UDTs, UDFs, and UDMs. Before you can use them from SQL statements, they must be created in the database as database objects. This section contains instructions on how to create the database objects necessary for IM Scoring. A subsection is devoted to each of the tasks involved, as follows:
v “Enabling databases”
v “Disabling databases” on page 42
v “Checking databases” on page 43

Enabling databases

To create the database objects that are necessary, you must first enable the database for the use of IM Scoring. To enable the database, use the IM Scoring command idmenabledb, which is in the bin directory of your IM Scoring installation.

Example:

idmenabledb mydb tables

The idmenabledb command gets the database name as an input parameter. It connects to the database, and creates the UDTs, UDFs, and UDMs in the database in the schema IDMMX. To execute idmenabledb, you must have SYSADM or DBADM authority. You must call the idmenabledb command for each database that you want to use with IM Scoring.

The idmenabledb command is shared between IM Scoring and IM Modeling. This means that you have to call it only once for each database if you have both products installed. The command detects automatically which products are installed. It also detects which UDTs, UDFs, and UDMs already exist in the database and which ones must be created. This means that, to migrate
from IM Scoring V7.1 to IM Scoring V8.1, you must call idmenabledb on the databases that were enabled for IM Scoring V7.1.

If you call idmenabledb and the only parameter that you specify is the database name, all the UDFs and UDMs are created as fenced. This means that they run in a process separate from the DB2 server process. All model UDTs are created for a maximum size of 10 MB. Additional options allow you to change these default values.

If you want to execute the samples that are provided with IM Scoring, you must enable the database by means of the tables option. If you specify the tables option, IM Scoring creates tables that are suitable for storing mining models. You can use these tables also for production. For complete reference information about the idmenabledb command, see “The idmenabledb command” on page 133.

Disabling databases

The IM Scoring command idmdisabledb drops from the database all the IM Scoring UDTs, UDFs, and UDMs that were created. Call this command if you want to discontinue using a database with IM Scoring, or before you uninstall IM Scoring. The idmdisabledb command gets the database name as its input parameter. It connects to the database and drops all the UDTs, UDFs, and UDMs that were created when the database and the schema IDMMX were enabled. To execute idmdisabledb, you must have SYSADM or DBADM authority.

Database objects cannot be dropped while other database objects still depend on them. Take the following situation, for example. You might have created tables using IM Scoring UDTs as column types, or you might have created triggers using some IM Scoring UDFs or UDMs. In this case, you must drop these database objects first before you call idmdisabledb. It might be the case that the only dependent database objects that you have are the tables created by means of the tables option in idmenabledb. In this case, these tables are dropped if you call idmdisabledb with the optional tables option.
The idmdisabledb command is shared between IM Modeling and IM Scoring. This means that, if you have both products installed, you have to call the command only once for each database. If both products are installed, it is not possible to disable a database only for the use of one product. If a database was enabled for both products, the database should be disabled before any product is uninstalled. If one product is uninstalled, the database can no longer be disabled for either product.
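Before running idmdisabledb, it can help to find tables that still use IM Scoring UDTs as column types, because such dependencies block the drop. A hedged sketch: the IDMMX schema name comes from this chapter, and SYSCAT.COLUMNS with its TYPESCHEMA and TYPENAME columns is the standard DB2 catalog view, but verify the query against your DB2 version:

```sql
-- Hedged sketch: list user tables whose columns use a UDT from the
-- IDMMX schema. These tables must be dropped (or handled) before
-- idmdisabledb can succeed.
SELECT TABSCHEMA, TABNAME, COLNAME, TYPENAME
FROM SYSCAT.COLUMNS
WHERE TYPESCHEMA = 'IDMMX';
```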
Checking databases

The IM Scoring command idmcheckdb enables you to check whether a database is enabled or not. The idmcheckdb command gets the database name as input. It connects to the database and returns a message saying whether the database is already enabled for IM Scoring, IM Modeling, or both.

Example:

idmcheckdb mydb
The database "mydb" is enabled for IM Modeling and IM Scoring in "fenced" mode.

Working with mining models

To use IM Scoring, you need mining models in one of the following formats:
v PMML 1.1 or PMML 2.0 format as a file
v Intelligent Miner format as a file
v PMML 1.1 or PMML 2.0 format in a database table as a column value of type CLOB

This section contains instructions on working with mining models. A subsection is devoted to each of the tasks involved, as follows:
v “Exporting models from IM for Data”
v “Converting exported models” on page 44
v “Generating SQL statements from models” on page 44
v “Importing mining models” on page 45
v “Providing models by means of IM Modeling” on page 48

Exporting models from IM for Data

IM for Data stores models as results objects within an internal structure, and allows you to export these objects to an external file. The external file can be accessed by the IM Scoring import functions, but the internally stored results object cannot. A prerequisite is that the conversion component is installed and configured. To export a results object, follow these steps:
1. Click the results object on the IM for Data GUI.
2. From the Selected menu, click Export.
3. Select one of the following formats:
   v Intelligent Miner format
   v PMML
   v XML

Note: The PMML and the XML format are included in the Export menu only if you have registered the model conversion facility by means of the client tool registration of IM for Data. Refer to the installation chapter for your platform, and see “Enabling IM for Data to export PMML or XML models” on page 158.

If you export a model in PMML or XML format, an XML encoding is written into the file. What this encoding is depends on the language environment. The XML encoding is determined by the locale of the IM for Data server. The IM for Data client might reside on a processor that is different from the IM for Data server. In this case, the locale of the IM for Data client might differ from the locale of the IM for Data server. Because IM for Data exports the models to the IM for Data client, the XML encoding might become incorrect. For these client/server and multilanguage environments, the recommended approach is as follows:
1. Export the IM for Data model in Intelligent Miner format.
2. Transfer the model to the processor on which IM Scoring is installed.

Converting exported models

You can convert mining models in Intelligent Miner format to PMML 2.0 format by using the command line tool idmxmod. The input model in Intelligent Miner format must exist as a flat file. The output model in PMML format is also contained in a flat file. For instructions on using this command line tool, see “The idmxmod command” on page 139.

Generating SQL statements from models

It can be time-consuming to manually write the SQL scripts that contain the necessary SQL statements for applying a model to a set of input records. This is especially the case if there are a lot of mining fields. For this reason, IM Scoring provides a separate tool, the idmmkSQL command, that generates from mining models the SQL statements that you need. The tool:
1. Analyzes an input file containing a PMML model
2. Writes a template SQL script that can be used to do these tasks:
   v Import a model
   v Build the input data records
   v Perform the application functions
   v Store the most important results of the scoring run in a table
The template script contains placeholders for actual values like table names or file names. You have to manually insert these values into the template in order to get the final SQL script. The idmmkSQL command supports the different variants of building input data records for the application functions that are provided by IM Scoring. The tool has a command line interface. The type of SQL and the method of building input data records can be controlled by means of a number of command line parameters. For instructions on using this tool, see “The idmmkSQL command” on page 136.

Importing mining models

The IM Scoring functions for importing mining models read a file that contains the mining model or a CLOB value from a database table that contains a PMML model. Each import function returns the mining model as data of one of the data types specific to IM Scoring. You can use an SQL command to insert the returned data into a DB2 table where a column has been configured for the appropriate data type. The IM Scoring package contains sample tables that are configured for the imported models. For instructions on installing these sample tables, see “The idmenabledb command” on page 133.

Errors can occur in the following circumstances:
v A model is imported by means of the wrong import function. For example, the function DM_impClusFile was used to import a Classification model.
v The wrong type of data is inserted into a table. For example, the data type DM_ClusteringModel was inserted into the RegressionModels table.
v A table that does not exist was specified.

If an error occurs when you are using one of the sample table names, ensure that you have enabled the database by means of the tables option. For instructions on installing the sample tables, see “The idmenabledb command” on page 133.

Importing mining models from a file

The import functions import a mining model from a file and return it as a value of the appropriate data type.
Table 6 on page 46 shows the relationship between the import function, the returned data type, and the sample table. The encoding of characters in the imported file is determined by default. For XML files, the XML encoding specification in the XML header is respected. If the encoding specification does not exist, UTF8 is assumed. For more information on encodings, see “Using IM Scoring in a multilanguage environment” on page 65.

Table 6. Import functions and related data types and tables

Import function   Data type            Table
DM_impClusFile    DM_ClusteringModel   ClusterModels
DM_impClasFile    DM_ClasModel         ClassifModels
DM_impRegFile     DM_RegressionModel   RegressionModels

Figure 3 illustrates the relationship between model type, scoring functions, and the data type of the column in the DB2 table.

[Figure 3. Model import processes]

You can use the following command from the command line to import the file myclusters.x and insert it into the ClusterModels table in your schema:

db2 insert into IDMMX.ClusterModels values ('CustomerSegments',
    IDMMX.DM_impClusFile('/tmp/myclusters.x'))
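After an import, you can confirm that the model landed in the sample table. A hedged sketch, assuming the insert above succeeded (MODELNAME is the key column used by the sample tables in this chapter):

```sql
-- Hedged sketch: confirm that the imported model is present.
SELECT MODELNAME
FROM IDMMX.ClusterModels
WHERE MODELNAME = 'CustomerSegments';
```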
Importing mining models from a file using a specific XML encoding

With IM Scoring’s import functions, you can override the encoding specification given in the XML file and explicitly specify the encoding to be assumed for the file to be imported. IM Scoring’s import functions use a specific XML encoding to import a mining model from a file and return it as a value of the appropriate data type. The model is imported in the specified XML encoding. Specifying an XML encoding might be necessary, for example, in a situation where both of the following are true:
v The model was exported from IM for Data in PMML or XML format.
v The locales of the Intelligent Miner server and the Intelligent Miner client were different.

Table 7 shows the relationship between the import function, the returned data type, and the sample table. The encoding of characters in the imported file is determined by default. For more information on encodings, see “Using IM Scoring in a multilanguage environment” on page 65.

Table 7. Import functions using a specific XML encoding

Import function   Data type            Table
DM_impClusFileE   DM_ClusteringModel   ClusterModels
DM_impClasFileE   DM_ClasModel         ClassifModels
DM_impRegFileE    DM_RegressionModel   RegressionModels

You can use the following command from the command line to import the file myclusters.x and insert it into the ClusterModels table in your schema:

db2 insert into IDMMX.ClusterModels values ('CustomerSegments',
    IDMMX.DM_impClusFileE('/tmp/myclusters.x', 'iso-8859-1'))

Importing mining models from a CLOB value in a database table

You can use import functions that import a PMML 1.1 or PMML 2.0 model that already resides in a database table column as a CLOB value.

Note: Use these functions to get a value of one of the model data types from a CLOB value. Do not use the casting functions that are provided by DB2.

Table 8 on page 48 shows the relationship between the import function, the returned data type, and the sample table.
Table 8. Import functions using CLOB values

Import function    Data type            Table
DM_impClusModel    DM_ClusteringModel   ClusterModels
DM_impClasModel    DM_ClasModel         ClassifModels
DM_impRegModel     DM_RegressionModel   RegressionModels

The following is an example of the use of these import functions.
1. A model name and a model in PMML 1.1 or PMML 2.0 format as a CLOB value are selected from a table, PMMLClusterModels.
2. They are then inserted into the ClusterModels table.
3. The PMML model is converted from the CLOB data type to the DM_ClusteringModel data type.

db2 insert into IDMMX.ClusterModels
    select modelname, IDMMX.DM_impClusModel( model )
    from PMMLClusterModels

Providing models by means of IM Modeling

IM Modeling provides an SQL interface consisting of a set of database objects. These objects enable you to build data mining models in PMML 2.0 format from information held in IBM DB2 databases. IM Modeling writes the data mining models that it creates into tables, and these models can be directly used by IM Scoring. IM Modeling supports the following subset of the IM Scoring model types:
DM_ClusteringModel
DM_ClasModel

The samples provided in the samples/ScoringDB2 directory show you how to extract from a model the information needed to apply that model. The most important part of this information consists of details about the active fields that are contained in the model. Values must be provided for these fields when the model is being applied. The samples also apply the models. The names of the samples are bankingModelingApply1.db2 and bankingModelingApply2.db2. For instructions on how to execute these samples, see “Applying a model and getting results values” on page 27.
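The same pattern works for the other model types in Table 8. For instance, a hedged sketch for classification models stored as CLOB values; the staging table PMMLClasModels and its columns are hypothetical names used only for this illustration:

```sql
-- Hedged sketch: convert PMML classification models held as CLOBs
-- into the DM_ClasModel type and store them in the sample table.
-- "PMMLClasModels" is a hypothetical staging table for this example.
INSERT INTO IDMMX.ClassifModels
    SELECT modelname, IDMMX.DM_impClasModel( model )
    FROM PMMLClasModels;
```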
Applying mining models

This section contains instructions on applying mining models. It contains the following subsections:
v “Querying model field names”
v “Using the application functions” on page 50
v “Specifying data by means of REC2XML” on page 51
v “Specifying data by means of DM_applData” on page 52
v “Specifying data by means of CONCAT” on page 53
v “Results data” on page 53
v “Code sample for applying models” on page 55

Querying model field names

It is important to get the best possible result when you apply a model to new data records. To ensure this, provide a value in a data record for each active field in the model. The data record consists of a set of field names and their values. The field names in the set match the names of the active fields in the model. If necessary, you can access information about the active fields in your model. For a sample script that does this, see “Extracting information from a model” on page 31.

To get the set of active fields and their data mining field type, use one of the following UDFs:
v DM_getClusMdlSpec for Clustering models of type DM_ClusteringModel
v DM_getClasMdlSpec for Classification models of type DM_ClasModel
v DM_getRegMdlSpec for Regression models of type DM_RegressionModel

The functions get the model as input parameter, and return a value of type DM_LogicalDataSpec. DM_LogicalDataSpec is a structured type. Methods (UDMs) are available that enable you to query the content of the DM_LogicalDataSpec value. They are:
v DM_getNumFields to get the total number of fields in the DM_LogicalDataSpec value
v DM_getFldName to get the name of a field
v DM_getFldType to get the mining field type of a field

DM_getFldName and DM_getFldType get a position as input parameter. The position value ranges from 1 to the result of a call to DM_getNumFields().

To determine the field names contained in the PMML model, the following logic is used:
  • 65. 1. The attribute displayName of the element DataField might contain a field name. If this is the case, this field name is returned if the field is marked as an active field in the element MiningField. Otherwise, it is not returned. 2. Otherwise, the field name that is contained in the name attribute of the element DataField is returned. This happens if the field is marked as an active field in the element MiningField. If the field is not marked as an active field, it is not returned. Using the application functions IM Scoring includes functions that apply imported mining models to selected data. Table 9 lists each function together with a brief summary of how it is used. Table 9. Functions for applying models Application function Purpose DM_applyClusModel Applies a model to selected data to produce results data, grouped into clusters. Clusters are identified when the model is built. They identify similar characteristics in data. A possible use of this function is in producing targeted mailshots. DM_applyClasModel Applies a model to selected data to produce results data that is classified according to rules established when the model is built. A possible use of this function is in classifying insurance risks. DM_applyRegModel Applies a model to selected data to calculate a predicted value. The predicted value is based on the values of input fields, according to a pattern established when the model is built. A possible use of this function is in ranking customers. If the models were created by means of IM for Data, refer to the appropriate chapters in Using the Intelligent Miner for Data for more information about the way each type of modeling works. If the models were created by means of IM Modeling, refer to the appropriate chapters in IM Modeling Administration and Programming. This provides more information about the way the Classification and Clustering modeling types work. 
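As an illustration of the model-specification functions described under “Querying model field names”, the following sketch lists field information for a stored Clustering model. It assumes the IDMMX.ClusterModels table and a model named 'Customers', and uses the DB2 method notation (..) to invoke the UDMs on the DM_LogicalDataSpec value; treat it as a sketch rather than verbatim sample code.

```sql
-- Illustrative only: read the number of fields, and the name and
-- mining field type of the first field, from a Clustering model.
-- Valid positions run from 1 to the result of DM_getNumFields().
SELECT IDMMX.DM_getClusMdlSpec( c.model )..DM_getNumFields(),
       IDMMX.DM_getClusMdlSpec( c.model )..DM_getFldName( 1 ),
       IDMMX.DM_getClusMdlSpec( c.model )..DM_getFldType( 1 )
FROM   IDMMX.ClusterModels c
WHERE  c.modelname = 'Customers'
```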
Each function has the following input arguments: 50 Administration and Programming for DB2
  • 66. An imported mining model of the appropriate type The mining models are assumed to reside in a table. One column of the table contains the identifier or the name of the model. The other column contains the model itself. In this case, the WHERE clause in the SELECT statement determines which model is the input to the application function. If the WHERE clause fails to return unique results, an error occurs. You can apply only one model for each statement. The data to which the model is applied You can use the function DM_applData or DM_impApplData to define the data to which you want to apply a model. These functions construct an instance of the data type DM_ApplicationData. To select several data items, you can choose one of the following options: v Call the function DM_impApplData with a concatenation of data items as the input. v Call DM_impApplData with the output of the REC2XML function. v Make several nested calls to the function DM_applData. Each nested call appends data to an instance of DM_ApplicationData. Note: Use the function DM_applData or DM_impApplData to get a value of data type DM_ApplicationData. Do not use the casting functions that are provided by DB2. The function returns results data that contains the values that the mining logic calculates. The field names contained in the DM_ApplicationData value are mapped to the fields using the following logic: 1. A field name is compared with the field names in the PMML model that are contained in the attribute displayName in the element DataField. 2. If there is no match, a field name is compared with the field names in the PMML model that are contained in the name attributes in the DataField elements. If you created the model by using IM for Data, the attribute displayName is empty, and the attribute name contains the name of the field used in IM for Data. Specifying data by means of REC2XML Use the DB2 built-in function REC2XML for performance-critical applications to build a data record. 
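The REC2XML path just described can be sketched as follows; the table and model names are illustrative, and the column names passed to REC2XML must match the active field names in the model.

```sql
-- REC2XML with the COLATTVAL format builds the
-- <row><column name="...">...</column></row> string that
-- DM_impApplData converts to a DM_ApplicationData value.
SELECT IDMMX.DM_applyClusModel( c.model,
         IDMMX.DM_impApplData(
           REC2XML( 1.0, 'COLATTVAL', 'row', s.AGE, s.SALARY )))
FROM   IDMMX.ClusterModels c, myData s
WHERE  c.modelname = 'Customers'
```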
REC2XML gets a number of control parameters and a list of columns as input. The output is an XML string containing pairs consisting of column names and values. This XML string can be used as the input to the Chapter 5. Using IM Scoring 51
  • 67. DM_impApplData function. This function returns the XML string as data type DM_ApplicationData, which is the correct data type to be used as input parameter to the application functions. You might want to use REC2XML to build a data record for the application functions. In this case, the field names of the active fields in the model must match the columns of the table that is used for REC2XML. If this is not the case, you have the following possibilities: 1. Create a view on the table in order to have the same column names as field names in the model. 2. Use the CONCAT function instead of REC2XML to build the XML string. This technique allows you to map the column names to the field names in the model. By using CONCAT, you achieve performance similar to that achieved when you use REC2XML. 3. Use the function DM_applData instead of REC2XML. DM_applData is slower, but it is easier to use than CONCAT. DM_applData also allows you to map the field names in the model to the column names in the table. REC2XML is described in the DB2 V7.2 SQL Reference. For your convenience, that description is also given here in Appendix F, “The DB2 REC2XML function” on page 199. For an example of its use, see “Applying a model and getting results values” on page 27. Specifying data by means of DM_applData Use the function DM_applData to build a data record where both of the following are true: v Performance is not important. v The column names for the data record are different from the field names in the model. You must call DM_applData for each column that goes into the data record. The function returns an XML string consisting of a field name/value pair of type DM_ApplicationData. DM_applData is called in a nested way, because the return value of the function is the input to the next call of the function. The function then appends its field name/value pair to the input XML string, and returns the whole XML string. 
The input parameters to the function DM_applData are as follows: v The value of type DM_ApplicationData v A field name that matches a field name in the model v A value 52 Administration and Programming for DB2
• 68. If you use DM_applData on a table, the values are specified using the column name. Specifying data by means of CONCAT Use the built-in function REC2XML for performance-critical applications. You can also use the built-in operator CONCAT, or the corresponding characters ||, in a sequence to construct the type DM_ApplicationData. This is the input for the application functions. For more information on the required format of the string, see “DM_impApplData” on page 120. You can also generate the CONCAT syntax by using the idmmkSQL command, which is described in “The idmmkSQL command” on page 136. For example, the following calls correspond to each other: IDMMX.DM_applData( IDMMX.DM_applData( ’CHARCOL’, CHARCOL ) , ’DOUBLECOL’, DOUBLECOL ) IDMMX.DM_impApplData(’<row><column name="CHARCOL">’ ||CHARCOL||’</column>’||’<column name="DOUBLECOL">’ ||CHAR(DOUBLECOL)||’</column></row>’) If the columns CHARCOL and DOUBLECOL have NULL values, you must call CONCAT as follows: IDMMX.DM_impApplData(’<row><column name="CHARCOL"’ ||coalesce(’>’||CHARCOL,’ null="true">’ )||’</column>’||’<column name="DOUBLECOL"’ ||coalesce(’>’||CHAR(DOUBLECOL),’ null="true">’ )||’</column></row>’) Note: IM for Data handles date values and time values in a character ISO format. You might want to specify a date value or a time value as an operand of CONCAT, and you might want to apply a model created by IM for Data. In this case, the date value and the time value must be converted to character ISO format. To do this, use CHAR(<value>, ISO) for a date value and CHAR(<value>, JIS) for a time value. Results data The results data from each of the scoring functions is identified by a data type that is specific to IM Scoring. Table 10 lists the application functions together with their data types and results data. Table 10.
Application functions and their data types and results data Application function Data type Results data DM_applyClusModel DM_ClusResult Cluster ID, score, quality, confidence DM_applyClasModel DM_ClasResult Predicted class, confidence Chapter 5. Using IM Scoring 53
  • 69. Table 10. Application functions and their data types and results data (continued) Application function Data type Results data DM_applyRegModel DM_RegResult Predicted value, region ID (if an RBF model was used) Notes: 1. If you create your own DB2 tables to store results, ensure that you include a column that is configured for the appropriate data type. 2. If you use an RBF model for the DM_applyRegModel function, the results data Region ID is additionally created. Errors can occur if any of the following is the case: v A model is applied using the wrong function. For example, the function DM_applyClusModel is used to apply a Classification model. v The wrong type of results data is inserted into a table. For example, the data type DM_ClusResult is inserted into a column that is configured for data type DM_RegResult. v The fields specified for inclusion in the data do not match the fields included in the model. v A results table that does not exist is specified. Figure 4 on page 55 illustrates the process by which data is selected and a model is applied. Values for the fields age and salary are read from a database table to form an instance of the data type DM_ApplicationData. This data and the model, ClusterModel, form the input to the function DM_applyClusModel. The results of applying the model consist of a cluster ID and a cluster score. The results are returned as data type DM_ClusResult. 54 Administration and Programming for DB2
  • 70. Figure 4. Applying a model to data Code sample for applying models The following code sample shows a statement that applies a Classification model to all records in the table myData where the customer’s age is less than 40. INSERT INTO myClassifResults(name, salary, address, clfresult) SELECT s.name, s.salary, s.address, IDMMX.DM_applyClasModel(c.model, IDMMX.DM_applData(IDMMX.DM_applData(’AGE’, s.age),’SALARY’, s.salary)) FROM ClassifModels c, myData s WHERE c.modelname=’Customers’ and s.age<40 The fields salary and age from the myData table are used. The Classification model and the name of the model are stored in the ClassifModels table in the columns model and modelname. The model name is Customers. The results of applying the model together with customer information are then inserted into the myClassifResults table. The calculated results data is contained in the clfresult column. This column is configured for the data type DM_ClasResult. Chapter 5. Using IM Scoring 55
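The Clustering flow that Figure 4 illustrates can be sketched in the same style. The results table myClusterResults is hypothetical; its clusresult column would be configured for the data type DM_ClusResult.

```sql
-- Sketch of the Figure 4 flow: age and salary are wrapped into a
-- DM_ApplicationData value by nested DM_applData calls, and
-- DM_applyClusModel returns a DM_ClusResult holding the cluster
-- ID and the cluster score.
INSERT INTO myClusterResults( name, clusresult )
SELECT s.name,
       IDMMX.DM_applyClusModel( c.model,
         IDMMX.DM_applData( IDMMX.DM_applData( 'AGE', s.age ),
                            'SALARY', s.salary ))
FROM   IDMMX.ClusterModels c, myData s
WHERE  c.modelname = 'ClusterModel'
```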
  • 71. Getting application results IM Scoring includes results functions that obtain the different results values calculated within the application functions. This section lists and describes these functions. It also includes information about situations where input records contain NULL values; this is in “Handling missing values” on page 57. Table 11 lists the results functions and describes the value that each function returns. Table 11. Results functions and their purpose Results function Purpose DM_getClusterID Obtains the cluster ID from results data calculated when a Clustering model is applied. This identifies the position of the cluster in the Clustering model that was the best match for this data record. The position of the cluster is a value between 1 and the number of clusters. DM_getClusScore Obtains the Clustering score from results data calculated when a Clustering model is applied. The score is a measure of how closely the data matches the model cluster. DM_getQuality Returns the quality of the best cluster from results data calculated when a Clustering model is applied. DM_getQuality(clusterid) Returns the quality for a specified cluster from results data calculated when a Clustering model is applied. DM_getClusConf Returns the confidence of attributing a record to the best cluster in comparison with attributing it to another cluster of the applied model. DM_getPredClass Obtains the predicted class from results data calculated when a Classification model is applied. This identifies the class within the model to which the data matched. DM_getConfidence Obtains the classification confidence value from results data calculated when a Classification model is applied. This is a value between 0.0 and 1.0; it measures the probability that the class is predicted correctly. DM_getPredValue Obtains the predicted value from results data calculated when a Regression model is applied. This value is calculated according to relations established by the model. 
DM_getRBFRegionID Obtains the number of the region from results data calculated when an RBF Regression model is applied. This identifies the region within the model that the record was assigned to. 56 Administration and Programming for DB2
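As an illustration of these results functions, the results stored by the earlier code sample for applying models (the myClassifResults table, whose clfresult column is configured for DM_ClasResult) could be read back as follows; this is a sketch, not verbatim sample code.

```sql
-- The Classification results functions take the stored
-- DM_ClasResult value as input.
SELECT name,
       IDMMX.DM_getPredClass( clfresult ),
       IDMMX.DM_getConfidence( clfresult )
FROM   myClassifResults
```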
  • 72. To obtain the results of applying a model, you run the appropriate results function, giving the return value of an application function as a parameter. This can be one of the following: v A common table expression v A column in a database table where you have stored the return value of an application function v The return value itself, if you call the functions in a nested way You use common table expressions or pass the return value of the application function directly if you use only a single statement. That is, you apply the model and obtain the results value within a single statement. You use the database column name if the results of applying the model were stored in a DB2 table. If you specify a column name, and the column is not configured for the correct results data type, an error occurs. See Table 10 on page 53 for the relationships between results functions, results data types, and results data. A results function might return a NULL value. This means that the application function that calculated the result got a faulty data record as input. The data record consisted of too many invalid values for a reasonable result to be returned. Handling missing values Input records might contain one or more values that are NULL; these are known as missing values. The handling of missing values in IM Scoring depends on the algorithm. Neural algorithms In general, the Neural algorithms of the mining functions handle missing values as follows: v Numeric variables: If a missing value replacement (PMML 2.0) is present, that will be taken. Otherwise, the activation of the corresponding input node will be set to 0.5, which equals the mean value in most cases. v Categorical variables: If a missing value replacement (PMML 2.0) is present, that will be taken. Otherwise, the activations of all corresponding input nodes will be set to 0. 
Classification v Neural Classification: In IBM models, if none of the activations of the output neurons is above a certain threshold limit, DM_getPredClass returns NULL. Other models always give a prediction. DM_getConfidence always returns a value. v Tree Classification: The handling of missing values depends on whether the model was generated by an IBM product or by a non-IBM product. Chapter 5. Using IM Scoring 57
  • 73. Models generated by an IBM product These IBM products consist of IM Modeling and IM for Data. With IBM models, a sophisticated value treatment is used. If a missing value occurs, the record being scored is fed into both child nodes (binary tree) of the tree node requiring the missing value. This process continues until the record reaches a leaf node. Thus, a record is assigned to more than one leaf node. Tree Classification aggregates all these leaf nodes, and DM_getPredClass returns the value assigned to this aggregated node. Models generated by a non-IBM product The scoring process stops at the first tree node requiring a missing value, and DM_getPredClass returns the value assigned to this (non-leaf) node. Clustering v Demographic Clustering: Missing values are ignored and the corresponding field does not participate in the scoring process. If all the values of the record are missing, NULL is returned by DM_getClusterID and DM_getClusScore. v Neural Clustering: DM_getClusterID and DM_getClusScore never return NULL. Regression v Linear Regression: – Numeric variables: If a missing value replacement (PMML 2.0) is present, this will be taken. If a mean value is given in the PMML, that will be taken. Otherwise the variable is ignored. – Categorical variables: If a missing value replacement (PMML 2.0) is present, that will be taken. Otherwise, the variable is ignored. – If all input variables are missing values, no prediction will be given. The function DM_getPredValue returns NULL. v Neural Prediction: DM_getPredValue will always give a prediction value. v RBF Prediction: Missing values are ignored, and the corresponding field does not participate in the scoring process. If all the values of the record are missing, DM_getPredValue and DM_getRBFRegionID return NULL. Using IM Scoring Java Beans This section describes the interfaces of IM Scoring Java Beans. 
It contains the following subsections: v “Setting environment variables” on page 59 v “Specifying the mining model to be used” on page 60 58 Administration and Programming for DB2
• 74. v “Accessing model metadata” on page 61 v “Specifying a data record” on page 62 v “Applying scoring” on page 62 v “Accessing computed results” on page 62 v “Scoring example” on page 63 v “ScoringException classes” on page 64 To perform the scoring of a data record, you must specify the following input: 1. The mining model to be used 2. One or more data records for which you want to compute a score value When you have specified the necessary input, you can apply scoring and then access the result fields that have been computed. The functions of IM Scoring Java Beans are implemented as methods of class com.ibm.iminer.scoring.RecordScorer. The sections that follow describe how to use IM Scoring Java Beans. These sections consist of: v “Specifying the mining model to be used” on page 60 v “Accessing model metadata” on page 61 v “Specifying a data record” on page 62 v “Applying scoring” on page 62 v “Accessing computed results” on page 62 v “Scoring example” on page 63 v “ScoringException classes” on page 64 Note that the Java API is documented in online documentation (Javadoc) in the directory doc\ScoringBean\index.html Setting environment variables After you have installed IM Scoring Java Beans, you must set environment variables before you can use it from Java applications. For a Java application to invoke RecordScorer, you must do the following: On Windows systems: v set PATH=%PATH%;<install path>\bin v set CLASSPATH=%CLASSPATH%;<install path>\java\xerces.jar v set CLASSPATH=%CLASSPATH%;<install path>\java\idmscore.jar On AIX systems: The exact commands depend on the shell being used. Chapter 5. Using IM Scoring 59
• 75. Set your LIBPATH to include /usr/lpp/IMinerX/lib Set your CLASSPATH to include /usr/lpp/IMinerX/lib/xerces.jar and /usr/lpp/IMinerX/lib/idmscore.jar On Linux and Sun Solaris systems: The exact commands depend on the shell being used. Set your LD_LIBRARY_PATH to include /opt/IMinerX/lib Set your CLASSPATH to include /opt/IMinerX/lib/xerces.jar and /opt/IMinerX/lib/idmscore.jar The xerces.jar package is the XML4J package needed for XML parsing. It is copied into the IMinerX/java directory during the installation of IM Scoring Java Beans. It is possible, however, to use any other implementation of the XML4J specification. Specifying the mining model to be used The RecordScorer class expects the mining model to be stored in a file on the local file system in PMML 2.0 format. You can specify the mining model to be used for scoring in one of two ways, as follows: v By using the constructor One of the two constructors that are provided allows the specification of a file in which the mining model is stored in PMML format. public RecordScorer( String modelFile ) throws ModelException v By using the method interface public void setModelFile( String modelFile ) throws ModelException The ModelException is thrown if either of the following is true: v The specified file cannot be found or accessed successfully v The specified mining model is in an incorrect format or is of an unknown type When a new mining model is set, this model is loaded explicitly into memory. Loading the mining model means reading the PMML file, interpreting the mining model, and preparing everything for scoring. For big mining models, for example, those with a size of more than 50 MB, this operation might take some time. After the mining model has been loaded, the model is kept interpreted in memory. For consecutive scoring calls, the loaded and prepared mining model is used, so that the scoring of a single record needs a response time of less than a second. 
When a mining model is set, it remains loaded until one of the following happens: 1. A new model is loaded or 2. The garbage collector frees up the actual RecordScorer instance. 60 Administration and Programming for DB2
• 76. Accessing model metadata The RecordScorer class provides a set of methods that can be used to access the metadata of the mining model that is specified. These are as follows: Table 12. IM Scoring Java Beans methods for accessing model metadata Method Method description public String[] getFieldNames() Returns the names of the mining fields used by the mining model to perform scoring. If no correct model is specified, a String[0] array is returned. public String[] getCategoricalFields() Returns the names of the categorical mining fields used by the mining model to perform scoring. If no correct model is specified, a String[0] array is returned. public String[] getNumericalFields() Returns the names of the numerical mining fields used by the mining model to perform scoring. If no correct model is specified, a String[0] array is returned. public boolean isCategoricalField(String fieldName) Returns true if the field specified by fieldName is a categorical mining field. Otherwise, false is returned. public boolean isNumericalField(String fieldName) Returns true if the field specified by fieldName is a numerical mining field. Otherwise, false is returned. public boolean isField(String fieldName) Returns true if the field specified by fieldName is one of the active mining fields. Otherwise, false is returned. public int getModelType() Returns an integer value that identifies the type of the model that was used. For this, the constants CLUSTERING_TYPE, CLASSIFICATION_TYPE, REGRESSION_TYPE, and UNDEFINED_TYPE are defined in the Scorer class. Chapter 5. Using IM Scoring 61
• 77. Table 12. IM Scoring Java Beans methods for accessing model metadata (continued) Method Method description public int[] getResultIdentifiers() Returns the identifiers of the result fields that are relevant for the model type that is set. For this, there are additional constants defined in the class Scorer: v For CLUSTERING_TYPE, the result fields CLUSTER_ID, CLUSTER_SCORE, and CLUSTER_QUALITY are relevant. v For CLASSIFICATION_TYPE, the result fields PREDICTED_CLASS and CONFIDENCE are relevant. v For REGRESSION_TYPE, the result fields PREDICTED_VALUE and RBF_REGION_ID are relevant. Specifying a data record You can represent a data record by a mapping. Here, the names of the fields used in the records are mapped to their actual values. This is exactly the way in which a record is represented in the RecordScorer class. A java.util.Map object is instantiated, and a mapping between field names and values is defined. For example: HashMap record = new HashMap(); record.put("Horsepower", new Integer(200) ); record.put("Air Bags standard", new Integer(2) ); record.put("City MPG", new Integer(17) ); Note that the values of categorical fields are interpreted as java.lang.String, and the values of numerical fields are expected to inherit from java.lang.Number. Applying scoring After the mining model and a data record are specified, the score of the data record can be computed. For this purpose, the RecordScorer class provides the following method: public void score( Map record ) throws RecordException, ModelException Accessing computed results After you have successfully called the score(Map) method, you can access the computed result fields by using one of the following methods: Table 13. IM Scoring Java Beans methods for accessing computed results Method Required model type Default value double getClusterID() Clustering java.lang.Double.NaN 62 Administration and Programming for DB2
• 78. Table 13. IM Scoring Java Beans methods for accessing computed results (continued) Method Required model type Default value double getClusterScore() Clustering java.lang.Double.NaN double getClusterQuality() Clustering java.lang.Double.NaN String getPredictedClass() Classification NULL double getConfidence() Classification java.lang.Double.NaN double getPredictedValue() Regression java.lang.Double.NaN Depending on the type of mining model that is used, different sets of fields are computed. After you specify a mining model with the setModelFile(String) method, you can query the type of mining model by using the method public int getModelType(). This method returns an integer value that represents one of the following constants: Scorer.CLUSTERING_TYPE Scorer.CLASSIFICATION_TYPE Scorer.REGRESSION_TYPE Scorer.UNDEFINED_TYPE With getModelType(), it is possible to determine which of the methods listed in Table 13 can be used to access the computed result fields. If the model type is, for example, Scorer.CLASSIFICATION_TYPE, the scoring result can be accessed with the methods getPredictedClass() and getConfidence(). The use of the method getPredictedValue() in this context, for example, would throw a ResultException. For that reason, each of these result methods throws a ResultException to indicate that the result field and the actual model type do not fit together. It might be the case that a mining model is specified, but the model is not loaded yet or the score method is not called yet. In this case, the result methods that fit the specified model type return their default values as listed in Table 13. Scoring example 0 try { 1 RecordScorer scorer = new RecordScorer( "93er_cars.pmml" ); 2 int modelType = scorer.getModelType(); 3 double predictedValue = scorer.getPredictedValue(); 4 try { 5 String predictedClass = scorer.getPredictedClass(); 6 } catch ( ResultException e ) { 7 // this is an expected exception for illustration purposes 8 } Chapter 5. Using IM Scoring 63
• 79. 9 HashMap record = new HashMap(); 10 record.put("Horsepower", new Integer(200)); 11 record.put("Air Bags standard", new Integer(2) ); 12 record.put("City MPG", new Integer(17) ); 13 scorer.score( record ); 14 predictedValue = scorer.getPredictedValue(); 15 16 } catch ( Exception e ) { 17 // should not occur if everything is correct 18 } First, a new RecordScorer instance is initialized with a Stepwise Polynomial Regression mining model (line 1). The value of modelType in line 2 thus is equal to the value specified by the constant Scorer.REGRESSION_TYPE. For this type of model, a predicted value is expected for each scored record. When the method getPredictedValue() is called in line 3, it would normally return the actual predicted value. Because the score(Map) method has not been called yet, however, the default value NaN is returned in line 3. When getPredictedClass() is called in line 5, a ResultException is thrown, because the predicted class result field does not fit the model type Scorer.REGRESSION_TYPE. In lines 9 to 12, the actual data record is defined. There, the field names are mapped to their actual values. In line 13, score(Map) is called and the scoring result is computed. The result of the call to getPredictedValue() in line 14 is a value not equal to NaN, the default value. ScoringException classes The following exception classes are used by the RecordScorer and its base class Scorer: ModelException Used if an error occurred that was relevant to the mining model that was specified. RecordException Used if an error occurred that was relevant to the data record that was used. ResultException Used if an error occurred that was relevant to the result fields that were computed. 64 Administration and Programming for DB2
• 80. Chapter 6. Administrative tasks This chapter is a guide to doing a number of administrative tasks connected with IM Scoring. These include: v “Using IM Scoring in a multilanguage environment” v “Getting error information” v “Getting support” on page 66 Using IM Scoring in a multilanguage environment You can use IM Scoring with databases that are defined in any codepage. You might want to create a new database in a multilanguage environment and enable this database for the scoring functions. In this case, you are recommended to create this database using the Unicode character encoding. In a Unicode-enabled database, DB2 uses the following encoding: v For columns with character types: UTF-8 (UCS Transformation Format) v For columns with graphic types: UCS-2 (Universal Character Set coded in 2 octets) You can create a Unicode-enabled database with the following DB2 command: DB2 CREATE DATABASE <dbname> USING CODESET UTF-8 TERRITORY US Getting error information If an error occurs during a scoring run, an error message is displayed. See Appendix E, “Error messages” on page 173 for a list of error messages with explanations of their meanings, and indications about what action you should take. Note that some messages are truncated by DB2. To see the full, untruncated message, check the idmMxError.log file. Errors are logged by default in the file idmMxError.log together with identifiers and time stamps. v On UNIX systems, the file resides in the directory /tmp. v On Windows systems, the file resides in the temp directory, which is either: – The path specified by the TMP environment variable – The path specified by the TEMP environment variable, if TMP is not defined – The Windows directory, if neither TMP nor TEMP is defined © Copyright IBM Corp. 2001, 2002 65
• 81. If you intend to use the environment variable TMP or TEMP, it must be set in the environment of the DB2 engine. Scoring services run as part of the DB2 engine. Set the environment variable as a system variable. When the file gets too large, you must delete it. You can change the error file name and directory by setting the environment variable IDM_MX_ERRORFILE to the name of an alternative error file. The file name must be given as an absolute path name. If you want to prevent IM Scoring from writing to an error file, set the environment variable IDM_MX_ERRORFILE to NONE. The environment variable must be set in the environment of the DB2 engine. On UNIX platforms, add IDM_MX_ERRORFILE to the DB2 registry variable DB2ENVLIST and restart DB2 to activate the changes. See the DB2 Administration Guide to get information on the DB2ENVLIST registry variable. On Windows platforms, IDM_MX_ERRORFILE must be set as a system variable. Getting support Before you contact IBM for support, check the README information that comes with the product. Also check the product’s support pages on the following Web sites: http://www.ibm.com/software/data/support/ http://www.ibm.com/software/data/iminer/modeling/support.html These support pages provide "Frequently Asked Questions" and "Hints and Tips" that may help to solve your problem. When you contact IBM for support, prepare to answer the questions in the problem identification worksheet. Make yourself familiar with the instructions on how to collect trace information given at “Getting trace information” on page 69. Product README The product README files (readme_sc.txt) can be found in the installation directory, as follows: Windows: <Installdir>\IMinerX\readme_sc.txt AIX: /usr/lpp/IMinerX/readme_sc.txt UNIX: /opt/IMinerX/readme_sc.txt 66 Administration and Programming for DB2
  • 82. 'Frequently asked questions' and 'Hints and tips' Check the ″Frequently asked questions″ and ″Hints and tips″ section at the Web site at http://www.ibm.com/software/data/iminer/scoring/support.html Problem identification worksheet The IBM Software Support Guide contains the following worksheet, which is available online at http://techsupport.services.ibm.com/guides/handbook.html Answer the questions in the worksheet before contacting IBM support. PROBLEM IDENTIFICATION WORKSHEET Complete this form before calling Technical Support This form helps you identify problems and assists IBM Technical Support in finding solutions. System Information What is the failing product? _______________________________________________ What is the version and release number?_____________________________________ What machine model, operating system, and version are running?______________ Problem Description What are the expected results?______________________________________________ What statement or command is being used? ___________________________________ What are the exact symptoms and syntax? ____________________________________ ____________________________________________________________________________ What is or isn’t happening, including exact error number and message text? ____________________________________________________________________________ ____________________________________________________________________________ Is anyone else experiencing the problem? ___________________________________ Is this the first time this operation has been attempted? __________________ Is this the first time this problem has occurred?___________________________ Environment When did this activity work last? __________________________________________ Chapter 6. Administrative tasks 67
  • 83. What has changed since the activity last worked? ___________________________ _____ Hardware type/model _____ Application _____ Operating system/version _____ Level of usage _____ New product version/release _____ Maintenance applied If the problem does not occur every time, under what conditions does the problem not occur? ____________________________________________________________________________ ____________________________________________________________________________ Is there any other software running on the system which may be conflicting with this product? ____________________________________________________________________________ ____________________________________________________________________________ Problem Isolation Identify the specific feature of the software causing the problem.__________ ____________________________________________________________________________ Can the problem be reproduced? If so, please provide a reproducible test case or instructions on how to reproduce the error condition.____________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ Getting product information Use the command idmlevel to collect information about the software environment (operating system, DB2 version) and IM product version you are using. Using ’idmlevel’ on Windows operating systems 1. Open a command window. 2. Invoke idmlevel <logfile> Example: idmlevel c:\imscoring.log 3. Attach the resulting log file to your problem description when you contact IBM support. Using ’idmlevel’ on UNIX operating systems 68 Administration and Programming for DB2
  • 84. 1. In a command shell invoke: AIX: /usr/lpp/IMinerX/bin/idmlevel /tmp/imscoring.log Other UNIX platforms: /opt/IMinerX/bin/idmlevel /tmp/imscoring.log 2. Attach the resulting log file to your problem description when you contact IBM support. Getting trace information The trace facility in IM Scoring helps you to locate an error by writing information about an error situation to a file. This information consists of trace messages. The trace facility is configured by means of two environment variables. These are as follows: v IDM_MX_TRACEFILE. This specifies the name of the file where trace information is written. If the variable is not defined or is empty, trace is written to the default file idmMxTrace.log in the temporary directory of the system. If the file does not exist, it is created. If the trace file cannot be written, tracing is silently switched off, without an error message or warning. v IDM_MX_TRACELEVEL. This defines the level of tracing. It must contain one of the following values (not case-sensitive): MINIMUM: Tracing is switched off. However, in some severe error situations this setting is ignored, and some trace messages explaining the error are written. This is the default value. BASIC: Coarse trace information (for example, received data, DB2 calls, and so on) is written. MOST: More detailed trace information (for example, call stacks, parameters, and so on) is written. ALL: All available information is printed. Note that this tends to be rather lengthy. IDM_MX_TRACELEVEL determines the filtering of trace messages and thus acts as the on/off switch for tracing. Because trace settings are controlled by means of environment variables, concurrent mining runs write to the same trace file. The trace message that is written to the trace file consists of the following information: v The name of the component that issued the trace message v A time stamp Chapter 6. Administrative tasks 69
  • 85. v The trace message itself Additionally, basic system information is written once per trace. Using the tracing facility on UNIX systems To work with the tracing facility, log on to the DB2 server as instance owner and set the environment variable IDM_MX_TRACELEVEL, IDM_MX_TRACEFILE, or both. To switch the tracing facility on, set IDM_MX_TRACELEVEL to BASIC, MOST, or ALL. You can also optionally set IDM_MX_TRACEFILE to a file name including the whole path. If IDM_MX_TRACEFILE is not set explicitly, the trace information is written to the file /tmp/idmMxTrace.log by default. For example, if you use the ksh shell, you might want to use one of the following commands: export IDM_MX_TRACELEVEL=BASIC export IDM_MX_TRACEFILE=/home/db2admin/scoreTrace.log You must add the environment variables to the list of environment variables in the DB2ENVLIST registry variable to enable DB2 to use them. To activate the changes, stop DB2 and restart. To stop tracing, remove the variables from the DB2ENVLIST registry variable and restart DB2. For information on the DB2ENVLIST registry variable, see the DB2 Administration Guide. Note: If you use IM Scoring for AIX on an SP™ with UDB EEE, you are recommended to store the trace file in a local file system, for example, in the default directory /tmp. After calling an application function with the trace option enabled, the idmMxTrace.log file is stored on each node in the /tmp directory. It contains trace information from the processes that run on that node. Using the tracing facility on Windows systems To work with the tracing facility, you must set the environment variable IDM_MX_TRACELEVEL, IDM_MX_TRACEFILE, or both. You must set the environment variables as system variables, because the DB2 engine is implemented as a Windows service. To switch the tracing facility on, set IDM_MX_TRACELEVEL to BASIC, MOST, or ALL. You can also optionally set IDM_MX_TRACEFILE to a file name including the whole path. 
If IDM_MX_TRACEFILE is not set explicitly, the trace information is written to the file idmMxTrace.log by default. This file resides in the directory that is specified by the TEMP environment variable. 70 Administration and Programming for DB2
  • 86. Follow these steps to set the environment variables: 1. Log on as an administrator. 2. Open the Start menu, and navigate to Settings —> Control Panel —> System —> Environment Variables. This is the procedure on Windows 2000. On other Windows versions, the navigation path may differ slightly. 3. Set the environment variables as required. To activate the changes, restart your system. To stop tracing, remove the environment variables, and restart your system. Getting DB2 diagnostic information If you need to contact IBM support, you might also need to provide DB2 diagnostic information like DB2CLI trace or db2diag.log. For more information, see the DB2 Troubleshooting Guide. Chapter 6. Administrative tasks 71
  • 87. 72 Administration and Programming for DB2
  • 88. Part 2. Reference This part provides a complete reference resource to the database objects that make up IM Scoring. v Reference lists of all the database objects supplied with IM Scoring, together with a short description of each, appear in Chapter 7, “Overview of IM Scoring database objects” on page 75 v Full descriptions of all the IM Scoring methods and functions appear in subsequent chapters, as follows: – Chapter 8, “IM Scoring methods reference” on page 83 – Chapter 9, “IM Scoring functions reference” on page 91 v Descriptions of IM Scoring’s executables to enable and disable a database and to check whether a database has been enabled appear in Chapter 10, “IM Scoring command reference” on page 131. © Copyright IBM Corp. 2001, 2002 73
  • 89. 74 Administration and Programming for DB2
  • 90. Chapter 7. Overview of IM Scoring database objects This chapter provides reference lists of all the database objects supplied with IM Scoring, together with a short description of each. These database objects are: v User-defined data types. See “Data types provided by IM Scoring”. v User-defined methods. See “Methods provided by IM Scoring” on page 77. v User-defined functions. See “Functions provided by IM Scoring” on page 77. Information about the sizes of the input and output parameters to the IM Scoring methods and functions appears in “Parameter sizes” on page 81. For instructions on using these database objects, see Chapter 5, “Using IM Scoring” on page 41. Data types provided by IM Scoring IM Scoring provides a number of user-defined data types. These data types are used to store models and results, and to define the data to which a model is applied. The data types are installed in the schema IDMMX. Table 14. Data types specific to IM Scoring Data type (UDT) Source data type Purpose DM_ApplicationData CLOB Contains the definition of data to which a model is applied. DM_ClasModel BLOB Identifies data within DB2 as a Tree or Neural Classification model. The data type is associated with the model when the model is imported using the DM_impClasFile function. If the model is stored in a database table, the column must be configured for this data type. © Copyright IBM Corp. 2001, 2002 75
  • 91. Table 14. Data types specific to IM Scoring (continued) Data type (UDT) Source data type Purpose DM_ClasResult VARCHAR Contains the predicted class and classification confidence values for a row of data obtained when a Classification model is applied by means of the DM_applyClasModel function. DM_ClusResult VARCHAR Contains the computed cluster ID and the score value for a row of data obtained when a Clustering model is applied by means of the DM_applyClusModel function. DM_ClusteringModel BLOB Identifies data within DB2 as a Demographic or Neural Clustering model. The data type is associated with the model when the model is imported by means of the DM_impClusFile function. If the model is stored in a database table, the column must be configured for this data type. DM_LogicalDataSpec CLOB Identifies the field names and types that are contained in a model and that are needed to apply a model. DM_RegressionModel BLOB Identifies data within DB2 as an RBF or Neural Prediction model or as a Polynomial or Logistic Regression model. The data type is associated with the model when the model is imported by means of the DM_impRegFile function. If the model is stored in a database table, the column must be configured for this data type. 76 Administration and Programming for DB2
  • 92. Table 14. Data types specific to IM Scoring (continued) Data type (UDT) Source data type Purpose DM_RegResult VARCHAR Contains the predicted value for a row of data obtained when a Regression model is applied by means of the DM_applyRegModel function. Note: The maximum size of models stored as BLOB differs depending on whether you selected the fenced or the unfenced option when you enabled the database. By default you can store up to 10 MB in fenced mode and 50 MB in unfenced mode. Methods provided by IM Scoring The methods provided by IM Scoring enable you to work with the structured type DM_LogicalDataSpec. This data type contains the field name and field type definitions of the mining fields that are part of the input data used when models are applied. Table 15. Methods for type DM_LogicalDataSpec See Method Purpose page DM_expDataSpec Exports a DM_LogicalDataSpec value as a CLOB 84 value DM_getFldName Returns the name of a field at a specified position 85 in a value of type DM_LogicalDataSpec DM_getFldType Returns the mining field type of a specified field 86 contained in a value of type DM_LogicalDataSpec DM_getNumFields Returns the number of fields contained in a value 87 of type DM_LogicalDataSpec DM_impDataSpec Imports a previously exported DM_LogicalDataSpec 88 value DM_isCompatible Compares two logical data specifications, and 89 returns TRUE if they are compatible Functions provided by IM Scoring IM Scoring provides a number of user-defined scoring functions that enable you to: 1. Import and export mining models, and access the properties of the models. Chapter 7. Overview of IM Scoring database objects 77
  • 93. 2. Apply these models to data held in DB2 tables. 3. Retrieve the results. The scoring functions are installed in the schema IDMMX. Table 16. Functions for working with scoring data type DM_ApplicationData See Scoring function (UDF) Purpose page DM_applData Obtains the data to which a model is applied by 92 the functions DM_applyClasModel, DM_applyClusModel, and DM_applyRegModel DM_impApplData Converts application input data from the CLOB, 120 VARCHAR, or CHAR format Table 17. Functions for working with data mining model type DM_ClasModel See Scoring function (UDF) Purpose page DM_applyClasModel Applies a specified Classification model to 94 specified data and returns results values DM_expClasModel Returns a character large object representation of 97 a value of type DM_ClasModel DM_getClasCostRate Returns the cost rate contained in a value of 100 type DM_ClasModel DM_getClasMdlName Obtains the name of a Classification model 101 DM_getClasMdlSpec Returns a value of DM_LogicalDataSpec 102 containing the set of fields needed for an application or test of the model DM_getClasTarget Returns the target field contained in a value of 103 type DM_ClasModel DM_impClasFile Imports a Tree or Neural Classification model 121 into a DB2 database DM_impClasFileE Imports a Tree or Neural Classification model 122 into a DB2 database by specifying an encoding DM_impClasModel Converts a Classification model from the CLOB 123 format Table 18. Functions for working with scoring result type DM_ClasResult See Scoring function (UDF) Purpose page DM_getConfidence Obtains the classification confidence value from 110 a results value returned by the function DM_applyClasModel 78 Administration and Programming for DB2
  • 94. Table 18. Functions for working with scoring result type DM_ClasResult (continued) See Scoring function (UDF) Purpose page DM_getPredClass Retrieves the predicted class from a results value 112 returned by the function DM_applyClasModel Table 19. Functions for working with scoring result type DM_ClusResult See Scoring function (UDF) Purpose page DM_getClusConf Returns the confidence of attributing a record to 104 the best cluster in comparison with attributing it to another cluster of the applied model DM_getClusScore Obtains the Clustering score from a results value 107 returned by the function DM_applyClusModel DM_getClusterID Obtains the Clustering ID from a results value 108 returned by the function DM_applyClusModel DM_getQuality Returns the quality of the best cluster 114 DM_getQuality(clusterid) Returns the quality of a specified cluster 115 Table 20. Functions for working with data mining model type DM_ClusteringModel See Scoring function (UDF) Purpose page DM_applyClusModel Applies a specified Clustering model to specified 95 data and returns results values DM_expClusModel Returns a character large object representation of 98 a value of type DM_ClusteringModel DM_getClusMdlName Obtains the name of a Clustering model 105 DM_getClusMdlSpec Returns a value of DM_LogicalDataSpec 106 containing the set of fields needed for an application DM_getClusterName Gets the name of the cluster at a specified 109 position DM_getNumClusters Returns the number of clusters contained in a 111 value of type DM_ClusteringModel DM_impClusFile Imports a Demographic or Neural Clustering 124 model into a DB2 database DM_impClusFileE Imports a Demographic or Neural Clustering 125 model into a DB2 database by specifying an encoding Chapter 7. Overview of IM Scoring database objects 79
  • 95. Table 20. Functions for working with data mining model type DM_ClusteringModel (continued) See Scoring function (UDF) Purpose page DM_impClusModel Converts a Clustering model from the CLOB 127 format Table 21. Functions for working with data mining model type DM_RegressionModel See Scoring function (UDF) Purpose page DM_applyRegModel Applies a specified Regression model to 96 specified data and returns results values DM_expRegModel Returns a character large object representation of 99 a value of type DM_RegressionModel DM_getRegMdlName Obtains the name of a Regression model 117 DM_getRegMdlSpec Returns a value of DM_LogicalDataSpec 118 containing the set of fields needed for an application DM_getRegTarget Returns the target field for Regression 119 DM_impRegFile Imports a Neural Prediction model, an RBF 128 Prediction model, or a Polynomial Regression model into a DB2 database DM_impRegFileE Imports a Neural Prediction model, an RBF 129 Prediction model, or a Polynomial Regression model into a DB2 database by specifying an encoding DM_impRegModel Converts a Regression model from the CLOB 130 format Table 22. Functions for working with scoring result type DM_RegResult See Scoring function (UDF) Purpose page DM_getPredValue Obtains the predicted value from a results value 113 returned by the function DM_applyRegModel DM_getRBFRegionID Returns the number of the region to which the 116 record was assigned 80 Administration and Programming for DB2
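As a concrete sketch of how the import functions and model data types from the tables above work together, the following SQL stores a Clustering model in a user table. The table name, model name, and file path are hypothetical, and DM_impClusFile is assumed to take the model file name as its argument (see its reference entry for the exact syntax):

```sql
-- Hypothetical model table; the model column uses the IDMMX UDT.
CREATE TABLE ClusModels (
  modelname VARCHAR(128) NOT NULL,
  model     IDMMX.DM_ClusteringModel
);

-- Import a PMML Clustering model from a file on the database server
-- (file name is an example).
INSERT INTO ClusModels
  VALUES ('DemoSegments',
          IDMMX.DM_impClusFile('/tmp/demoSegments.pmml'));
```

The model can then be applied with DM_applyClusModel and the results read with the DM_ClusResult functions.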
  • 96. Parameter sizes The sizes of the input and output parameters of the methods and functions are as follows: v Names and field names: VARCHAR(128) v Model names in the functions DM_getClasMdlName, DM_getRuleMdlName, and DM_getClusMdlName: VARCHAR(256) v CLOBs in the methods DM_expDataSpec and DM_impDataSpec: The default size is 200 KB. For instructions on how to specify a different size, see the information about the StructSize parameter in “The idmenabledb command” on page 133. v CLOBs in the function DM_expClusModel: This depends on the optional ClusModelSize parameter specified when idmenabledb was called. - Default fenced mode: 10 MB - Default unfenced mode: 50 MB v CLOBs in the function DM_expClasModel: This depends on the optional ClasModelSize parameter specified when idmenabledb was called. - Default fenced mode: 10 MB - Default unfenced mode: 50 MB v CLOBs in the function DM_expRuleModel: This depends on the optional RuleModelSize parameter specified when idmenabledb was called. - Default fenced mode: 10 MB - Default unfenced mode: 50 MB v Data record in DM_applyClasModel, DM_applyClusModel, and DM_applyRegModel: The default is 500 KB. For instructions on how to specify a different size, see the information about the ApplDataSize parameter in “The idmenabledb command” on page 133. v Result (DM_ClasResult, DM_ClusResult, DM_RegResult) in DM_applyClasModel, DM_applyClusModel, and DM_applyRegModel: VARCHAR(512) Chapter 7. Overview of IM Scoring database objects 81
  • 97. 82 Administration and Programming for DB2
  • 98. Chapter 8. IM Scoring methods reference This chapter contains full descriptions of all the IM Scoring methods. They are presented in alphabetical order of the method names. For brief overview descriptions of these methods, see “Methods provided by IM Scoring” on page 77. For instructions on how to use the database objects presented here, see Chapter 5, “Using IM Scoring” on page 41. For instructions on how to read the syntax diagrams, see “How to read the syntax diagrams” on page xiii. © Copyright IBM Corp. 2001, 2002 83
  • 99. DM_expDataSpec This method converts a value of type DM_LogicalDataSpec to a CLOB value, and returns it. Syntax Method syntax fields..DM_expDataSpec ( ) Function syntax DM_expDataSpec ( fields ) Parameters fields A value of type DM_LogicalDataSpec Return value The return value is a CLOB value converted from fields. 84 Administration and Programming for DB2
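A hedged usage sketch, assuming a hypothetical ClusModels table as in Chapter 5: the logical data specification is obtained from a stored model and exported as a CLOB.

```sql
-- Export the logical data specification of a stored Clustering model
-- as a CLOB ("ClusModels" and the model name are hypothetical).
SELECT IDMMX.DM_expDataSpec(IDMMX.DM_getClusMdlSpec(model))
FROM   ClusModels
WHERE  modelname = 'DemoSegments';
```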
  • 100. DM_getFldName This method returns the name of a field at a specified position in a value of type DM_LogicalDataSpec. Syntax Method syntax fields..DM_getFldName ( position ) Function syntax DM_getFldName ( fields , position ) Parameters fields A value of type DM_LogicalDataSpec consisting of a set of fields position An INTEGER value, ranging from 1 to the number of fields, that specifies the position Return value v If position is NULL, the return value is NULL. v The return value is the name of the mining field that is identified by the number given in position in the following situation: The value of position must be greater than zero, and less than or equal to the result of a call of DM_getNumFields(). The return value is a VARCHAR value. v Any other value for position raises an exception. The exception states that the parameter is out of range. Chapter 8. IM Scoring methods reference 85
  • 101. DM_getFldType This method returns the mining field type of a specified field that is contained in a value of type DM_LogicalDataSpec. Syntax Method syntax fields..DM_getFldType ( fieldName ) Function syntax DM_getFldType ( fields , fieldName ) Parameters fields A value of type DM_LogicalDataSpec consisting of a set of fields fieldName A field name, of type VARCHAR, that is already contained in fields The DM_LogicalDataSpec type defines the list of fields needed for a model application run and their mining field types. The mining field type describes which field type the field has in the model. Possible types are DM_Categorical or DM_Numerical, and these types map by default to the DB2 source data types listed here.
Table 23. Mining field types
Mining field type name | Mining field type value | Source data type
DM_Categorical | 0 | CHAR, VARCHAR, LONG VARCHAR, TIME, DATE, TIMESTAMP
DM_Numerical | 1 | SMALLINT, INTEGER, DOUBLE, FLOAT, DECIMAL, BIGINT, REAL
Return value v If fieldName is NULL, the return value is NULL. v If fieldName is not the name of any field contained in fields, this raises an exception. The exception states that the field is not defined in the logical data specification. v Otherwise, the return value is the mining field type, as a SMALLINT value, of the field fieldName contained in the set of fields fields. 86 Administration and Programming for DB2
  • 102. DM_getNumFields This method returns the number of fields that are contained in a value of type DM_LogicalDataSpec. Syntax Method syntax fields..DM_getNumFields ( ) Function syntax DM_getNumFields ( fields ) Parameters fields A value of type DM_LogicalDataSpec consisting of a set of fields Return value The return value is the number of fields that are contained in fields. The return value is of type INTEGER. Chapter 8. IM Scoring methods reference 87
  • 103. DM_impDataSpec This method converts a CLOB value to a value of type DM_LogicalDataSpec. The CLOB value must be a value that was previously exported using the DM_expDataSpec method. Syntax Method syntax fields..DM_impDataSpec ( logDataSpec ) Function syntax DM_impDataSpec ( fields , logDataSpec ) Parameters fields A value of type DM_LogicalDataSpec logDataSpec A CLOB value containing logical data specifications Return value The return value is fields containing the content of logDataSpec. Any existing definitions in fields are overwritten by the content of logDataSpec. 88 Administration and Programming for DB2
  • 104. DM_isCompatible This method determines whether a DM_LogicalDataSpec value is compatible with another DM_LogicalDataSpec value. Syntax Method syntax existLogDataSpec..DM_isCompatible ( logDataSpec ) Function syntax DM_isCompatible ( existLogDataSpec , logDataSpec ) Parameters existLogDataSpec A value of type DM_LogicalDataSpec consisting of a set of fields logDataSpec A value of type DM_LogicalDataSpec to be checked for compatibility Return value Note that calls to DM_isCompatible are not symmetric. A call DM_isCompatible(A,B) can return a result different from that returned by DM_isCompatible(B,A). v The return value is 1 as an INTEGER value if logDataSpec is compatible with existLogDataSpec. For every field entry in existLogDataSpec, there must be a field in logDataSpec with an identical name. The type of a field in logDataSpec must be DM_Numerical whenever this is the type of the corresponding field in existLogDataSpec. v If logDataSpec or existLogDataSpec is NULL, the return value is NULL. v Otherwise, the return value is 0 as an INTEGER value. Chapter 8. IM Scoring methods reference 89
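A usage sketch, comparing the data specifications of two stored models (the ClusModels table and model names are hypothetical):

```sql
-- Check whether the fields required by one stored model are compatible
-- with those of another. Returns 1 if compatible, 0 if not.
SELECT IDMMX.DM_isCompatible(IDMMX.DM_getClusMdlSpec(a.model),
                             IDMMX.DM_getClusMdlSpec(b.model))
FROM   ClusModels a, ClusModels b
WHERE  a.modelname = 'OldSegments'
AND    b.modelname = 'NewSegments';
```

Because DM_isCompatible is not symmetric, swapping the two arguments may give a different result.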
  • 105. 90 Administration and Programming for DB2
  • 106. Chapter 9. IM Scoring functions reference This chapter contains full descriptions of all the IM Scoring functions. They are presented in alphabetical order of the function names. For brief overview descriptions of these functions, see “Functions provided by IM Scoring” on page 77. For instructions on how to use the database objects presented here, see Chapter 5, “Using IM Scoring” on page 41. For instructions on how to read the syntax diagrams, see “How to read the syntax diagrams” on page xiii. © Copyright IBM Corp. 2001, 2002 91
  • 107. DM_applData This function obtains the data to which a model is applied by DM_applyClusModel, DM_applyClasModel, or DM_applyRegModel. It builds a value of type DM_ApplicationData, which includes the label and the value obtained from the column specified in the SQL statement. This function is available in two different notations. One notation has two parameters; the other has three parameters. Use the two-parameter notation for the inner call in nested calls, or if the application data consists only of one field-value pair. Syntax Two-parameter notation DM_applData ( label , character expression ) numeric expression Three-parameter notation DM_applData ( application data , label , character expression ) numeric expression Parameters application data Identifies application data of type DM_ApplicationData to which the label and value of the expression are appended before being returned. In nested calls, this parameter is the return value of a DM_applData call. label The label is a literal and must be defined within single quotation marks. The string enclosed within the quotation marks must be a precise match, including case, for the name of a field defined in the model you apply. Together with the character or numeric expression that follows, this forms the data to which a model is applied. It will be appended to the DM_ApplicationData value. character expression An expression that returns a value of type CHAR, VARCHAR, or LONG VARCHAR. Note: IM for Data handles date values and time values in character ISO format. If you want to specify a date value or a time value as input to DM_applData and you want to apply a model created by IM for Data, the date value and time value must be cast to 92 Administration and Programming for DB2
  • 108. character ISO format by using CHAR(<value>, ISO) for a date value and CHAR(<value>, JIS) for a time value. numeric expression An expression that returns a value that is a numeric data type, either INTEGER, DOUBLE, DECIMAL, FLOAT, BIGINT, or SMALLINT. For example, you can include an expression such as age + 10. If multiple data items are to be included, multiple nested calls to DM_applData must be made. Return value The return value is a value of data type DM_ApplicationData. Chapter 9. IM Scoring functions reference 93
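The two notations can be combined as described above: the innermost call uses the two-parameter form, and each further field is appended with the three-parameter form. In this sketch all table, column, and field names are hypothetical; the field labels must match the model's field names exactly, including case.

```sql
-- Build application data for two fields of a hypothetical Customers
-- table; the date column is cast to character ISO format as required
-- for models created by IM for Data.
SELECT IDMMX.DM_applData(
         IDMMX.DM_applData('AGE', c.age),
         'SIGNUP_DATE', CHAR(c.signup_date, ISO))
FROM   Customers c;
```

In practice the DM_applData result is not selected directly but passed to DM_applyClasModel, DM_applyClusModel, or DM_applyRegModel.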
  • 109. DM_applyClasModel This function applies a Classification model to selected data and produces results. Syntax DM_applyClasModel ( model , application data ) Parameters model The Classification model of type DM_ClasModel that is to be applied to data. application data Specifies the data of type DM_ApplicationData to which the model is applied. A DM_ApplicationData value is returned by DM_applData or DM_impApplData. See “DM_applData” on page 92 and “DM_impApplData” on page 120. Return value The results produced by applying a model using DM_applyClasModel are returned as data type DM_ClasResult. If model or application data is NULL, the return value is NULL. 94 Administration and Programming for DB2
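A hedged end-to-end sketch: the model is read from a hypothetical ClasModels table, applied to a hypothetical Customers table, and the DM_ClasResult value is unpacked with the result functions described in this chapter.

```sql
-- Apply a stored Classification model and extract the predicted class
-- and the classification confidence (all names are hypothetical).
SELECT IDMMX.DM_getPredClass(t.res)  AS predicted_class,
       IDMMX.DM_getConfidence(t.res) AS confidence
FROM  (SELECT IDMMX.DM_applyClasModel(m.model,
                IDMMX.DM_applData(IDMMX.DM_applData('AGE', c.age),
                                  'INCOME', c.income)) AS res
       FROM Customers c, ClasModels m
       WHERE m.modelname = 'ChurnTree') AS t;
```

The nested table expression ensures the model is applied once per row, with both result functions reading the same DM_ClasResult value.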
  • 110. DM_applyClusModel This function applies a Clustering model to selected data and produces results. Syntax DM_applyClusModel ( model , application data ) Parameters model The Clustering model of type DM_ClusteringModel that is to be applied to data. application data Specifies the data of type DM_ApplicationData to which the model is applied. A DM_ApplicationData value is returned by DM_applData or DM_impApplData. See “DM_applData” on page 92 and “DM_impApplData” on page 120. Return value The results produced by applying a model using DM_applyClusModel are returned as data type DM_ClusResult. If the input model is a Demographic Clustering model that does not contain distance units, they are calculated by default for numeric fields. The default value is half of the square root of the variance of the field. If model or application data is NULL, the return value is NULL. Chapter 9. IM Scoring functions reference 95
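The corresponding sketch for Clustering, again with hypothetical table and model names, extracts the cluster ID and score from the DM_ClusResult value:

```sql
-- Apply a stored Clustering model and extract cluster ID and score.
SELECT IDMMX.DM_getClusterID(t.res) AS cluster_id,
       IDMMX.DM_getClusScore(t.res) AS score
FROM  (SELECT IDMMX.DM_applyClusModel(m.model,
                IDMMX.DM_applData(IDMMX.DM_applData('AGE', c.age),
                                  'INCOME', c.income)) AS res
       FROM Customers c, ClusModels m
       WHERE m.modelname = 'DemoSegments') AS t;
```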
  • 111. DM_applyRegModel This function applies a Regression model to selected data and produces results. Syntax DM_applyRegModel ( model , application data ) Parameters model The Regression model of type DM_RegressionModel that is to be applied to data. application data Specifies the data of type DM_ApplicationData to which the model is applied. A DM_ApplicationData value is returned by DM_applData or DM_impApplData. See “DM_applData” on page 92 and “DM_impApplData” on page 120. Return value The results produced by applying a model using DM_applyRegModel are returned as data type DM_RegResult. If model or application data is NULL, the return value is NULL. 96 Administration and Programming for DB2
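A sketch for Regression, with hypothetical table and model names; the predicted value is read from the DM_RegResult with DM_getPredValue:

```sql
-- Apply a stored Regression model and extract the predicted value.
SELECT IDMMX.DM_getPredValue(
         IDMMX.DM_applyRegModel(m.model,
           IDMMX.DM_applData(IDMMX.DM_applData('AGE', c.age),
                             'INCOME', c.income))) AS predicted_value
FROM   Customers c, RegModels m
WHERE  m.modelname = 'SpendRegression';
```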
  • 112. DM_expClasModel This function returns a character large object representing a PMML Classification model of type DM_ClasModel. Syntax DM_expClasModel ( classificationModel ) Parameters classificationModel A value of type DM_ClasModel Return value The return value is a character large object representing the PMML Classification model in question. The CLOB XML string is encoded in database encoding (the codeset specified when the database was created) regardless of the encoding specification found in the XML content. Chapter 9. IM Scoring functions reference 97
  • 113. DM_expClusModel This function returns a character large object representing the PMML Clustering model extracted from a value of type DM_ClusteringModel. Syntax DM_expClusModel ( clusteringModel ) Parameters clusteringModel A value of type DM_ClusteringModel Return value The return value is a character large object representing the Clustering model. The CLOB XML string is encoded in database encoding (the codeset specified when the database was created) regardless of the encoding specification found in the XML content. 98 Administration and Programming for DB2
DM_expRegModel

This function returns a character large object representing a PMML Regression model or a PMML or XML prediction model of type DM_RegressionModel.

Syntax

   DM_expRegModel ( regressionModel )

Parameters

regressionModel
   A value of type DM_RegressionModel

Return value

The return value is a character large object representing the Regression model. The CLOB XML string is encoded in database encoding (the codeset specified when the database was created) regardless of the encoding specification found in the XML content.
DM_getClasCostRate

This function returns the classification cost rate of the Classification model, computed on validation data during the training phase. If you need more information, see the description of DM_setClasCostRate in Intelligent Miner Modeling Administration and Programming.

Syntax

   DM_getClasCostRate ( clasModel )

Parameters

clasModel
   A value of type DM_ClasModel

Return value

- If the model was calculated without validation, the return value is NULL.
- Otherwise, the return value is the cost rate of the Classification model contained in clasModel as a DOUBLE value, computed using the validation data.
DM_getClasMdlName

This function obtains the name of a Classification model of type DM_ClasModel.

Syntax

   DM_getClasMdlName ( model )

Parameters

model
   A Classification model of the type DM_ClasModel

Return value

The return value is the name of the Classification model as a VARCHAR value.
DM_getClasMdlSpec

This function returns the DM_LogicalDataSpec value representing the set of fields needed for an application of the Classification model.

Syntax

   DM_getClasMdlSpec ( classificationModel )

Parameters

classificationModel
   A value of type DM_ClasModel

Return value

The return value is the DM_LogicalDataSpec value representing the set of fields needed for an application of the Classification model.
DM_getClasTarget

This function returns the name of the target field of a Classification model. If the model was produced through the use of IM Modeling, the target field name is the name that was set by means of the method DM_clasSetTarget of type DM_ClasSettings.

Syntax

   DM_getClasTarget ( clasModel )

Parameters

clasModel
   A value of type DM_ClasModel

Return value

The return value is the target field name as a VARCHAR value.
DM_getClusConf

This function returns the confidence of assigning the record to the best cluster, compared with assigning it to another cluster of the applied model. The ID of the best cluster is returned by the function DM_getClusterID.

Syntax

   DM_getClusConf ( clusResult )

Parameters

clusResult
   A value of type DM_ClusResult

Return value

- If clusResult is NULL, the return value is NULL.
- Otherwise, the return value is the confidence value as data type DOUBLE.

The confidence is a value between 0 and 1 of type DOUBLE. A value near 0 should never appear. A value near 0.5 (50%) means that the record might fit equally well into another cluster of the model. A value near 1 means that it is quite certain that the record belongs to the best cluster. It also means that the record definitely does not suit any other clusters of the model.
DM_getClusMdlName

This function obtains the name of a Clustering model of type DM_ClusteringModel.

Syntax

   DM_getClusMdlName ( model )

Parameters

model
   A Clustering model of the type DM_ClusteringModel

Return value

The return value is the name of the Clustering model as a VARCHAR value.
DM_getClusMdlSpec

This function returns the DM_LogicalDataSpec value representing the set of fields needed for an application of the Clustering model.

Syntax

   DM_getClusMdlSpec ( clusteringModel )

Parameters

clusteringModel
   A value of type DM_ClusteringModel

Return value

The return value is the DM_LogicalDataSpec value representing the set of fields needed for an application of the Clustering model.
DM_getClusScore

This function obtains the Clustering score from results data that is produced when you apply a Clustering model. The score is an expression of how closely the data matches the model cluster. For Demographic Clustering, a score value close to 1.0 indicates a good match. For Neural Clustering, a score value close to 0.0 indicates a good match.

This function is related to the following:
- DM_getClusConf, which is compatible with the IM for Data score value
- DM_getQuality, which is a new function

Syntax

   DM_getClusScore ( results value )

Parameters

results value
   The result of applying a Clustering model, returned by function DM_applyClusModel as data type DM_ClusResult

Return value

- If results value is NULL, the return value is NULL.
- Otherwise, the return value is the Clustering score as data type DOUBLE.
DM_getClusterID

This function obtains the cluster ID from results data that is produced when you apply a Clustering model. This identifies the position of the cluster in the Clustering model that is the best match for this data.

For information about the difference between the cluster IDs that are returned by IM Scoring V7.1 and IM Scoring V8.1, see "Using the function DM_getClusterID" on page 170. To get the cluster name shown by the IM for Data V6 Clustering Visualizer, use DM_getClusterName().

Syntax

   DM_getClusterID ( results value )

Parameters

results value
   The result of applying a Clustering model, returned by function DM_applyClusModel as data type DM_ClusResult. Usually, cluster IDs are between 1 and the number of clusters as returned by the function DM_getNumClusters.

Return value

- If results value is NULL, the return value is NULL.
- Otherwise, the return value is the cluster ID as data type INTEGER.
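A sketch combining DM_getClusterID with DM_getClusterName to report both the ID and the name of the best-matching cluster. The table SCORING_RESULTS with a RESULT column of type DM_ClusResult, and the model name 'DemoClus', are assumptions for illustration.

```sql
-- Best-matching cluster ID and its name for previously stored results.
SELECT IDMMX.DM_getClusterID(r.RESULT)              AS CLUSTER_ID,
       IDMMX.DM_getClusterName(m.MODEL,
                               IDMMX.DM_getClusterID(r.RESULT))
                                                    AS CLUSTER_NAME
  FROM SCORING_RESULTS r,
       IDMMX.ClusterModels m
 WHERE m.MODELNAME = 'DemoClus';
```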
DM_getClusterName

This function returns the name of a cluster at a specified position in a value of type DM_ClusteringModel.

Syntax

   DM_getClusterName ( clusteringModel , position )

Parameters

clusteringModel
   A value of type DM_ClusteringModel.

position
   A value, of type INTEGER, between 1 and the result of DM_getNumClusters. Here you can specify the value obtained by a call to DM_getClusterID().

Return value

- If clusteringModel is NULL, the return value is NULL.
- If position is less than 1 or greater than the result of a call to DM_getNumClusters, this raises an exception. The exception states that the parameter is out of range.
- If position is NULL, the return value is NULL.
- If the cluster at the specified position has no name, the return value is NULL.
- Otherwise, the return value is the name of the specified cluster contained in clusteringModel, of type VARCHAR.
DM_getConfidence

This function obtains the classification confidence value from results data that was produced when you applied a Classification model. This is a value between 0.0 and 1.0 that expresses the probability that the class is predicted correctly.

Syntax

   DM_getConfidence ( results value )

Parameters

results value
   The result of applying a Classification model, returned by the function DM_applyClasModel as data type DM_ClasResult

Return value

- If results value is NULL, the return value is NULL.
- Otherwise, the return value is the confidence value as data type DOUBLE.
DM_getNumClusters

This function returns the number of clusters contained in a value of type DM_ClusteringModel.

Syntax

   DM_getNumClusters ( clusteringModel )

Parameters

clusteringModel
   A value of type DM_ClusteringModel

Return value

- If clusteringModel is NULL, the return value is NULL.
- Otherwise, the return value is the number of clusters contained in clusteringModel, as an INTEGER value.
DM_getPredClass

This function obtains the predicted class from results data that is produced when you apply a Classification model. This identifies the class within the model to which the data matches.

Syntax

   DM_getPredClass ( results value )

Parameters

results value
   The result of applying a Classification model, returned by the function DM_applyClasModel as data type DM_ClasResult

Return value

- If results value is NULL, the return value is NULL.
- Otherwise, the return value is the predicted class as data type VARCHAR.
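A sketch that retrieves both the predicted class and its confidence. A common table expression keeps the model application in one place so DM_applyClasModel appears only once per row; the table CUSTOMER_DATA, its columns, and the model name 'DemoClas' are assumptions.

```sql
-- Predicted class and its confidence for each row.
WITH scored (id, res) AS (
  SELECT c.CLIENT_ID,
         IDMMX.DM_applyClasModel(
           m.MODEL,
           IDMMX.DM_impApplData(
             REC2XML(1.0, 'COLATTVAL', '', c.AGE, c.INCOME)))
    FROM CUSTOMER_DATA c,
         IDMMX.ClassifModels m
   WHERE m.MODELNAME = 'DemoClas'
)
SELECT id,
       IDMMX.DM_getPredClass(res)  AS PREDICTED_CLASS,
       IDMMX.DM_getConfidence(res) AS CONFIDENCE
  FROM scored;
```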
DM_getPredValue

This function obtains the predicted value from results data that is produced when you apply a Regression model. This value is calculated according to relations that are established by the model.

Syntax

   DM_getPredValue ( results value )

Parameters

results value
   The result of applying a Regression model, returned by the function DM_applyRegModel as data type DM_RegResult

Return value

- If results value is NULL, the return value is NULL.
- Otherwise, the return value is the predicted value as data type DOUBLE.
DM_getQuality

This function returns the quality of the result for the cluster whose ID is given by the function DM_getClusterID. It measures how well the applied record fits into the specified cluster.

The returned value is between 0.0 and 1.0. A value close to 0.0 means that the record does not fit at all in the cluster. A value close to 1.0 means that the record fits very well in the specified cluster. The quality measurement depends on the algorithm used to score the record, so that a direct comparison between the quality of algorithm results is not possible. However, both algorithms use a linear, possibly similar, quality measurement function.

Syntax

   DM_getQuality ( results value )

Parameters

results value
   The result of applying a Clustering model, returned by the function DM_applyClusModel as data type DM_ClusResult

Return value

- If results value is NULL, the return value is NULL.
- Otherwise, the return value is the quality of the result for the specified cluster, as a value of type DOUBLE between 0.0 and 1.0.
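A sketch that reports the best-matching cluster together with its quality. The table SCORING_RESULTS with a RESULT column of type DM_ClusResult is an assumption for illustration.

```sql
-- Best cluster and how well each record fits into it.
SELECT IDMMX.DM_getClusterID(r.RESULT) AS CLUSTER_ID,
       IDMMX.DM_getQuality(r.RESULT)   AS QUALITY
  FROM SCORING_RESULTS r;
```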
DM_getQuality(clusterid)

This function returns the quality of the result for a specified cluster. It measures how well the applied record fits into the specified cluster. The result may not contain a quality measure for all the available clusters. IM Scoring V8 usually computes the quality measure for only the two best matching clusters; it does this in order to optimize the memory needed to store the clustering result.

The returned value is between 0.0 and 1.0. A value close to 0.0 means that the record does not fit at all in the cluster. A value close to 1.0 means that the record fits very well in the specified cluster. The quality measurement depends on the algorithm used to score the record, so that a direct comparison between the quality of algorithm results is not possible. However, both algorithms use a linear, possibly similar, quality measurement function.

Syntax

   DM_getQuality ( results value , clusterid )

Parameters

results value
   The result of applying a Clustering model, returned by the function DM_applyClusModel as data type DM_ClusResult

clusterid
   The ID of the cluster for which the quality should be returned

Return value

- If results value is NULL, the return value is NULL.
- If the result does not contain a quality measure for the requested cluster, the return value is NULL.
- Otherwise, the return value is the quality of the result for the specified cluster, as a value of type DOUBLE between 0.0 and 1.0.
DM_getRBFRegionID

This function returns the number of the region to which the record was assigned. The value is returned from results data that is produced when you apply an RBF Regression model.

Syntax

   DM_getRBFRegionID ( results value )

Parameters

results value
   The results of applying an RBF Regression model returned by the function DM_applyRegModel as data type DM_RegResult

Return value

- If results value is NULL, the return value is NULL.
- Otherwise, the return value is the region ID as data type INTEGER.

If NULL is returned, one of the following error events might have occurred:
- The results value is computed on a Regression model that is not an RBF Regression model.
- The results value is computed based on an RBF Regression model; however, the record contains too many missing values to compute a result value. This applies if the same results value is used as input for the function DM_getPredValue, and the function returns NULL.
DM_getRegMdlName

This function obtains the name of a Regression model of type DM_RegressionModel.

Syntax

   DM_getRegMdlName ( model )

Parameters

model
   A Regression model of the type DM_RegressionModel

Return value

The return value is the name of the Regression model as a VARCHAR value.
DM_getRegMdlSpec

This function returns the DM_LogicalDataSpec value representing the set of fields needed for an application of the Regression model.

Syntax

   DM_getRegMdlSpec ( regressionModel )

Parameters

regressionModel
   A value of type DM_RegressionModel

Return value

The return value is the DM_LogicalDataSpec value representing the set of fields needed for an application of the Regression model.
DM_getRegTarget

This function returns the target field (that is, the field to be predicted) for the Regression function.

Syntax

   DM_getRegTarget ( regressionModel )

Parameters

regressionModel
   A value of type DM_RegressionModel

Return value

- If regressionModel is NULL, the return value is NULL.
- Otherwise, the return value is the name of the target field in the Regression model, as a VARCHAR value.
DM_impApplData

This function converts application input data of type CLOB, CHAR, or VARCHAR into type DM_ApplicationData.

Syntax

   DM_impApplData ( applData_as_string )

Parameters

applData_as_string
   Application input data as a CLOB, CHAR, or VARCHAR value. The application input data must be a well-formed XML element. The element must match the DTD declaration of type row as shown in the following example.

      <!ELEMENT row (column*) >
      <!ELEMENT column (#PCDATA) >
      <!ATTLIST column
         name CDATA #REQUIRED
         null (true | false) "false" >

   A value of the attribute name is interpreted as the name of the scoring field. The content of an element is interpreted as the value of the named scoring field. The order of the column elements is not relevant.

   The data for some fields might be NULL. In this case, it is represented by an element where the attribute null has the value true and the content is empty.

   DM_impApplData also accepts the output of the DB2 function REC2XML.

Return value

- If applData_as_string is NULL, the return value is NULL.
- Otherwise, the return value is application input data, converted into type DM_ApplicationData.
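A sketch of building application data directly from an XML string that matches the row DTD above. The field names AGE and INCOME are assumptions; they must match the scoring fields of the model that the data is applied to.

```sql
-- Construct a DM_ApplicationData value from a literal XML row.
-- INCOME is supplied as a NULL field by means of the null attribute.
VALUES IDMMX.DM_impApplData(
  '<row>
     <column name="AGE">42</column>
     <column name="INCOME" null="true"></column>
   </row>');
```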
DM_impClasFile

This function reads a Tree or Neural Classification model from a file, and returns it as data of type DM_ClasModel. This data can then be inserted into a table where one of the columns is designated for this data type.

The encoding of characters in the imported file is determined by default. For information about the default encoding rules, see "DM_impClasFileE" on page 122. The call

   DM_impClasFile(<file name>)

corresponds to

   DM_impClasFileE(<file name>, cast (NULL as CHAR))

When the content of the file is read and imported into DB2, the file is no longer needed. However, you might want to keep the file in case you need to import the model again.

Syntax

   DM_impClasFile ( input file name )

Parameters

input file name
   The name and full path of the file to be imported. The specified file must exist on the database server, and the DB2 instance owner (unfenced) or DB2 fenced user (fenced) must have read access to the file.

Return value

The return value is the imported model as data type DM_ClasModel.
DM_impClasFileE

This function reads a Tree or Neural Classification model from a file, and returns it as data of type DM_ClasModel. This data can then be inserted into a table where one of the columns is designated for this data type. The encoding of characters in the imported file is specified by a parameter.

When the content of the file is read and imported into DB2, the file is no longer needed. However, you might want to keep the file in case you need to import the model again.

Syntax

   DM_impClasFileE ( input file name , encoding name )

Parameters

input file name
   The name and full path of the file to be imported. The specified file must exist on the database server.

encoding name
   IM Scoring supports different encoding names. The MIME encoding strings are recommended, for example, iso-8859-1.
   - If the encoding name is a non-empty string, the value is interpreted as an XML encoding. The content of the imported file is parsed according to this encoding. The imported file must be a PMML 1.1 or PMML 2.0 document.
   - If the encoding name is the string 'SYSTEM', the locale settings of the operating system are used to determine the encoding of the file.
   - If the encoding name is NULL, the encoding is determined by the imported file.
     - If the file is a PMML 1.1 or PMML 2.0 document, the standard rules of the XML specification apply. That is, the encoding is either given explicitly or assumed to be Unicode by default.
     - If the imported file is written in Intelligent Miner format, the locale settings of the operating system are used.

Return value

The return value is the imported model as data type DM_ClasModel.
DM_impClasModel

This function converts the Classification model given as CLOB into a Classification model of type DM_ClasModel.

Syntax

   DM_impClasModel ( model_as_CLOB )

Parameters

model_as_CLOB
   A Classification model in PMML 1.1 or PMML 2.0 format as a CLOB value. The model is assumed to be provided in database encoding (the codeset specified when the database was created), regardless of the XML encoding specification contained in the model.

Return value

The return value is a Classification model of type DM_ClasModel.
DM_impClusFile

This function reads a Demographic or Neural Clustering model from a file, and returns it as data of type DM_ClusteringModel. This data can then be inserted into a table where one of the columns is designated for this data type.

The encoding of characters in the imported file is determined by default. For information about the default encoding rules, see "DM_impClusFileE" on page 125. The call

   DM_impClusFile(<file name>)

corresponds to

   DM_impClusFileE(<file name>, cast (NULL as CHAR))

When the content of the file is read and imported into DB2, the file is no longer needed. However, you might want to keep the file in case you need to import the model again.

Syntax

   DM_impClusFile ( input file name )

Parameters

input file name
   The name and full path of the file to be imported. The specified file must exist on the database server, and the DB2 instance owner (unfenced) or DB2 fenced user (fenced) must have read access to the file.

Return value

The return value is the imported model as data type DM_ClusteringModel.

If the input model is a Demographic Clustering model that does not contain distance units, they are calculated by default for numeric fields. The default value is half the square root of the variance of the field.
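A sketch of importing a model file and storing it; the file path and model name are assumptions, and IDMMX.ClusterModels is the sample table that idmenabledb can create.

```sql
-- Import a Clustering model from a PMML file on the database server
-- and store it in the sample table under a chosen name.
INSERT INTO IDMMX.ClusterModels (MODELNAME, MODEL)
  VALUES ('DemoClus',
          IDMMX.DM_impClusFile('/tmp/democlus.pmml'));
```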
DM_impClusFileE

This function reads a Demographic or Neural Clustering model from a file, and returns it as data of type DM_ClusteringModel. This data can then be inserted into a table where one of the columns is designated for this data type. The encoding of characters in the imported file is specified by a parameter.

When the content of the file is read and imported into DB2, the file is no longer needed. However, you might want to keep the file in case you need to import the model again.

Syntax

   DM_impClusFileE ( input file name , encoding name )

Parameters

input file name
   The name and full path of the file to be imported. The specified file must exist on the database server.

encoding name
   IM Scoring supports different encoding names. The MIME encoding strings are recommended, for example, iso-8859-1.
   - If the encoding name is a non-empty string, the value is interpreted as an XML encoding. The content of the imported file is parsed according to this encoding. The imported file must be a PMML 1.1 or PMML 2.0 document.
   - If the encoding name is the string 'SYSTEM', the locale settings of the operating system are used to determine the encoding of the file.
   - If the encoding name is NULL, the encoding is determined by the imported file.
     - If the file is a PMML 1.1 or PMML 2.0 document, the standard rules of the XML specification apply. That is, the encoding is either specified explicitly or assumed to be Unicode by default.
     - If the imported file is written in Intelligent Miner format, the locale settings of the operating system are used.

Return value

The return value is the imported model as data type DM_ClusteringModel.

If the input model is a Demographic Clustering model that does not contain distance units, they are calculated by default for numeric fields. The default value is half the square root of the variance of the field.
DM_impClusModel

This function converts a Clustering model given as a CLOB value into a Clustering model of type DM_ClusteringModel.

Syntax

   DM_impClusModel ( model_as_CLOB )

Parameters

model_as_CLOB
   A Clustering model in PMML 1.1 or PMML 2.0 format as a CLOB value. The model is assumed to be provided in database encoding (the codeset specified when the database was created), regardless of the XML encoding specification contained in the model.

Return value

The return value is a Clustering model of type DM_ClusteringModel.
DM_impRegFile

This function reads an RBF or Neural Prediction model or a Polynomial Regression model from a file, and returns it as data of type DM_RegressionModel. This data can then be inserted into a table where one of the columns is designated for this data type.

The encoding of characters in the imported file is determined by default. For information about the default encoding rules, see "DM_impRegFileE" on page 129. The call

   DM_impRegFile(<file name>)

corresponds to

   DM_impRegFileE(<file name>, cast (NULL as CHAR))

When the content of the file is read and imported into DB2, the file is no longer needed. However, you might want to keep the file in case you need to import the model again.

Syntax

   DM_impRegFile ( input file name )

Parameters

input file name
   The name and full path of the file to be imported. The specified file must exist on the database server, and the DB2 instance owner (unfenced) or DB2 fenced user (fenced) must have read access to the file.

Return value

The return value is the imported model as data type DM_RegressionModel.
DM_impRegFileE

This function reads an RBF or Neural Prediction model or a Polynomial Regression model from a file, and returns it as data of type DM_RegressionModel. This data can then be inserted into a table where one of the columns is designated for this data type. The encoding of characters in the imported file is specified by a parameter.

When the content of the file is read and imported into DB2, the file is no longer needed. However, you might want to keep the file in case you need to import the model again.

Syntax

   DM_impRegFileE ( input file name , encoding name )

Parameters

input file name
   The name and full path of the file to be imported. The specified file must exist on the database server.

encoding name
   IM Scoring supports different encoding names. The MIME encoding strings are recommended, for example, iso-8859-1.
   - If the encoding name is a non-empty string, the value is interpreted as an XML encoding. The content of the imported file is parsed according to this encoding. The imported file must be a PMML 1.1 or PMML 2.0 document.
   - If the encoding name is the string 'SYSTEM', the locale settings of the operating system are used to determine the encoding of the file.
   - If the encoding name is NULL, the encoding is determined by the imported file.
     - If the file is a PMML 1.1 or PMML 2.0 document, the standard rules of the XML specification apply. That is, the encoding is either specified explicitly or assumed to be Unicode by default.
     - If the imported file is written in Intelligent Miner format, the locale settings of the operating system are used.

Return value

The return value is the imported model as data type DM_RegressionModel.
DM_impRegModel

This function converts a Regression model given as CLOB into a Regression model of type DM_RegressionModel.

Syntax

   DM_impRegModel ( model_as_CLOB )

Parameters

model_as_CLOB
   A Regression model in PMML 1.1 or PMML 2.0 format as a CLOB value. The model is assumed to be provided in database encoding (the codeset specified when the database was created), regardless of the XML encoding specification contained in the model.

Return value

The return value is a Regression model of type DM_RegressionModel.
Chapter 10. IM Scoring command reference

This chapter contains full descriptions of the executables provided with IM Scoring. These executables make it possible to:
- Enable a database. See "The idmenabledb command" on page 133.
- Check if a database has been enabled. See "The idmcheckdb command" on page 132.
- Disable a database. See "The idmdisabledb command" on page 132.
- Enable the DB2 instance on UNIX platforms. See "The idminstfunc command" on page 135.
- Disable the DB2 instance on UNIX platforms. See "The idmuninstfunc command" on page 138.
- Convert models from IM for Data format to PMML 2.0 format. See "The idmxmod command" on page 139.
- Generate from a PMML model an SQL script for performing a scoring run. See "The idmmkSQL command" on page 136.
- Check your license status. See "The idmlicm command" on page 135.
- Obtain information about your product version and other installed software. See "The idmlevel command" on page 135.

You can find these commands in the appropriate directory for your platform, as follows:

   AIX:                    /usr/lpp/IMinerX/bin
   Linux and Sun Solaris:  /opt/IMinerX/bin
   Windows:                <install path>/IMinerX/bin

© Copyright IBM Corp. 2001, 2002
The idmcheckdb command

Use the idmcheckdb utility to check whether a specified database is enabled for IM Scoring, IM Modeling, or both. The utility also checks whether the database is enabled in fenced or unfenced mode.

Command syntax

   idmcheckdb dbname

The idmcheckdb utility returns the following messages (Table 24):
- Database successfully enabled for IM Modeling and IM Scoring in fenced mode
- Database successfully enabled for IM Scoring in fenced mode
- Database successfully enabled for IM Modeling in fenced mode
- Database is not enabled for either IM Scoring or IM Modeling

The idmdisabledb command

To disable a database, use the executable idmdisabledb.

Command syntax

   idmdisabledb dbname tables

Parameters

dbname
   The name of the database that you want to disable.

tables
   If the optional parameter tables is specified, the sample tables containing models are dropped.

   Warning: Use the tables parameter with caution. Before you drop the sample tables, consider that you will also delete the models contained in these tables.
The idmenabledb command

To enable a database, use the executable idmenabledb. The executable is shared between IM Modeling and IM Scoring. The executable has logic to detect the following:
- Which products are installed
- Which types and functions already exist in the database
- Which types and functions need to be created

Command syntax

   idmenabledb <dbname> [fenced | unfenced] [tables]
               [ClasModelSize <n>] [RuleModelSize <n>] [ClusModelSize <n>]
               [RegModelSize <n>] [StructSize <n>] [ApplDataSize <n>]

Example:

   idmenabledb testdb fenced ClasModelSize 5 RuleModelSize 7 RegModelSize 9

where:
- 5 MB is the CLOB size used for the function DM_impClasModel and the BLOB size of type DM_ClasModel
- 7 MB is the CLOB size used for the function DM_impRuleModel and the BLOB size of DM_RuleModel
- 9 MB is the CLOB size used for the function DM_impRegModel and the BLOB size of DM_RegressionModel

Note: DM_RuleModel is a type introduced by IM Modeling. DM_RuleModel is not used by IM Scoring, but is mentioned here because idmenabledb is a shared command between IM Scoring and IM Modeling.

Use StructSize to specify the size of all IM Scoring structured types in megabytes. The default size of the DM_LogicalDataSpec data type is 200 KB, which is suitable for most of the models.

Use ApplDataSize to specify the CLOB size of the distinct type DM_ApplicationData in megabytes. The default is 500 KB.
Use ClasModelSize, RuleModelSize, ClusModelSize, and RegModelSize to specify the BLOB size of the distinct types DM_ClasModel, DM_RuleModel, DM_ClusteringModel, and DM_RegressionModel, and also the CLOB parameter size of the associated import and export model functions.

Note: When you have created database objects (for example, tables) using these types, you can no longer change the size of the types without dropping these database objects. IBM recommends that you work with default sizes in a test environment and that you carefully consider the parameter sizes before moving to a production environment. If you want to change your parameter sizes after you have created models in your database, follow the instructions in "Exporting and importing models by means of DB2 Utilities" on page 168.

If an existing model size is different from the new one that is specified, the idmenabledb executable:
1. Uses the existing definitions
2. Enables the database
3. Issues a warning message (NLS) that says: "Database successfully enabled. However, some types already exist with a different size than requested. First disable the database and run the command again."

If a model size is not specified for a specific model type, the default is 50 MB for unfenced and 10 MB for fenced.

Note: In IM Scoring 7.1, the default for the BLOB model was 100 MB; the default for the CLOB model to be imported was 50 MB. The BLOB model in IM Scoring 8.1 is compressed. For this reason, the default for the CLOB and BLOB models is the same (50 MB for unfenced and 10 MB for fenced).

If the optional parameter tables is specified, idmenabledb creates sample tables that are suitable for storing imported and generated models.
The following tables are created if you specify the tables option:

   CREATE TABLE IDMMX.ClassifModels (
      MODELNAME VARCHAR(240) NOT NULL PRIMARY KEY,
      MODEL     IDMMX.DM_ClasModel );

   CREATE TABLE IDMMX.ClusterModels (
      MODELNAME VARCHAR(240) NOT NULL PRIMARY KEY,
      MODEL     IDMMX.DM_ClusteringModel );

   CREATE TABLE IDMMX.RegressionModels (
      MODELNAME VARCHAR(240) NOT NULL PRIMARY KEY,
      MODEL     IDMMX.DM_RegressionModel );
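Once the sample tables exist, a quick way to see which models are stored is a union over their name columns. This sketch assumes the tables option was used when the database was enabled:

```sql
-- List all models currently stored in the three sample tables.
SELECT 'Classification' AS KIND, MODELNAME FROM IDMMX.ClassifModels
UNION ALL
SELECT 'Clustering',            MODELNAME FROM IDMMX.ClusterModels
UNION ALL
SELECT 'Regression',            MODELNAME FROM IDMMX.RegressionModels;
```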
The idminstfunc command

The idminstfunc command enables the DB2 instance for the use of IM Modeling and IM Scoring. Enabling the DB2 instance means that shared libraries containing the implementation of the UDFs, UDMs, and stored procedures are linked into the sqllib/function directory of the DB2 instance. This command must be called only on UNIX platforms, and can be called only by a user with SYSADM authority.

Syntax:

   idminstfunc

The idminstfunc command is a shared command between IM Modeling and IM Scoring, and it need be called only once if both products are installed.

The idmlevel command

Use the command idmlevel to collect information about the software environment (operating system, DB2 version) and IM product version you are using.

Using 'idmlevel' on Windows operating systems
1. Open a command window.
2. Invoke idmlevel <logfile>

   Example: idmlevel c:\imscoring.log

Using 'idmlevel' on UNIX operating systems
1. In a command shell invoke:

   AIX:                  /usr/lpp/IMinerX/bin/idmlevel /tmp/imscoring.log
   Other UNIX platforms: /opt/IMinerX/bin/idmlevel /tmp/imscoring.log

The idmlicm command

Use the idmlicm command to check your license status. IM Scoring and IM Modeling 8.1 use nodelocked license keys installed in a license file. The full product contains production license keys, which are installed during installation. The 'Try and Buy' version does not include a production license key. You can use the idmlicm command to generate a 'Try and Buy' license key.
  • 151. The idmlicm command is located in the bin directory. It has no parameters. When invoked, idmlicm checks the license status.

Sample output with production license
The license for the product "DB2 Intelligent Miner Scoring" was found in the file "nodelock_sc".
The license for the product "DB2 Intelligent Miner Modeling" was found in the file "nodelock_mo".

If no production license is installed, idmlicm can generate a temporary license key the first time it is invoked.

Sample output with temporary license
W 2771: You are now using a temporary license. You need to enroll "DB2 Intelligent Miner Scoring" within "89" days in the license file "d:\IMinerX\bin\nodelock_sc".

If you invoke idmlicm when a temporary license is already installed, you get the following sample output:

Sample output with temporary license already installed
W 2772: The number of days left is "89". You are still using a temporary license. You need to enroll "DB2 Intelligent Miner Scoring" in the license file "d:\IMinerX\bin\nodelock_sc".

The idmmkSQL command

The command idmmkSQL analyzes a PMML model, and generates from it an SQL script that contains the SQL statements necessary to perform a scoring run using the model. The script is basically a template. It contains placeholders that you replace with the names of concrete database objects in order to finally get the executable SQL script.

Command syntax
idmmkSQL inputfile options outputfile

Parameters
The inputfile parameter is the file containing the PMML model. The SQL template is written to outputfile. If no outputfile is given, the template is written to standard output.

The following options exist. Note that, on Windows, options start with a slash (/), not with a hyphen (-).

-M Identifies the method used to build input records. The option must be followed by one of the following values (not case-sensitive):
v REC2XML. This is the default.

136 Administration and Programming for DB2
  • 152. v DM_applData.
v DM_ApplColumn. This option applies only to Oracle SQL (IM Scoring for Oracle).
v CONCAT.

For an introduction to these options, see:
“Specifying data by means of DM_applData” on page 52
“Specifying data by means of REC2XML” on page 51
“Specifying data by means of CONCAT” on page 53

-D Determines whether DB2 SQL or Oracle SQL is written. The option must be followed by one of the following values, which are not case-sensitive:
v DB2 (the default value)
v ORACLE

-E Determines the encoding of the PMML file. The option must be followed by a valid encoding string. If this option is not given, the encoding string contained in the PMML file is used. If the file does not contain encoding information, utf-8 is the default encoding.

-H Displays usage information.

Example: idmmkSQL -D DB2 -M Concat -E iso-8859-15 thePMMLFile.pmml theScriptFile.sql

Placeholders

The placeholders are marked with leading and trailing ### signs, for example, ###TABLENAME###. A template can contain the following placeholders:

###IDMMX.CLUSTERMODELS###
###IDMMX.CLASSIFMODELS###
###IDMMX.REGRESSIONMODELS###
This denotes the name of the table where the mining model is to be stored. If you simply remove the # signs, you get the name of the corresponding sample table. However, you can replace the placeholders by the name of any other table that you use to store your models.

Note: To generate the SQL script by using idmmkSQL, you need the model as a PMML file. When executing the script, it imports the model into this database table.

Chapter 10. IM Scoring command reference 137
  • 153. ###ABSOLUTE_PATH###
The absolute path to the PMML file. The function that imports the PMML model needs the absolute pathname of the file.

###RECORDID###
The table containing the input data records for the scoring run is expected to have a column of data type INTEGER. This column contains an identifier for different records. Replace ###RECORDID### with the name of the column.

###MODEL###
The name of the column in the model table containing the model itself. In the sample tables, this name is MODEL; if you use the sample tables to store your models, you can simply remove the # signs.

###MODELNAME###
The name of the column in the model table containing the model name. In the sample tables, this name is MODELNAME; if you use the sample tables to store your models, you can simply remove the # signs.

###TABLENAME###
The name of the table containing your input data records.

The idmuninstfunc command

The idmuninstfunc command disables the DB2 instance for the use of IM Modeling and IM Scoring. Disabling the DB2 instance means that links to the shared libraries containing the implementation of the UDFs, UDMs, and stored procedures are removed from the sqllib/function directory of the DB2 instance. This command must be called only on UNIX platforms, and it can be called only by a user with SYSADM authority.

Syntax: idmuninstfunc

The idmuninstfunc command is a shared command between IM Modeling and IM Scoring, and need be called only once if both products are installed. It is not possible to disable the instance for the use of only one product if both products are installed. After the idmuninstfunc command is called, IM Modeling and IM Scoring can no longer be used for that DB2 instance. To enable the DB2 instance again, call the idminstfunc command.

138 Administration and Programming for DB2
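The placeholder replacement for idmmkSQL templates, as described earlier in this chapter, can be scripted rather than done by hand. In this sketch the template file is a simplified, hypothetical stand-in for real idmmkSQL output, and CUSTOMER_ID and CustomerData are example object names; substitute your own column and table names.

```shell
# Create a simplified stand-in for an idmmkSQL template (hypothetical; a real
# template generated by idmmkSQL contains considerably more SQL).
cat > template.sql <<'EOF'
SELECT ###RECORDID###
  FROM ###TABLENAME###
EOF

# Replace each placeholder with a concrete database object name.
sed -e 's/###RECORDID###/CUSTOMER_ID/g' \
    -e 's/###TABLENAME###/CustomerData/g' \
    template.sql > scoring.sql

cat scoring.sql
```

The same substitution pattern applies to the other placeholders; for ###IDMMX.CLUSTERMODELS### and the other model-table placeholders, removing the # signs yields the sample table name, as noted above.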
  • 154. The idmxmod command

You can convert a model outside IM for Data from Intelligent Miner format to PMML 2.0 format by using the idmxmod command.

Command syntax
idmxmod <in file name> <out file name>

Parameters
in file name
The name of a file containing a model in Intelligent Miner format
out file name
The name of the file containing the converted PMML 2.0 model

Chapter 10. IM Scoring command reference 139
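A hypothetical invocation of idmxmod, with example file names (both paths are placeholders, not files shipped with the product); the guard keeps the sketch harmless on systems where the conversion utility is not installed.

```shell
# Hypothetical example: convert the IM for Data model in /tmp/customer.model
# to PMML 2.0.  Both file names are examples only.
CMD="idmxmod /tmp/customer.model /tmp/customer.pmml"
echo "$CMD"
if command -v idmxmod >/dev/null 2>&1; then
  $CMD
fi
```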
  • 155. 140 Administration and Programming for DB2
  • 156. Chapter 11. IM Scoring Java Beans reference

The Java API is documented in online documentation (Javadoc) in the following directory: doc\ScoringBean\index.html

© Copyright IBM Corp. 2001, 2002 141
  • 157. 142 Administration and Programming for DB2
  • 158. Part 3. Appendixes © Copyright IBM Corp. 2001, 2002 143
  • 159. 144 Administration and Programming for DB2
  • 160. Appendix A. Installing IM Scoring This chapter guides you through the tasks involved in installing and uninstalling IM Scoring on the available platforms: v AIX See “Installing IM Scoring on AIX systems” v Linux See “Installing IM Scoring on Linux systems” on page 149 v Sun Solaris See “Installing IM Scoring on Sun Solaris systems” on page 150 v Windows NT, Windows 2000, and Windows XP See “Installing IM Scoring on Windows systems” on page 153 This chapter also describes steps that you need to complete before and after the installation, as follows: v “Configuring the database management system on UNIX systems” on page 157 v “Configuring the database management system on Windows systems” on page 158 v “Enabling IM for Data to export PMML or XML models” on page 158 Installing IM Scoring on AIX systems Before you install IM Scoring on an AIX system, ensure that your system meets the prerequisites. Prerequisites for AIX systems The prerequisites that your AIX system must meet for installing IM Scoring are as follows: v 60 MB of additional disk space in the /usr file system On SP2® with DB2 EEE installation, this disk space is required on each node. v At least 256 MB RAM v DB2 UDB Version 7.2 Fixpack 7, or DB2 UDB Version 8 To download DB2 fixpacks, go to: http://www.ibm.com/software/data/db2/udb/support.html v AIX 4.3.3 or higher © Copyright IBM Corp. 2001, 2002 145
  • 161. If IM Scoring V7.1 is installed on your system, you are recommended to uninstall it before you install IM Scoring V8.1. Even though it is feasible to have those versions of the applications installed on the same system, there is a high risk of user error. In any case, you are recommended to use different DB2 instances if you intend to use both versions. To check whether IM Scoring V7.1 is already installed on your system, type the following on the AIX command line: lslpp -l "IMinerSc.services.*" The file sets that make up IM Scoring V7.1 should not be present. They are as follows: v IMinerSc.services.base v IMinerSc.services.cnv v IMinerSc.services.symblnk v IMinerSc.services.cstr For instructions on uninstalling IM Scoring V7.1 file sets, see IBM Intelligent Miner Scoring Administration and Programming for DB2, Version 7.1. You might also want to uninstall documentation related to IM Scoring V7.1. To do so, uninstall the file sets IMinerSc.services.doc.<language> Installing IM Scoring Use smit or smitty to install IM Scoring on AIX systems. The AIX system must have a DB2 server installed and configured. To install IM Scoring on an AIX system: 1. Log on as user root. 2. Insert the IM Scoring CD-ROM in the CD-ROM drive. 3. Type smit on the AIX command line. 4. Select the following options. These steps might differ slightly depending on the AIX version installed on your system. a. Software Installation and Maintenance b. Install and Update Software c. Install and Update from LATEST Available Software d. Enter /dev/cd0 as INPUT device e. Display the list of licensed software available to install. The relevant items in the list are: IMinerX.scoring.db2 (Intelligent Miner Scoring - DB2) Contains the scoring functionality of IM Scoring for DB2. 146 Administration and Programming for DB2
  • 162. IMinerX.scoring.db2.doc.en Contains the documentation for IM Scoring for DB2 in English in PDF and HTML format IMinerX.scoring.db2.doc.es Contains the documentation for IM Scoring for DB2 in Spanish in PDF and HTML format IMinerX.scoring.db2.doc.ja Contains the documentation for IM Scoring for DB2 in Japanese in PDF and HTML format IMinerX.scoring.db2.doc.ko Contains the documentation for IM Scoring for DB2 in Korean in PDF and HTML format IMinerX.scoring.db2.doc.cn Contains the documentation for IM Scoring for DB2 in Chinese (China) in PDF and HTML format IMinerX.scoring.db2.doc.tw Contains the documentation for IM Scoring for DB2 in Chinese (Taiwan) in PDF and HTML format IMinerX.conversion Contains the model conversion facility to convert models exported from IM for Data to PMML. This is an optional feature in IM Scoring for DB2. IMinerX.symblnk Contains symbolic links from the IM for Data /usr/lpp/IMiner/bin directory to the executables of the model conversion facility. This is an optional feature in IM Scoring for DB2, which requires IMiner.server.serial. This enables you to use the model conversion facility from IM for Data. To install IM Scoring, select the file set IMinerX.scoring.db2 To install the documentation, select IMinerX.scoring.db2.doc.<language> To install the optional feature ″Intelligent Miner — Symbolic Links″, select IMinerX.symblnk To install ″Intelligent Miner — Conversion″, select IMinerX.conversion On the smit installation menu, for AUTOMATICALLY install requisite software, type ’yes’. Appendix A. Installing IM Scoring 147
  • 163. After you have successfully installed IMinerX.scoring.db2, you must repeat the installation procedure in order to install the license that enables you to use IM Scoring. There are two kinds of IM Scoring license – the ’Try and Buy’ license and the regular full license. To install the IM Scoring ’Try and Buy’ license: In step 4e, select the file set IMinerX.scoring.tab.license (instead of the file set IMinerX.scoring.db2). The ’Try and Buy’ license expires after 59 days. To install the regular IM Scoring license: In step 4e, select the file set IMinerX.scoring.license (instead of the file set IMinerX.scoring.db2). Note: If you want to use IM Scoring on an SP2 in a DB2 EEE environment, you must install IM Scoring on each node where DB2 is installed. Before you can use IM Scoring, you must enable the DB2 instance and databases, and create sample tables. For information on how to do this, see: “Enabling the DB2 instance on UNIX systems” on page 157 “Configuring the database environment” on page 21 “Enabling databases” on page 41 “The idmenabledb command” on page 133 Before you can use the conversion utility, IM for Data must be enabled to export mining models properly. See “Enabling IM for Data to export PMML or XML models” on page 158. Uninstalling IM Scoring Before you uninstall IM Scoring on AIX systems, you must disable the databases and the DB2 instance. For information on how to do this, see: “Disabling databases” on page 42 “The idmdisabledb command” on page 132 “Disabling the DB2 instance on UNIX systems” on page 157 To uninstall IM Scoring on an AIX system: 1. Log on as user root. 2. Run the standard software and maintenance procedure from smit or smitty and select the following options: a. Software Installation and Maintenance b. Software Maintenance and Utilities c. Remove Installed Software 3. List the installed software. 4. Follow the instructions on the smit menus to uninstall IM Scoring. 148 Administration and Programming for DB2
  • 164. On SP2, you must uninstall IM Scoring on each individual node where IM Scoring is installed. Installing IM Scoring on Linux systems Before you install IM Scoring on a Linux system, ensure that your system meets the prerequisites. Prerequisites for Linux systems The prerequisites that your Linux system must meet for installing IM Scoring are as follows: v 60 MB of additional disk space v At least 256 MB RAM v Linux with kernel 2.2.18 or higher, and glibc Version 2.1.1 or higher v DB2 UDB Version 7.2 Fixpack 7, or DB2 UDB Version 8 To download DB2 fixpacks, go to: http://www.ibm.com/software/data/db2/udb/support.html v To install IM Scoring, you need RPM Installing IM Scoring To install IM Scoring on a Linux system: 1. Insert the Linux server CD-ROM in the CD-ROM drive. 2. Mount the CD-ROM by using the following command: mount /cdrom If the directory /cdrom is not listed in the file /etc/fstab, use the following command: mount -tauto9660 /<dev/hdx> /cdrom where /dev/hdx is your CD-ROM drive. 3. Depending on your system, go to the appropriate directory by using one of the following commands: v cd /cdrom/LINUX/i386 v cd LINUX/s390 4. Depending on your system, use the appropriate commands to complete the installation. On Linux/i386: ./linuxInstallSc On Linux/s390: ./linux390InstallSc Appendix A. Installing IM Scoring 149
  • 165. Before you can use IM Scoring, you must enable the DB2 instance and databases, and create sample tables. For information on how to do this, see:
“Enabling the DB2 instance on UNIX systems” on page 157
“Configuring the database environment” on page 21
“Enabling databases” on page 41
“The idmenabledb command” on page 133

Uninstalling IM Scoring

Before you uninstall IM Scoring on Linux systems, you must disable the databases and the DB2 instance. For instructions on how to do this, see:
“Disabling databases” on page 42
“The idmdisabledb command” on page 132
“Disabling the DB2 instance on UNIX systems” on page 157

To uninstall IM Scoring on a Linux system, use the appropriate command:
On Linux/i386: ./linuxUninstallSc
On Linux/s390: ./linux390UninstallSc

Installing IM Scoring on Sun Solaris systems

Before you install IM Scoring on a Sun Solaris system, ensure that your system meets the prerequisites.

Prerequisites for Sun Solaris systems

The prerequisites that your Sun Solaris system must meet for installing IM Scoring are as follows:
v 60 MB of additional disk space in the /opt file system
v At least 256 MB RAM
v DB2 UDB Version 7.2 Fixpack 7, or DB2 UDB Version 8
To download DB2 fixpacks, go to: http://www.ibm.com/software/data/db2/udb/support.html
v Sun Solaris Version 2.6 or higher
Make sure that the following patches are installed:
109147-09
108434-02
108435-02
To download patches, go to http://sunsolve.sun.com.

150 Administration and Programming for DB2
  • 166. Installing IM Scoring To install IM Scoring on a Sun Solaris system: 1. Log on as user root. 2. Mount the product CD to a directory of your choice, for example, /cdrom. 3. Go to the directory where you mounted the CD. 4. Go to the Sun Solaris directory by using the following command: cd SUN 5. Install the components you want by executing the appropriate script, as follows: IM Scoring ./sunInstallScD2 IMiner Conversion Utilities ./sunInstallCn Documentation By default, the US English documentation is installed together with IM Scoring. To install additional documentation packages, execute the following command: pkgadd -a ./admin -d ./ IMXSDD2XX.pkg where XX is one of the following: v EE for Spanish v JJ for Japanese v KK for Korean v ZC for Chinese (China) v ZT for Chinese (Taiwan) If IM for Data and IMiner Conversion Utilities are installed on the same system, symbolic links to the model conversion facility executables are created in the directory /opt/IMiner/bin. This enables you to use the model conversion facility from IM for Data. If other components are already installed on the system, these will not be installed again, though the installation script will try to do so. The messages that appear during installation will reflect this. You might have IM for Data and IMiner Conversion Utilities installed on different systems and you might want to use the model conversion facility from IM for Data. In this case, use the following command on the system where IM for Data is installed to install the model conversion facility: ./sunInstallCn Appendix A. Installing IM Scoring 151
  • 167. You can add further components on the system later by following the steps outlined above. Before you can use IM Scoring, you must enable the DB2 instance and databases, and create sample tables. For information on how to do this, see: “Enabling the DB2 instance on UNIX systems” on page 157 “Configuring the database environment” on page 21 “Enabling databases” on page 41 “The idmenabledb command” on page 133 Before you can use the conversion utility, IM for Data must be enabled to export mining models properly. See “Enabling IM for Data to export PMML or XML models” on page 158. Uninstalling IM Scoring Before you uninstall any Intelligent Miner components on Sun Solaris systems, you must disable the databases and the DB2 instance. For information on how to do this, see: “Disabling databases” on page 42 “The idmdisabledb command” on page 132 “Disabling the DB2 instance on UNIX systems” on page 157 You can uninstall IM Scoring, IMiner Conversion Utilities, or both. To uninstall IM Scoring on a Sun Solaris system: 1. Log on as user root. 2. Mount the product CD to a directory of your choice, for example, /cdrom. 3. Go to the directory where you mounted the CD. 4. Go to the Sun Solaris directory by using the following command: cd SUN 5. Uninstall the components you want by executing the appropriate script, as follows: IM Scoring ./sunUninstallScD2 IMiner Conversion Utilities ./sunUninstallCn Documentation By default, the US English documentation is removed together with IM Scoring. To see if any other documentation is installed, execute the following command: pkginfo | grep IMXSDD2 152 Administration and Programming for DB2
  • 168. To remove any other documentation that was installed, execute the following command:
pkgrm -n IMXSDD2XX
where XX is one of the following:
v EE for Spanish
v JJ for Japanese
v KK for Korean
v ZC for Chinese (China)
v ZT for Chinese (Taiwan)

The uninstall scripts will try to remove all the components that were installed with the relevant component. However, other products may depend on some of these components. In that case, uninstallation of the relevant components will be skipped and the screen output will reflect this.

Installing IM Scoring on Windows systems

Before you install IM Scoring on a Windows system, ensure that your system meets the prerequisites.

Prerequisites for Windows systems

The prerequisites that your Windows system must meet for installing IM Scoring are as follows:

Disk space
v To install IM Scoring or IM Scoring Java Beans: 40 MB of additional disk space
v To install IM Scoring, IM Scoring Java Beans, and IM Modeling: 50 MB

RAM
At least 256 MB RAM

Operating system
Windows NT 4.0 SP6a
Windows 2000 SP2 or higher
Windows XP

JDK
For IM Scoring Java Beans: JDK 1.3 or higher

Database
DB2 UDB Version 7.2 Fixpack 7, or DB2 UDB Version 8
To download DB2 fixpacks, go to: http://www.ibm.com/software/data/db2/udb/support.html

Appendix A. Installing IM Scoring 153
  • 169. If you do not have a DB2 database, you can use the IM Scoring Java Beans installation. Installing IM Scoring Note: IM Scoring for Windows includes Microsoft® Windows Installer (MSWI). If your operating system does not yet have MSWI installed or uses an older version, the automatic installation process installs MSWI as the first step. This requires you to restart Windows. IM Scoring V8 cannot coexist with IM Scoring V7 on the same machine. When you install IM Scoring V8, IM Scoring V7 is automatically uninstalled. To install IM Scoring on a Windows system: 1. Change to the WIN32 directory on the IM Scoring CD-ROM. 2. Run the setup.exe file. 3. Follow the installation instructions. IM Scoring includes a multilingual globalized installation. You can choose the language you want to use during the installation. Regardless of the language you select, IM Scoring is installed with support and translated messages for all supported languages. You can install the following features: Scoring: User-defined functions for DB2 This feature contains the shared libraries that implement the DB2 user-defined functions for the DB2 database server. It also contains a command line interface to convert IM for Data Version 6 result files to the standardized PMML format used by IM Scoring. Install this feature on the database server where the database is located on which you want to run the scoring functions. Scoring: Samples This feature contains data to populate a sample database as well as a sample Clustering model to apply scoring functions to a database. It also contains other samples that show how to migrate IM Scoring V7 databases. PMML conversion utilities (server) Install this feature on the IM for Data Version 6 server. With this feature installed on the server and the PMML conversion utilities (client) feature installed on the client, PMML conversion is possible. 
You can convert IM for Data Version 6 results to the standardized PMML format from the IM for Data Version 6 GUI. This feature is also available for the IM Scoring Java Beans installation. 154 Administration and Programming for DB2
  • 170. PMML conversion utilities (client)
Install this feature on the IM for Data Version 6 client. It adds entries to the client tool registration file idmcsctr.dat to invoke PMML conversion on version 6 result files. The actual conversion runs on the IM for Data server. This feature is also available for IM Scoring Java Beans installation.

IM Scoring Java Beans
This feature contains IM Scoring Java Beans, which enables you to score a single data record in a Java application given a PMML model. This can be used to integrate scoring in e-business applications, for example, for real-time scoring in customer relationship management (CRM) systems. This feature is also available for the IM Scoring Java Beans installation.

Java Beans samples
This feature contains Java sample programs for IM Scoring Java Beans. To use the sample, you need a Java Development Kit (JDK). This feature is also available for the IM Scoring Java Beans installation.

Online documentation: Scoring
This feature contains the online manuals as PDF and HTML files. In the subfeatures, you can select one of the languages that are available. This feature is also available for the IM Scoring Java Beans installation.

The default installation path is <Program Files>\IBM\IMinerX. You can change that path if this is the first product you are installing from the Intelligent Miner extender product family. Otherwise, the path of the product that is already installed is used.

Depending on the features you selected on the feature selection dialog, the following installations and updates are performed:
v Scoring functions are written to the directory <install path>\bin, where <install path> is the directory where IM Scoring is installed. By default, the scoring functions are enabled if DB2 UDB is found during installation.
v Sample scripts for the tutorial are written to the following directories:
IM Scoring
<install path>\samples\Scoring\DB2

Appendix A. Installing IM Scoring 155
  • 171. IM Scoring Java Beans
<install path>\samples\ScoringBean
v The PMML model conversion facility is installed. If IM for Data is found on the computer, the model conversion facility is copied into the bin directory of IM for Data. The contents of the idmcsctr.add file and the idmcsstr.add file are added to the appropriate client tool registration files idmcsctr.dat of IM for Data. This adds the option of exporting models from the IM for Data GUI in PMML or XML format.
v If the Java Bean feature was selected:
– The Java archives (jar) are written to the directory <install path>\java
– The Java Native Interface (JNI) DLLs are written to the directory <install path>\bin
where <install path> is the directory where IM Scoring is installed.
v The DB2 Intelligent Miner Scoring folder is created, including shortcuts to the sample scripts and online documentation.

After IM Scoring is installed, you must reboot your system to activate the PATH variable in the system environment.

Before you can use IM Scoring, you must enable the DB2 instance and databases, and create sample tables. For information on how to do this, see:
“Enabling the DB2 instance on Windows systems” on page 158
“Configuring the database environment” on page 21
“Enabling databases” on page 41
“The idmenabledb command” on page 133

Before you can use the conversion utility, IM for Data must be enabled to export mining models properly. See “Enabling IM for Data to export PMML or XML models” on page 158.

Uninstalling IM Scoring

Before you uninstall IM Scoring on Windows systems, you must disable the databases. For instructions on how to do this, see:
v “Disabling databases” on page 42
v “The idmdisabledb command” on page 132

To uninstall IM Scoring on a Windows system:
1. Double-click Add/Remove Programs on the Control Panel.

156 Administration and Programming for DB2
  • 172. 2. Select IBM DB2 Intelligent Miner Scoring V8.1 from the list and click Add/Remove.... 3. Follow the instructions on the screen. The following options are available: v Modify the list of installed features v Repair the installation v Remove all features Configuring the database management system on UNIX systems Before you can use IM Scoring, you must prepare your system environment and verify the installation. Enabling the DB2 instance on UNIX systems A DB2 instance must have access to the libraries containing the scoring functions in order to use IM Scoring from that instance. 1. Log on to the DB2 server as the DB2 instance owner and go to the appropriate directory: AIX /usr/lpp/IMinerX/bin Linux, Sun Solaris /opt/IMinerX/bin 2. Call the script idminstfunc. Executing the script creates symbolic links to the following libraries in the instance owner’s sqllib/function directory: v idmclu v idmreg v idmclf v idmrec v idmrul v idmx 3. Change the database manager configuration parameter UDF_MEM_SZ to the maximum value by using the following command: db2 update dbm cfg using UDF_MEM_SZ 60000 4. Restart the DB2 instance by using the db2stop and db2start commands. Disabling the DB2 instance on UNIX systems To disable the DB2 instance: 1. Log on as DB2 instance user. 2. Go to the appropriate directory: AIX /usr/lpp/IMinerX/bin Appendix A. Installing IM Scoring 157
  • 173. Linux, Sun Solaris /opt/IMinerX/bin 3. Call the script idmuninstfunc. This script deletes a set of symbolic links from the instance owner’s sqllib/function directory. Configuring the database management system on Windows systems Before you can use IM Scoring, you must prepare your system environment and verify the installation. Enabling the DB2 instance on Windows systems DB2 allocates a storage area for the input and output parameters of the UDFs. You can modify the size of the storage area by using the database manager configuration parameter UDF_MEM_SZ. This parameter indicates the size of the memory as a number of database pages. The DB2 registry variable DB2NTMEMSIZE indicates the upper limit for fenced UDFs in bytes. The default value is 16 MB. For IM Scoring, the values of the UDF_MEM_SZ and the DB2NTMEMSIZE variables must be increased. Note that the storage allocated by the value specified in the UDF_MEM_SZ variable must not be greater than the upper limit specified in the DB2NTMEMSIZE variable. Use the following commands to increase the value of the UDF_MEM_SZ variable to 60000 pages and the value of the DB2NTMEMSIZE variable to 240MB: db2 update dbm cfg using UDF_MEM_SZ 60000 db2set DB2NTMEMSIZE=APLD:240000000 Restart the DB2 instance. Enabling IM for Data to export PMML or XML models Depending on the system you use, you must complete a number of steps before IM for Data can export PMML or XML models. On AIX systems If IM for Data is found on your system during the IM Scoring installation, the file set IMinerX.symblnk generates links to the model conversion facility in the bin directory of IM for Data. If you install IM for Data after you have installed IM Scoring, you can establish the links to the model conversion facility manually by installing the file set IMinerX.symblnk. 158 Administration and Programming for DB2
  • 174. After you have registered the model conversion facility by using the client tool registration, you can use it on the IM for Data GUI. To register the model conversion facility:
v Add the contents of the file idmcsctr.add to the idmcsctr.dat file of the IM for Data client
v Add the contents of the file idmcsstr.add to the idmcsstr.dat file of the IM for Data server

The files idmcsctr.add and idmcsstr.add are platform-independent. You can add the contents of the file idmcsctr.add to the idmcsctr.dat file of an IM for Data client on different platforms.

The situation might occur where the platform is AIX and you are running the IM for Data client in a language other than English. In this case, the idmcsctr.dat file resides in the nls/<language> directory of the IM for Data client. Otherwise, it resides in the bin directory of the IM for Data client.

On Sun Solaris systems

If IM for Data is found on your system during the installation of IM Scoring, links to the model conversion facility are established in the bin directory of IM for Data. If you install IM for Data after you have installed IM Scoring, you can establish the links to the model conversion facility by calling the script idmlnconv as user root. This script is available in the /opt/IMinerX/bin directory of IM Scoring.

After you have registered the model conversion facility by using the client tool registration, you can use it on the IM for Data GUI. To register the model conversion facility:
v Add the contents of the file idmcsctr.add to the idmcsctr.dat file of the IM for Data client
v Add the contents of the file idmcsstr.add to the idmcsstr.dat file of the IM for Data server

The files idmcsctr.add and idmcsstr.add are platform-independent. You can add the contents of file idmcsctr.add to the idmcsctr.dat file of an IM for Data client on different platforms. The situation might occur where the platform is Sun Solaris and you are running the IM for Data client in a language other than English.
In this case, the idmcsctr.dat file resides in the nls/<language> directory of the IM for Data client. Otherwise it resides in the bin directory of the IM for Data client.

Appendix A. Installing IM Scoring 159
  • 175. If you want to remove IM for Data from your system, you can remove the links to the model conversion facility by calling the script idmrlnconv as user root.

On Windows systems

If IM for Data is found on the system during the IM Scoring installation, Intelligent Miner is enabled to use the model conversion facility from the Intelligent Miner GUI. If you install IM for Data after you have installed IM Scoring and you want to use the model conversion facility on the IM for Data GUI, complete the following modifications:
1. Invoke the IM Scoring setup.exe
2. Select Modify
3. Add the following features to your installation:
v PMML conversion (server)
v PMML conversion (client)

Alternatively, you can do the following:
1. Add the contents of the file idmcsctr.add to the idmcsctr.dat file of the IM for Data client
2. Add the contents of the file idmcsstr.add to the idmcsstr.dat file of the IM for Data server
The files idmcsctr.add and idmcsstr.add are platform-independent. You can add the contents of idmcsctr.add to the idmcsctr.dat file of an IM for Data client on different systems.
3. Copy the following executables to the bin directory of IM for Data:
v idmxdclu.exe
v idmxncla.exe
v idmxnclu.exe
v idmxnpre.exe
v idmxrbf.exe
v idmxrul.exe
v idmxsreg.exe
v idmxtree.exe

160 Administration and Programming for DB2
Appendix B. Installing IM Scoring Java Beans

The WIN32 directory on the installation CD contains a separate setup.exe for IM Scoring Java Beans. This installation enables you to install a subset of IM Scoring without having a DB2 database as a prerequisite. The IM Scoring Java Beans feature is also part of the IM Scoring installation. However, the IM Scoring full installation requires a DB2 database as a prerequisite. For a description of the IM Scoring installation process and the features that are available, see Appendix A, “Installing IM Scoring” on page 145.

Installing IM Scoring Java Beans on AIX systems

Before you install IM Scoring Java Beans on an AIX system, ensure that your system meets the prerequisites.

Prerequisites for AIX systems

The prerequisites that your AIX system must meet for installing IM Scoring Java Beans are as follows:
v 30 MB of additional disk space in the /usr file system
v At least 256 MB RAM
v AIX 4.3.3 or higher
v JDK 1.3.1 or higher

Installing IM Scoring Java Beans

Use smit or smitty to install IM Scoring Java Beans on AIX systems. To install IM Scoring Java Beans on an AIX system:
1. Log on as user root.
2. Insert the IM Scoring CD-ROM in the CD-ROM drive.
3. Type smit on the AIX command line.
4. Select the following options. These steps might differ slightly depending on the AIX version installed on your system.
   a. Software Installation and Maintenance
   b. Install and Update Software
   c. Install and Update from LATEST Available Software
   d. Enter /dev/cd0 as INPUT device
   e. Display the list of licensed software available to install. The relevant items in the list are:
IMinerX.scoring.scoringbean (Intelligent Miner ScoringBean)
   Contains the IM Scoring Java Beans functionality of IM Scoring
IMinerX.scoringbean.doc.en
   Contains the documentation for the IM Scoring Java Beans functionality of IM Scoring in English, in PDF and HTML format

To install IM Scoring Java Beans, select the file set IMinerX.scoring.scoringbean. To install the documentation in English, select IMinerX.scoringbean.doc.en. On the smit installation menu, for AUTOMATICALLY install requisite software, type ’yes’.

After you have successfully installed IMinerX.scoring.scoringbean, you might need to install the appropriate license. To do this, repeat the installation procedure. IM Scoring and IM Scoring Java Beans share the same license, so you can omit this procedure if you have already installed a license for IM Scoring.

To install the IM Scoring ’Try and Buy’ license:
   In step 4e, select the file set IMinerX.scoring.tab.license. The ’Try and Buy’ license expires after 59 days.
To install the regular IM Scoring license:
   In step 4e, select the file set IMinerX.scoring.license.

Uninstalling IM Scoring Java Beans

To uninstall IM Scoring Java Beans:
1. Log on as user root.
2. Run the standard software and maintenance procedure from smit or smitty and select the following options:
   a. Software Installation and Maintenance
   b. Software Maintenance and Utilities
   c. Remove Installed Software
3. List the installed software.
4. Follow the instructions on the smit menus to uninstall IM Scoring Java Beans.

Installing IM Scoring Java Beans on Linux systems

Before you install IM Scoring Java Beans on a Linux system, ensure that your system meets the prerequisites.
Prerequisites for Linux systems

The prerequisites that your Linux system must meet for installing IM Scoring Java Beans are as follows:
v At least 256 MB RAM
v Linux kernel 2.2.12 or higher
v JDK 1.3.1 or higher

Installing IM Scoring Java Beans

Before you can install IM Scoring Java Beans on Linux systems, IM Scoring must already be installed. To install IM Scoring Java Beans:
1. Insert the Linux server CD-ROM in the CD-ROM drive.
2. Mount the CD-ROM by using the following command:
   mount /cdrom
   If the directory /cdrom is not listed in the file /etc/fstab, use the following command:
   mount -t iso9660 /dev/hdx /cdrom
   where /dev/hdx is your CD-ROM drive.
3. Depending on your system, go to the appropriate directory by using one of the following commands:
   On Linux/i386: cd /cdrom/LINUX/i386
   On Linux/s390: cd /cdrom/LINUX/s390
4. Depending on your system, use the appropriate command to complete the installation.
   On Linux/i386: ./linuxInstallScoringBean
   On Linux/s390: ./linux390InstallScoringBean

Uninstalling IM Scoring Java Beans

To uninstall IM Scoring Java Beans on Linux systems, use the appropriate command:
On Linux/i386: ./linuxUninstallScoringBean
On Linux/s390: ./linux390UninstallScoringBean

Installing IM Scoring Java Beans on Sun Solaris systems

Before you install IM Scoring Java Beans on a Sun Solaris system, ensure that your system meets the prerequisites.

Prerequisites for Sun Solaris systems

The prerequisites that your Sun Solaris system must meet for installing IM Scoring Java Beans are as follows:
v At least 256 MB RAM
v Sun Solaris 2.6 or higher
v JDK 1.3.1 or higher

Make sure that the following patches are installed: 109147-09, 108434-02, 108435-02. To download patches, go to http://sunsolve.sun.com.

Installing IM Scoring Java Beans

To install IM Scoring Java Beans on a Sun Solaris system:
1. Log on as user root.
2. Mount the product CD to a directory of your choice, for example, /cdrom.
3. Go to the directory where you mounted the CD.
4. Go to the Sun Solaris directory by using the following command: cd SUN
5. Install IM Scoring Java Beans by executing the following command: ./sunInstallScJB

Uninstalling IM Scoring Java Beans

To uninstall IM Scoring Java Beans on a Sun Solaris system:
1. Log on as user root.
2. Mount the product CD to a directory of your choice, for example, /cdrom.
3. Go to the directory where you mounted the CD.
4. Go to the Sun Solaris directory by using the following command: cd SUN
5. Uninstall IM Scoring Java Beans by executing the following command: ./sunUninstallScJB
Installing IM Scoring Java Beans on Windows systems

Before you install IM Scoring Java Beans on a Windows system, ensure that your system meets the prerequisites.

Prerequisites for Windows systems

The prerequisites that your Windows system must meet for installing IM Scoring Java Beans are as follows:
v 30 MB of additional disk space
v At least 256 MB RAM
v Windows NT SP 6a, Windows 2000 SP2, or Windows XP
v JDK 1.3.1 or higher

Installing IM Scoring Java Beans

The installation of IM Scoring Java Beans on a Windows system is an installation feature of the DB2 IM Scoring package.
Appendix C. Migration from IM Scoring V7.1

If you want to migrate from IM Scoring V7.1 to IM Scoring V8.1, in general you can start working with IM Scoring V8.1 after performing the following steps:
1. Installing IM Scoring V8.1
2. Performing the configuration steps
3. Re-enabling your databases or enabling new databases

Note that, when you re-enable your databases, you do not need to have disabled them first. However, due to new features and also to limitations introduced with IM Scoring V8.1, there are a number of issues that you must keep in mind. These issues are described in the sections that follow. They include:
v “Working with IM Scoring V7.1 and V8.1 in parallel”
v “Exporting and importing models with the use of compression” on page 168
v “Exporting and importing models by means of DB2 Utilities” on page 168
v “Importing models in unfenced mode” on page 169
v “Applying Neural models” on page 169
v “Using the function DM_getClusterID” on page 170

Working with IM Scoring V7.1 and V8.1 in parallel

Windows platform: If you install IM Scoring V8.1 on a Windows machine and the installation finds IM Scoring V7.1, IM Scoring V7.1 is automatically uninstalled. This means that you cannot work with both versions on one machine on the Windows platform.

UNIX platforms: IM Scoring V8.1 installs into a directory that is different from the one used for IM Scoring V7.1, so you can have both versions installed on one machine. However, you are recommended to migrate to IM Scoring V8.1, because there is a high risk of user error if you use both versions. If you want to perform scoring operations with both versions, the recommended way to do this is to use different DB2 instances, because an instance is enabled for a specific version of IM Scoring by means of the command idminstfunc. If you want to use the same DB2 instance for both products, disable and enable the DB2 instance first
before you use the second product. For this scenario, you are recommended to use different databases.

Exporting and importing models with the use of compression

IM Scoring V8.1 stores models in a compressed format in a database. If you have a database that was enabled for IM Scoring V7.1 and have re-enabled it for IM Scoring V8.1, any models of types DM_ClasModel, DM_ClusteringModel, and DM_RegressionModel are still in an uncompressed format. IM Scoring V8.1 can also work with the uncompressed format. However, if you want to save disk space, you can compress the models by using the export and import UDFs that are provided. These UDFs are:
v DM_expClasModel and DM_impClasModel
v DM_expClusModel and DM_impClusModel
v DM_expRegModel and DM_impRegModel

You can also compress the models by using a sample script, which is available in the directory samples/ScoringDB2. You can use the sample directly if your models are stored in the tables that are provided. These tables are as follows:
v IDMMX.ClusterModel
v IDMMX.ClassifModels
v IDMMX.Regressionmodels

If you use tables that are different from these, you must adapt the sample to your needs before you execute it. To execute the sample:
1. Connect to the database.
2. Call the following:
   db2 -tf compressV7Models.db2

Exporting and importing models by means of DB2 Utilities

IM Scoring V8.1 has a model compression feature. For this reason, the default size for the following model types is 50 MB when you enable a database in unfenced mode:
v DM_ClasModel
v DM_ClusteringModel
v DM_RegressionModel

In IM Scoring V7.1, the default size was 100 MB for the unfenced mode.
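The compression step described above can also be sketched as a single SQL statement. This is an illustration only, not the shipped compressV7Models.db2 script: it assumes the default IDMMX.ClusterModel table with a model column named MODEL, and it treats the exact signatures of the export and import UDFs (and whether they can be composed this way) as assumptions.

```sql
-- Hypothetical sketch: pass each V7.1 clustering model through the
-- V8.1 export and import UDFs, which store models in compressed format.
-- Table and column names are the assumed defaults; adapt as needed.
UPDATE IDMMX.ClusterModel
   SET MODEL = IDMMX.DM_impClusModel(IDMMX.DM_expClusModel(MODEL));
```

The same pattern would apply to Classification and Regression models with the DM_expClasModel/DM_impClasModel and DM_expRegModel/DM_impRegModel pairs.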
The situation might arise where you re-enable, for the use of IM Scoring V8.1, a database that was enabled for IM Scoring V7.1. In this case, if you have models stored in the database, the model types are not recreated. If you get run-time problems because you are still working with model types whose size is 100 MB, do the following:
1. Export the models by means of the DB2 export command.
2. Disable your database.
3. Enable your database again.
4. Import all the models by means of the DB2 import command.

In the samples/ScoringDB2 directory of your installation, sample scripts are available that export and import models by means of the appropriate DB2 commands. You can use the samples directly if your models are stored in the tables that are provided; these are IDMMX.ClusterModel, IDMMX.ClassifModels, and IDMMX.Regressionmodels. If you use tables that are different from these, you must adapt the samples to your needs before you execute them. To execute the samples:
1. Connect to the database.
2. Call the following:
   db2 -tf db2ExportModels.db2
   idmdisabledb <databasename> [tables]
   idmenabledb <databasename> [tables] [fenced|unfenced]
   db2 -tf db2ImportModels.db2

Importing models in unfenced mode

The situation might occur where you have enabled your database for the unfenced mode and you want to import models that were created by IM for Data. In this case, the models cannot be imported if they are still in Intelligent Miner format. Convert the models to PMML before you import them, in one of the following ways:
v Export them directly in PMML format from IM for Data, or
v Use the command line conversion tool idmxmod to convert a model from IM format to PMML format.

Applying Neural models

Neural Clustering models, Neural Classification models, and Neural Regression models that were imported by IM Scoring V7.1 can no longer be applied by IM Scoring V8.1. If the models are still available to you as flat files or as results in IM for Data, drop them from your tables, and import them again.
Using the function DM_getClusterID

The situation might arise where you use the function DM_getClusterID on a DM_ClusResult value that was returned when a Clustering model was applied. In this case, the values differ from those returned by IM Scoring V7.1. In IM Scoring V7.1, the cluster names were returned as cluster IDs. In IM Scoring V8.1, the cluster ID that is returned is the position of the cluster in the PMML model that is applied. To get the cluster name, call the function DM_getClusterName with the cluster ID (returned by DM_getClusterID) as input parameter. Note that the cluster names and cluster IDs are shown in the IM Visualizer.
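The ID-to-name lookup described above can be sketched as follows. This is a hypothetical illustration: the table and column names are invented for the example, and the DB2 method-invocation syntax (the .. operator) on the result and model values is an assumption based on the UDF names documented here.

```sql
-- Hypothetical sketch: for each scored row, resolve the positional
-- cluster ID from the DM_ClusResult value back to the cluster name
-- stored in the applied PMML model.
SELECT r.CUSTOMER_ID,
       r.RESULT..DM_getClusterID()  AS CLUSTER_ID,
       m.MODEL..DM_getClusterName(r.RESULT..DM_getClusterID()) AS CLUSTER_NAME
  FROM SCORED_CUSTOMERS r, IDMMX.ClusterModel m
 WHERE m.MODELNAME = 'CustomerSegments';
```

Because the V8.1 cluster ID is only a position in the PMML model, resolving it to a name this way gives you output comparable to the names that V7.1 returned directly.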
Appendix D. Coexistence with IM Modeling

On the Windows operating system, IM Modeling V8.1 can coexist only with IM Scoring V8.1. If you have IM Scoring V7.1 installed, the IM Modeling V8.1 installation removes the IM Scoring V7.1 product.

Shared schema

The schema IDMMX is shared between IM Modeling and IM Scoring.

Shared data types

The following data types are shared between IM Modeling and IM Scoring:
v DM_ClusteringModel
v DM_ClasModel
v DM_RuleModel (published only for IM Modeling, but shared internally)
v DM_LogicalDataSpec

Shared functions

The following functions are shared between IM Modeling and IM Scoring:

DM_ClusteringModel
v DM_expClusModel
v DM_getNumClusters
v DM_getClusterName
v DM_getClusMdlName
v DM_getClusMdlSpec

DM_ClasModel
v DM_expClasModel
v DM_getClasMdlSpec
v DM_getClasTarget
v DM_getClasCostRate
v DM_getClasMdlName
Shared methods

The following methods are shared between IM Modeling and IM Scoring:

DM_LogicalDataSpec
v DM_expDataSpec
v DM_getFldName
v DM_getFldType
v DM_getNumFields
v DM_impDataSpec
v DM_isCompatible

Shared commands

The following commands are shared between IM Modeling and IM Scoring:
v idmenabledb
v idmcheckdb
v idmdisabledb
v idmlicm
v idmlevel
v idminstfunc
v idmuninstfunc
Appendix E. Error messages

This appendix describes error events that can occur when you use IM Scoring. Examples of error situations are:
v A wrong SQL function is used to import or apply a mining model. For example, DM_impClusFile is used instead of DM_impClasFile to import a Classification model.
v A mining model or mining results data value is inserted into a database column that is configured for the wrong data type. For example, Clustering results are inserted into a column that is configured for data type DM_ClasResult instead of data type DM_ClusResult.
v A wrong SQL function is used to get results data. For example, the DM_getClusterID function is used instead of the DM_getConfidence function on Classification mining results data.

The following types of errors are generated by IM Scoring:
SQL states — Identified by a five-digit error event code
IM Scoring error events — Identified by a four-digit error event code

Tip: If a reason code is not documented:
1. Check that there is enough disk space.
2. Collect all the error information that is available.
3. Call your IBM service representative.

DB2 SQL states

38503 (SQL0430N) The user-defined function (UDF) <function name> (specific name <specific name>) has terminated abnormally.
Explanation: DB2 or IM Scoring might not be installed correctly.
User Response: Look for any hints in the DB2 dump file db2diag.log, or contact your IBM representative.

42724 (SQL0444N) The routine ″%1″ (specific name ″%2″) is implemented in the library or path ″%3″, function ″%4″. The routine ″%1″ cannot be accessed. Reason code: ″%5″
Explanation: A shared library or DLL required by IM Scoring was not found by the DB2 engine. DB2 or IM Scoring might not be installed correctly.
User Response: Check your installation, or contact your IBM representative.

57011 (SQL0973N) Not enough storage is available in the UDF_MEM heap to process the statement.
User Response: DB2 allocates a storage area for the input and output parameters of UDFs. You can modify the size of this area by using the database manager configuration parameter UDF_MEM_SZ. This parameter indicates the size of the memory as a number of database pages. Use the following command to update the DBM CFG for the database instance, and then restart the DB2 instance:
   db2 update dbm cfg using UDF_MEM_SZ 30000
On Windows platforms, you must set an additional parameter in the DB2 registry. DB2NTMEMSIZE indicates the upper limit in bytes for fenced UDFs. If you get the SQLSTATE 57011 for fenced UDFs, increase the value for UDF_MEM_SZ to 30000 pages and the value for DB2NTMEMSIZE to 120 MB. Use the following commands to do this:
   db2 update dbm cfg using UDF_MEM_SZ 30000
   db2set DB2NTMEMSIZE=APLD:120000000
Restart the DB2 instance.

IM Scoring SQL states

If you get the DB2 error message SQL0443N, it means that one of the following SQL states has occurred:
38M00 — This message occurs when an IM Scoring function ended with a kernel error. An accompanying four-digit number identifies an IM Scoring error event.
38M01 — This message occurs when an IM Scoring function ended with a non-kernel error. An accompanying four-digit number identifies an IM Scoring error event.
01HM0 — This message occurs when an IM Scoring function ended with a warning. An accompanying four-digit number identifies an IM Scoring error event.
For a listing of IM Scoring error events, see “IM Scoring error events”.

IM Scoring error events

When an IM Scoring error event occurs, the SQL message SQL0443N is displayed with one of the following SQL states:
v 38M00 for kernel error states
v 38M01 for non-kernel error states
v 01HM0 for warning states
These SQL state messages include:
v The four-digit reason code that identifies the IM Scoring event
v The text of the error message
For more information see “IM Scoring SQL states” on page 174.

The following is a list of the more important IM Scoring error messages, in the order of their four-digit error code numbers. The messages are accompanied, where relevant, by explanations of their meanings and indications as to what action you should take if they appear. If you need a full list of error messages, you can find one in the idmxall.msg file, which is available in the bin directory of the installation.

1601 The codepage %1 is not supported.

1602 The initialization of the trace facility failed. No trace messages will be written.

2038 The Intelligent Miner is unable to read the result object ″%1″.
User Response: Check if the file exists, and verify that you have read permission on the server.

2064 The prediction result contains fields with incomplete statistics.

2109 Invalid continuous statistics object for field ″%1″.

2110 Invalid discrete statistics object for field ″%1″.

2111 The discrete statistics object is missing for field ″%2″, which has been used for initializing the descriptive statistics result ″%1″.

2112 The continuous statistics object is missing for field ″%2″, which has been used for initializing the descriptive statistics result ″%1″.

2117 The field type in the result file is not applicable for Field ″%1″.

2118 No field with name ″%1″ exists in result ″%2″.

2119 The active field ″%1″ is not active in the ’Result Statistics’ that has been used.

2120 Field ″%1″ was used to construct the models. For this reason, it must be defined as active field.

2216 Active field ″%1″ occurs more than once.

2496 Error during XML parser initialization: ″%1″.

2497 XML parser error in ″%1″. Line: ″%2″, Column: ″%3″, Message: ″%4″.

2498 The XML element ″%1″ is not unique. Specify a unique name for this element.
2500 An XML syntax error occurred at the position ″%1″ in the input record ″%2″.
Explanation: The input format for the construction DM_ApplicationData is not valid. One of the column values might contain invalid XML characters such as < or &. For example, for <, you must use &lt;, or for &, you must use &amp;.
User Response: Check the input record against the XML DTD of IM Scoring. Replace the invalid characters with the appropriate coding.

2501 The value ″%1″ of the field ″%2″ in the XML input record cannot be converted to a numeric value.
User Response: Check if the value is a number and if the decimal separator is compatible with the language settings of the database.

2502 The DM_ApplicationData record contains the field ″%1″. This field does not exist in the mining model.
Explanation: The field ″%1″ is not an active field in the mining model. The spelling of the field name might be wrong. Note that uppercase and lowercase characters are treated as different characters.
User Response: Remove all fields that are not active in the mining model to improve the performance.

2503 The fields in the DM_ApplicationData record ″%1″ are insufficient.
Explanation: The DM_ApplicationData record does not contain as many fields as defined in the mining model.
User Response: Provide values for all active fields in the mining model.

2504 The DM_ApplicationData record ″%1″ contains too many fields.
Explanation: The DM_ApplicationData record might contain active fields that were written twice. Only one value is used.
User Response: Remove the redundant fields to improve the performance.

2505 The attribute ″%1″ is not valid for the XML input model ″%2″.
Explanation: The attribute kind=″%1″ of the element ComparisonMeasure is ignored, because it is not valid for the XML input model. The attribute kind=″%3″ is used.
User Response: To prevent further warning messages, specify attribute kind=″%3″ in your input model.

2506 The comparison measure ″%1″ is not valid for the type of the XML input model.
Explanation: The comparison measure ″%1″ is ignored, because the XML input model contains a comparison measure that is not valid for the model type. The comparison measure ″%2″ is used.
User Response: To prevent further warning messages, specify the comparison measure ″%2″ in your input model.

2507 The compare function ″%1″ is not valid for the field ″%2″.
Explanation: The compare function ″%1″ specified for the field ″%2″ is ignored, because it is not valid for the field type. The compare function ″%3″ is used.
User Response: To prevent further warning messages, specify the compare function ″%3″ in your input model.

2508 The field type ″%1″ is not supported.
2509 The closure ″%1″ is not supported. Closure ″%2″ is used instead.

2530 The PMML model contains ″%2″. It must contain element ″%1″.

2531 The PMML model does not contain the mandatory attribute ″%1″ in element ″%2 %3″.

2532 The field ″%1″ does not have valid values.

2533 A restrained validity domain is defined for the continuous field ″%1″ in the PMML model.
Explanation: This validity domain is ignored. All values are valid for the continuous field.

2534 The field ″%1″ is used in the model but is not declared in the data dictionary.

2535 All continuous fields must have the same outlier treatment.
Explanation: The field ″%1″ is indicated with the outlier treatment ″%2″ and the current field with the outlier treatment ″%3″. The outlier treatment ″%3″ will be used for all of the fields.

2536 The outlier treatment ″%1″ is not defined in PMML.
Explanation: The outlier treatment ″%1″ cannot be written in the PMML model. asIs is used instead. This causes differences between the Intelligent Miner model and the PMML model.

2537 The value ″%1″ is not valid for the field ″%2″.

2538 The statistics of the field ″%1″ are inconsistent.
Explanation: The two arrays of values and frequencies have different lengths.

2539 The interval between ″%1″ and ″%2″ for the field ″%3″ is not valid.

2540 There is a gap between ″%1″ and ″%2″ in the intervals of the field ″%3″.

2541 The number of statistics ″%1″ in ModelStats exceeds the number of fields in the data dictionary.

2542 The compare function is not supported for the element ComparisonMeasure.
Explanation: The default compare function ″%1″ is ignored.
User Response: Write the compare function ″%1″ in every Clustering field that uses it.

2543 The similarity matrix for the field ″%1″ is not valid and will not be used.

2549 The field ″%1″ has a taxonomy ″%2″.
Explanation: Taxonomies are not supported. The field ″%1″ will be used without a taxonomy.

2550 The PMML model contains a non-empty TransformationDirectory element.
Explanation: Computed fields are not supported and cannot be ignored. Therefore, it is not possible to use this model in Intelligent Miner.
2600 The name mapping ″%1″ is not complete.
User Response: Verify that you have specified a name mapping name, a table name, and two column names.

2601 The name mapping ″%1″ does not point to a valid data source.
Explanation: The table ″%2″ or the columns ″%3″ and ″%4″ that are defined in the name mapping ″%1″ are not accessible.
User Response: Make sure that the data source exists and that it can be read.

2602 Name mappings with the name ″%1″ exist already.
Explanation: You must specify unique names for name mappings.
User Response: Remove or rename any name mappings that have duplicate names.

2605 The matrix ″%1″ is not complete.
User Response: Verify that you have specified a matrix name, a table name, and three column names.

2606 The matrix ″%1″ is not complete.
Explanation: Parts of the matrix ″%1″ do not have the correct format.
User Response: Use the standard SQL functions to build the matrix.

2607 The matrix ″%1″ does not contain a valid data source.
Explanation: The table ″%2″ or the columns ″%3″, ″%4″, and ″%5″ that are defined in the matrix ″%1″ are not accessible.
User Response: Make sure that the data source exists and that it can be read.

2608 There are two matrices with the name ″%1″.
Explanation: You must specify unique names for matrices.
User Response: Remove or rename any matrices that have duplicate names.

2609 The list of ″%1″ values does not match the number of rows ″%2″ or the number of columns ″%3″ in the matrix ″%4″.
Explanation: Each row or column in the matrix must match a value in the list of values.
User Response: Make sure that the matrix is square and that there are as many values as the size of the matrix.

2610 The XML parameters do not contain a valid task element.

2611 The XML parameters do not contain the mining data element ″%1″.

2612 The XML parameters do not contain a logical data specification.

2613 The XML parameters do not contain settings.

2614 The XML parameters do not contain clustering settings.

2615 The XML parameters do not contain classification settings.

2616 The XML parameters do not contain association rules settings.
2620 The mining data is not completely defined.
Explanation: Some of the attributes or subelements that define a mining data value are not present.
User Response: Verify that you have specified a table name and a list of column names and aliases.

2621 The mining data does not correspond to a valid data source.
Explanation: It is not possible to access the table ″%1″ or the columns whose aliases match the field names in the logical data specification.
User Response: Make sure that the data source exists and that it can be read.

2622 The mining data contains two columns with the same name, ″%1″.
Explanation: Columns must have unique names.
User Response: Remove or rename one of these columns.

2623 The mining data contains two columns with the same alias, ″%1″.
Explanation: Columns must have unique aliases.
User Response: Remove one of these columns, or change its alias.

2624 The mining data contains a column ″%1″ with an empty alias.
Explanation: Columns using an empty or a blank alias are not allowed.
User Response: Change the alias of this column.

2625 The logical data specification is not completely defined.
Explanation: Some of the attributes or subelements that define the logical data specification are not present.
User Response: Verify that you have specified a non-empty list of field names and field types.

2626 The logical data specification contains a field with an empty name.
Explanation: Fields containing an empty or a blank name are not allowed.
User Response: Change the name of this field.

2627 The type ″%1″ of the field ″%2″ is not defined.
Explanation: Only the types ’categorical’ and ’numerical’ are supported.
User Response: Choose either the categorical or numerical type for the field ″%2″.

2628 There is no match between the field name ″%1″ and the alias of a column in the mining data.
Explanation: The field name must match the alias of a column in the mining data.
User Response: Make sure that all the field names match column aliases in the mining data.

2629 There is no name mapping ″%1″ for the field ″%2″.
User Response: Remove the reference to name mapping ″%1″ in the field ″%2″, or add a name mapping ″%1″.

2630 The numeric field ″%1″ has only one limit.
Explanation: For numeric fields, a non-outlier range can be specified by giving a lower and an upper boundary. Because only one of these limits has been specified, the limit will be ignored.
User Response: Either specify no limits, or specify the lower and upper boundaries.

2635 The value ″%2″ of power option ″%1″ is not valid.
User Response: Do not specify this power option, or specify a valid value for it (not documented).

2636 The field ″%1″ referenced in the settings is not known.
Explanation: This field either has no name or is not present in the logical data specification.
User Response: Remove the reference to this field, or use a valid non-empty name for it.

2637 The outlier treatment ″%1″ for the field ″%2″ is invalid.
Explanation: The only valid outlier treatments are asIs, asMissing, and asExtreme.
User Response: Remove or change the outlier treatment for field ″%2″.

2638 An outlier treatment is defined for the categorical field ″%1″.
Explanation: Outlier treatments apply only to numerical fields. The outlier treatment for field ″%1″ will be ignored.
User Response: To avoid getting this warning message, remove the outlier treatment for field ″%1″.

2639 The value for the desired execution time, ″%1″, is invalid.
Explanation: The desired execution time must be greater than or equal to 0, zero meaning no time limitation.
User Response: Remove or change the value of the desired execution time.

2640 The value for the minimum percentage of data, ″%1″, is not 100.
Explanation: When no limit is set for the execution time, all the data will be read. The value that was specified for the minimum percentage of data will be ignored.
User Response: To avoid getting this warning message, remove the value for the minimum percentage of data, which defaults to 100, or set it explicitly to 100.

2641 The value for the minimum percentage of data, ″%1″, is not valid.
Explanation: The value must be between 0 and 100.
User Response: Remove the value for the minimum percentage of data, or set it to a value between 0 and 100.

2642 A field weight is defined for the supplementary field ″%1″.
Explanation: Field weights apply only to active fields. The field weight for the field ″%1″ will be ignored.
User Response: To avoid getting this warning message, remove the field weight for field ″%1″.

2643 An outlier treatment is defined for the supplementary field ″%1″.
Explanation: Outlier treatments apply only to active fields. The outlier treatment for field ″%1″ will be ignored.
User Response: To avoid getting this warning message, remove the outlier treatment for field ″%1″.

2645 The field usage type ″%1″ is not supported in clustering settings.
Explanation: Only active and supplementary fields are supported for Clustering.
• 196.

2646 The field usage type ″%1″ is not supported in classification settings.

Explanation: Only active and target fields are supported for Classification.

User Response: Change the usage type of the field ″%2″.

2647 The field usage type ″%1″ is not supported in association rules settings.

Explanation: Only group and item fields are supported for association rules.

User Response: Change the usage type of the field ″%2″.

2650 The value for the maximum number of clusters, ″%1″, is invalid.

Explanation: The value for the maximum number of clusters must be greater than or equal to 0, zero meaning no upper limit.

User Response: Remove or change the value for the maximum number of clusters.

2651 There is no similarity matrix ″%1″ for the field ″%2″.

User Response: Remove the reference to matrix ″%1″ in field ″%2″, or add a matrix ″%1″.

2652 A value for similarity scale is defined for the categorical field ″%1″.

Explanation: Similarity scales apply only to numerical fields. The similarity scale value for field ″%1″ will be ignored.

User Response: To avoid getting this warning message, remove the similarity scale value for field ″%1″.

2653 A value for similarity scale is defined for the supplementary field ″%1″.

Explanation: Similarity scales apply only to active fields. The similarity scale value for field ″%1″ will be ignored.

User Response: To avoid getting this warning message, remove the similarity scale value for field ″%1″.

2654 A similarity matrix is defined for the numerical field ″%1″.

Explanation: Similarity matrices apply only to categorical fields. The similarity matrix for field ″%1″ will be ignored.

User Response: To avoid getting this warning message, remove the similarity matrix for field ″%1″.

2655 A similarity matrix is defined for the supplementary field ″%1″.

Explanation: Similarity matrices apply only to active fields. The similarity matrix for field ″%1″ will be ignored.

User Response: To avoid getting this warning message, remove the similarity matrix for field ″%1″.

2656 The value weighting ″%1″ for the field ″%2″ is invalid.

Explanation: The only valid value weightings are info, prob, compInfo, and compProb.

User Response: Remove or change the value weighting for the field ″%2″.

2657 A value weighting is defined for the supplementary field ″%1″.

Explanation: Value weightings apply only to active fields. The value weighting for field ″%1″ will be ignored.

User Response: To avoid getting this warning message, remove the value weighting for the field ″%1″.

Appendix E. Error messages 181
• 197.

2658 The similarity threshold ″%1″ is invalid.

Explanation: The similarity threshold must be between 0 and 1.

User Response: Remove or change the value for the similarity threshold.

2660 There is no cost matrix ″%1″.

Explanation: The Classification settings value references a cost matrix ″%1″ that does not exist.

User Response: Remove the reference to cost matrix ″%1″, or add a matrix ″%1″.

2661 An input model is specified for the training phase.

Explanation: The use of an input model is not supported during the training phase. Input models are expected only during the test phase.

User Response: Do not define an input model for this task.

2662 No input model is specified for the test phase.

Explanation: The test phase can be processed only if an input model is given.

User Response: Specify an input model for this test task.

2663 More than one target field is specified.

Explanation: Only one target field may be specified.

User Response: Specify one of the two fields ″%1″ or ″%2″ as the target field.

2664 The target field ″%1″ is a numerical field.

Explanation: Only categorical fields may be the target field of a Classification algorithm.

User Response: Choose a categorical field as the target.

2665 No target field is specified in the classification task.

Explanation: A target field (one only) must be specified.

User Response: Specify one categorical field as the target.

2666 The field ″%1″ is defined in the input model but not in the test task.

Explanation: The fields in the input model and in the test task must match.

User Response: Verify that you have specified an input model and a test task that are compatible.

2667 Some of the fields have field weights.

Explanation: Field weights are not considered for classification. The field weights will be ignored.

User Response: To avoid getting this warning message, remove the field weights from any fields that have them.

2668 Some of the fields have outlier treatments.

Explanation: Outlier treatments are not considered for Classification. The outlier treatments will be ignored.

User Response: To avoid getting this warning message, remove the outlier treatments from any fields that have them.
• 198.

2669 The value for maximum tree depth ″%1″ is invalid.

Explanation: The maximum tree depth value must be greater than or equal to 0, zero meaning no upper limit.

User Response: Remove or change the value for the maximum tree depth.

2670 The value for minimum purity, ″%1″, is invalid.

Explanation: The minimum purity value must be between 0 and 100.

User Response: Remove or change the value for the minimum purity.

2671 The value for the minimum number of records per node, ″%1″, is invalid.

Explanation: This value must be greater than or equal to 0.

User Response: Remove or change the value for the minimum number of records per node.

2675 The category map ″%1″ is not completely defined.

Explanation: Some of the attributes that define a unique category map are not present.

User Response: Verify that you have specified a name, a table name, and two column names.

2676 The taxonomy ″%1″ is not correctly defined.

Explanation: The taxonomy either has no name or does not contain a category map.

User Response: Use the standard SQL functions to build the category map.

2677 The category map ″%1″ does not contain a valid data source.

Explanation: It is not possible to access the table ″%2″ or the columns ″%3″ and ″%4″ defined in the category map ″%1″.

User Response: Make sure that the data source exists and that it can be read.

2678 There are two category maps with the same name, ″%1″.

Explanation: Duplicate names are not allowed for the category maps in a taxonomy.

User Response: Remove or rename one of these category maps.

2679 There is no name mapping ″%1″ for the category map ″%2″.

User Response: Remove the reference to name mapping ″%1″ in the category map ″%2″, or add a name mapping ″%1″.

2680 There is more than one group field.

Explanation: Only one group field is allowed.

User Response: Specify one of the two fields ″%1″ or ″%2″ as the group field.

2681 The group field ″%1″ is a numerical field.

Explanation: Only a categorical field is allowed to be the group field for an association rules algorithm.

User Response: Choose a categorical field as the group field.

2682 No group field is specified in the association rules task.

Explanation: A group field (one only) must be specified.

User Response: Specify one categorical field as the group field.
• 199.

2683 More than one item field is specified.

Explanation: Only one item field is allowed.

User Response: Specify one of the two fields ″%1″ or ″%2″ as the item field.

2684 The item field ″%1″ is a numerical field.

Explanation: The item field for an association rules algorithm must be categorical.

User Response: Choose a categorical field as the item field.

2685 No item field is specified in the association rules task.

Explanation: An item field (one only) must be specified.

User Response: Specify one categorical field as the item field.

2686 There is no taxonomy ″%1″ for the item field ″%2″.

User Response: Remove the reference to taxonomy ″%1″ in the item field ″%2″, or add a taxonomy ″%1″.

2687 The item constraints are not correctly defined for the association rules settings.

Explanation: The item constraints values either specify an unknown type or do not contain any constraints on items.

User Response: Use the standard SQL functions to build the item constraints values.

2688 The value for minimum support, ″%1″, is invalid.

Explanation: The minimum support value must be between 0 and 100.

User Response: Remove or change the value for minimum support.

2689 The value for minimum confidence, ″%1″, is invalid.

Explanation: The minimum confidence value must be between 0 and 100.

User Response: Remove or change the value for minimum confidence.

2690 The value for maximum rule length, ″%1″, is invalid.

Explanation: The maximum rule length value must be greater than or equal to 0, zero meaning no upper limit.

User Response: Remove or change the value for the maximum rule length.

2691 Some of the fields have field weights.

Explanation: Field weights are not considered for association rules. The field weights will be ignored.

User Response: To avoid getting this warning message, remove the field weights from any fields that have them.

2692 Some of the fields have outlier treatments.

Explanation: Outlier treatments are not considered for association rules. The outlier treatments will be ignored.

User Response: To avoid getting this warning message, remove the outlier treatments from any fields that have them.

2693 The cost rate ″%1″ in the classification settings is invalid.

Explanation: The cost rate must be between 0 and 100.

User Response: Remove or change the value for the cost rate in the Classification settings.
• 200.

2700 The categorical field ″%1″ has more than ″%2″ values.

Explanation: Categorical fields with too many values degrade performance, largely without improving the mining result. For this reason, the maximum number of values considered is limited to ″%2″. The statistics containing information about the first ″%2″ values of the field ″%1″ are stored; the other values are considered invalid.

User Response: Verify that the field ″%1″ is needed in this mining run and that all of its values are useful. If necessary, preprocess the data in order to reduce the number of values or to have the important values first.

2701 The field ″%1″ is of the ordinal type.

Explanation: The ordinal type is not supported. The field ″%1″ will be considered to be of the categorical type.

2771 You are now using a temporary license. You need to enroll ″%1″ within ″%2″ days in the license file ″%3″.

Explanation: You are now using the ’Try and Buy’ version. The number of days after which the temporary key expires is ″%2″.

User Response: Buy a production license and replace the nodelock files, or uninstall the product after the temporary license has expired.

2772 The number of days left is ″%2″. You are still using a temporary license. You need to enroll ″%1″ in the license file ″%3″.

Explanation: You are using the ’Try and Buy’ version. The number of days after which the temporary key expires is ″%2″.

User Response: Buy a production license and replace the nodelock files, or uninstall the product after the temporary license has expired.

2773 An error occurred during the initializing of License Use Management for the product ″%1″ with the license file ″%2″. Check your installation.

Explanation: An attempt was made to install a temporary license in nodelock files. The permissions to write to the directory /IMinerX/bin might be missing.

User Response: Check your installation. Run the idmlicm executable as a user who has write permissions for <INSTALLDIR>/IMinerX/bin (for example, root for UNIX or Administrator for Windows).

2774 The license has expired. If it was a ’Try and Buy’ license, you can now enroll ″%1″ in the nodelock file ″%2″.

Explanation: An attempt was probably made to use a temporary ’Try and Buy’ license that has now expired.

User Response: Buy a production license and replace the nodelock files, or uninstall the product after the temporary license has expired.

2775 The product ″%1″ does not have a license enrolled. A ’Try and Buy’ license could not be added. Use the idmlicm command to check your license status. Verify that you have installed all the necessary components.

User Response: On UNIX systems you need to invoke the idmlicm command as root user to enroll a temporary ’Try and Buy’ license. Production licenses are a separate installation option available on the installation media.

3112 An error occurred when the clustering model was read.
• 201.

3113 The field ″%1″ does not appear in the clustering model.

User Response: For this reason, the Intelligent Miner cannot apply this model to your current data. Check the input fields you specified and the model. Correct any mismatch. If there is no mismatch, contact your IBM representative.

3135 A faulty record has been found and skipped.

3136 The field ″%1″ was not used when the model was built.

User Response: Remove this field from the active fields list.

3143 You use a Version 1 result for application mode.

Explanation: This result contains no distance units.

3146 The discrete numeric field ″%1″ has ″%2″ different values.

User Response: For this reason, the similarity matrix for this field needs a lot of space. This might cause your system to run out of memory. It is recommended that you define this field as a continuous field.

3147 You are converting a Version 1 result to XML. This model does not contain distance units.

Explanation: If a model does not contain distance units, distance units are calculated by default. These default values might not correspond to the distance units used to create the result.

User Response: If necessary, you can change these default values in the attribute similarityScale of all elements ClusteringField in the XML model.

3148 You are using an XML model that does not contain the attribute similarityScale for numeric fields. Therefore distance units are calculated by default.

3149 You are converting a Version 6.1 model to XML. This model might contain an erroneous outlier treatment. It also does not contain any similarity definitions.

Explanation: Version 6.1 results do not contain the outlier treatment and the similarity definitions you might have specified. Outliers are treated as missing values. Similarity definitions are not used.

User Response: Upgrade your Intelligent Miner software to the latest version and export this model again.

3203 An error occurred as the Intelligent Miner tried to read the next record (rc=″%1″).

3205 The field ″%1″ is defined as active more than once, or as active and supplementary.

Explanation: The Intelligent Miner considers only the first ″active″ declaration. The other specifications are ignored.

3206 The field ″%1″ is defined as both the prediction field and as active.

User Response: The Intelligent Miner considers the field to be the prediction field. The other specification is ignored.

3207 The field ″%1″ is defined as both the prediction field and as supplementary.

Explanation: The Intelligent Miner considers the field to be the prediction field. The other specification is ignored.
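Messages 3147 and 3148 both refer to the similarityScale attribute of the ClusteringField elements in a PMML clustering model. As a minimal sketch of the edit that message 3147 suggests, such attributes might look like the fragment below; the field names and scale values here are purely illustrative and not taken from any actual model:

```xml
<!-- Illustrative fragment only: field names and scale values are
     hypothetical. Each ClusteringField may carry a similarityScale
     attribute that defines the distance unit for that field. -->
<ClusteringField field="AGE" similarityScale="10.0"/>
<ClusteringField field="INCOME" similarityScale="25000.0"/>
```

If a ClusteringField carries no similarityScale attribute, default distance units are calculated, which is exactly the situation message 3148 warns about.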
• 202.

3226 An error occurred when the Intelligent Miner read the result object.

3227 The field ″%1″ is defined as active in the settings object, but not in the specified result object.

User Response: Check if you specified the correct result object or if the result object was created with an older version of Intelligent Miner. If the latter is the case, create a new result object in training mode.

3228 The number of active fields selected for this mining run must equal the number of active fields in the result object.

3232 The field ″%1″ was not specified when the function was run in training mode. For this reason, the field is ignored in test or application mode.

3233 The Intelligent Miner cannot find a value for the expected region among the input regions. The result object might be incomplete or damaged.

3260 Quantiles cannot be computed because there are no quantiles in the result object.

3270 There is no data to mine.

Explanation: The input data object might refer to an empty file or database table. Alternatively, a filter record condition was specified that excludes all records in the input data.

User Response: Check the input data and any filter records conditions.

4405 Cannot load the input results specified (or none are specified).

4407 The current data source does not match data used for training.

4440 The class field is continuous.

Explanation: Neural Classification requires a discrete data type.

4441 The predicted field is not numeric.

Explanation: Neural prediction requires a numeric data type.

4442 The categorical field ″%1″ has ″%2″ different values.

Explanation: Without automatic normalization at most two different values are allowed.

User Response: Use automatic normalization for your input data, or clean the input data source to remove extraneous values.

4470 The model cannot be loaded. A required tag ″%1″ is missing in the model. The model is incomplete or damaged. Try to recreate the model.

Explanation: An XML tag that is specified as mandatory in the relevant PMML is missing. Without this tag, a complete model cannot be constructed.

User Response: Recreate the model. If the problem persists, ask the provider of the PMML model for a valid version.

4471 An internal error has occurred. Contact your IBM representative.

Explanation: An internal program error has occurred.

User Response: Try again. If the problem persists and you can reproduce the error, contact your IBM representative.
• 203.

4472 There is not enough free memory available to complete the requested operation. Close some applications, and try again.

User Response: Free some memory by closing any other running applications. If the problem persists, try to extend your virtual memory or swap partition size, or install more RAM.

4473 The model cannot be loaded. The measure specified in the PMML is not supported in this release. The measures supported are Euclidean and squared Euclidean. Change your model accordingly.

Explanation: The PMML model specifies a measure for the Kohonen Network that is not supported in this release.

User Response: All measures specified in the relevant PMML core are supported. Therefore, you have a PMML model that uses a measure not included in the core. Try to recreate the model, and specify a measure included in the PMML core.

4474 The model cannot be loaded. The compare function ″%1″ is not supported in this release.

Explanation: The PMML model specifies a compare function for the Kohonen Network that is not supported in this release.

User Response: All compare functions specified in the relevant PMML core are supported. Therefore, you have a PMML model that uses a compare function not included in the core. Try to recreate the model, and specify a compare function included in the PMML core.

4475 The model is inconsistent. The number of centers in the clusters does not match the previous information. The model is incomplete or damaged. Try to recreate the model.

Explanation: The number of clusters in the PMML model differs in the relevant sections. The model is inconsistent. Therefore, you cannot apply this model.

User Response: Try to recreate the model. If the problem persists, ask the provider of the PMML model to correct it.

4476 The model cannot be loaded. PMML version ″%1″ is not supported in this release.

Explanation: The PMML version specified in the model is not supported in this release.

User Response: Either check for a new version of this product that supports the relevant PMML version, or try to export the PMML model to a version supported by this product.

4477 The model cannot be loaded. The activation function ″%1″ is not supported in this release.

Explanation: The PMML model specifies an activation function for the Neural Network that is not supported in this release.

User Response: All activation functions specified in the relevant PMML core are supported. Therefore, you have a PMML model that uses an activation function not included in the core. Try to recreate the model, and specify an activation function included in the PMML core.
• 204.

4479 The model cannot be loaded. The neuron ″%1″ could not be connected with the neuron ″%2″. No neuron with the ID ″%2″ was found in the network. The model is incomplete or damaged. Try to recreate the model.

Explanation: The neural network includes a neuron that is connected to another neuron whose ID does not exist. The PMML model is invalid.

User Response: Recreate the model. If the problem persists, ask the provider of the PMML model for a valid version.

4480 The model cannot be loaded. The connections of the output layer are inconsistent. The model is incomplete or damaged. Try to recreate the model.

Explanation: The number of outputs does not match the number of neurons in the output layer. The model is invalid.

User Response: Recreate the model. If the problem persists, ask the provider of the PMML model for a valid version.

4481 The model cannot be loaded. The field names in the ’CenterFields’ tag do not match the ones in the ’ClusteringField’ tags. The model is incomplete or damaged. Try to recreate the model.

Explanation: The names of the clusters are not consistent throughout the PMML model. The model is invalid.

User Response: Recreate the model. If the problem persists, ask the provider of the PMML model for a valid version.

4482 The model cannot be loaded. This is not a PMML model. The model is incomplete or damaged. Try to recreate the model.

Explanation: An attempt was made to score data with something that is not a PMML model.

User Response: Ensure that you specify a valid PMML model.

4483 The record cannot be scored. An invalid record was received.

Explanation: The record passed to the model refers to variable names that are completely different from those that the model expects.

User Response: Make sure that the column names in the data match the variable names in the model.

4484 The requested result is not available. An attempt was made to retrieve a result for a classification model; however, this is a value-prediction model.

Explanation: An attempt was made to retrieve a Classification result from a model that does value prediction. Most probably, the wrong model was chosen.

User Response: Make sure that you choose a Classification model.

4485 The requested result is not available. An attempt was made to retrieve a result for a value-prediction model; however, this is a classification model.

Explanation: An attempt was made to retrieve a value prediction result from a model that does classification. Most probably, the wrong model was chosen.

User Response: Make sure that you choose a value prediction model.
• 205.

4486 The result is invalid. The result value could not be denormalized because it is an outlier. Adjust the normalization parameters in the output layer of the model.

Explanation: The output value of the neural network cannot be denormalized because it is out of the denormalization range. You cannot score this record.

4487 The result is invalid. It was not possible to map the result to a string for a classification result.

Explanation: The result of the classification is invalid. You cannot score this record.

4488 The model is invalid. You cannot use the model with this release. Convert the original model again, using the conversion utilities of this release.

Explanation: The PMML model cannot be applied with this release.

User Response: Convert the original model again, using the conversion utilities of this release.

4489 The data cannot be scored. The model is not a value prediction model.

Explanation: The model you specified is not a value prediction model. Most probably you have a Classification model.

User Response: Specify a value prediction model.

4490 The data cannot be scored. The model is not a classification model.

Explanation: The model you specified is not a Classification model. Most probably you have a value prediction model.

User Response: Specify a Classification model.

4492 The model is inconsistent. The value ″%1″ could not be set as a missing value replacement.

Explanation: A replacement for missing values was specified in the PMML, but is invalid in the context of the mining field.

User Response: Verify that you specify a valid PMML model.

6001 The command line option that selects the method for building the input data records was found, but no method was specified.

Explanation: The command line option that selects the desired method for building input data records in the SQL script was not given correctly. The M switch must be followed by a value that identifies one of the methods that allow input data records to be built.

User Response: Correct the command line option that was specified incorrectly. If necessary, review your documentation of command line options.

6002 The command line option that selects the method for building the input data records was found twice.

Explanation: The same command line option cannot be used twice.

6003 The command line option that selects the method for building the input data records was found, but the specified method name is invalid.

Explanation: The command line option that selects the desired method for building input data records in the SQL script was not given correctly. The value identifying one of the permitted methods is illegal.

User Response: Use one of the method names that are allowed. If necessary, review your documentation of command line options.
• 206.

6004 The command line option for the SQL dialect was found, but no SQL dialect identifier was specified.

Explanation: The command line option for selecting the SQL dialect was not given correctly. The D switch must be followed by a value identifying one of the SQL dialects (DB2 or Oracle) that are allowed.

User Response: Correct the command line option that has been incorrectly specified. If necessary, review your documentation of command line options.

6005 The command line option for the SQL dialect was found twice.

Explanation: The same command line option cannot be used twice.

6006 The command line option for the SQL dialect was found, but the specified SQL dialect is invalid.

Explanation: The command line option for selecting the SQL dialect was not specified correctly. The value identifying one of the permitted SQL dialects is illegal.

User Response: Use one of the SQL dialects that are allowed. If necessary, review your documentation of command line options.

6007 The command line option for encoding the PMML file was found, but no encoding identifier was specified.

Explanation: The command line option for encoding the codepage of the PMML file was not given correctly. The E switch must be followed by a valid encoding string.

User Response: Correct the command line option that was specified incorrectly. If necessary, review your documentation of command line options.

6008 The command line option for encoding the PMML file was found twice.

Explanation: The same command line option cannot be used twice.

6011 The method DM_ApplColumn is not supported for DB2 SQL.

Explanation: The combination of the method DM_ApplColumn and the SQL dialect DB2 is not allowed. DM_ApplColumn is specific to Oracle.

User Response: Use a different method to build input data records for DB2 SQL scripts.

6012 The output file has the same name as the input file.

Explanation: The output file cannot have the same name as the input file.

User Response: Use a different name for the output file.

6013 No input file is specified.

Explanation: An input PMML file is a mandatory command line parameter.

User Response: Specify the PMML input file on the command line.

6014 The output file cannot be opened for write.

Explanation: The output file cannot be written. The file name or path name might not be valid, or the permission rights may not allow the file to be created, to be written, or both.

User Response: Ensure that the output file can be created, and that it can be opened for write.

6015 More than one output file was found in the command line.

Explanation: Only one output file is allowed. If no output file is given, the result is written to standard output.
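Taken together, messages 6001 through 6015 describe the command line of the idmmkSQL script generator: the M switch selects the method for building input data records, the D switch selects the SQL dialect (DB2 or Oracle), the E switch selects the encoding of the PMML file, and these are followed by a mandatory input PMML file and at most one output file. A purely hypothetical invocation, assuming dash-prefixed switches and using the DM_ApplColumn method named in message 6011 (which, being Oracle-specific, is paired with the Oracle dialect), might look like this — consult the idmmkSQL reference for the exact syntax:

```
idmmkSQL -M DM_ApplColumn -D ORACLE -E UTF-8 mymodel.pmml myscript.sql
```

If the output file argument is omitted, the generated SQL script is written to standard output, as message 6015 notes.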
• 207.

6016 The PMML file ″%1″ cannot be processed.

Explanation: See the preceding errors in the error file for a more detailed error report.

6017 No model name was found in the PMML file.

Explanation: Either the PMML model in the input file does not contain a model name or the file cannot be parsed. The SQL script cannot be generated without a model name.

User Response: Either use a PMML file that contains a model name or insert the XML element specifying the model name into the PMML file. Ensure that the file contains correct PMML. See if the error file contains more information about possible parsing errors.

6018 The model type cannot be determined.

Explanation: The PMML file cannot be parsed correctly.

User Response: Ensure that the file contains correct PMML. See if the error file contains more information about possible parsing errors.

6019 The data fields for the model cannot be determined.

Explanation: The PMML file cannot be parsed correctly.

User Response: Ensure that the file contains correct PMML. See if the error file contains more information about possible parsing errors.

6020 The output SQL script cannot be generated.

Explanation: Errors occur when the SQL script is being generated.

User Response: See if the error file contains a previous error with more detailed information about possible errors.

6021 The model type is illegal.

Explanation: The model type in the PMML file is not allowed. This may happen if a model type (for example, an association rules model) is used that cannot be processed by IM Scoring because there is no application mode for this model type.

User Response: Use a different model type.

7219 Insufficient memory.

7268 The result file specified is not a valid Intelligent Miner Version 6 result file.

7500 The following item is missing in the model: ″%1″.

7501 The output file cannot be opened.

7502 Error while writing to output file.

7503 Internal error: array overflow.

7504 Internal error: Cannot set records.

7505 Internal error.

7506 The file specified does not contain a valid regression model.

7507 The file specified is not a valid XML file.

7508 The PMML version specified is unsupported. The supported versions are 1.1 and 2.0.

7509 The value of the PMML tag modelType is invalid. Your model is probably damaged.
• 208.

7510 The value for tag ″%1: %2″ is unsupported. Make sure that the model conforms to the PMML version specified.

7511 The model is not a value prediction model but a classification model. Make sure that you apply this function only to linear or polynomial regression models.

7512 The model is not a classification model but a value prediction model. Make sure that you apply this function only to logistic regression models.

8116 The file ″%1″ holding the Associations model could not be read.

Explanation: The pathname might be incorrect, or the file permissions might be insufficient for the file to be read.

User Response: Verify the pathname and the permissions for the file.

8117 The Associations model cannot be converted from non-PMML format to PMML format because the internal structure of the non-PMML model is invalid.

Explanation: The model might be corrupt.

User Response: Generate the model again.

8118 The file ″%1″ used to store the converted Associations model could not be opened for writing.

Explanation: The pathname might be invalid, or the file permissions might be insufficient for the file to be written.

User Response: Verify the pathname and the permissions for the file.

8119 Your Association model contains ambiguous item names. One item name describes more than one item. Therefore, the assignment of support values to the rule head/body might be ambiguous during conversion. To minimize the impact of this, the higher support value is chosen. This effect occurs only when an ambiguous name mapping was applied during the generation of the Association model. To avoid this warning, use a one-to-one name mapping during the generation of the Association model.

8120 The converted Association model holds fewer rules than the original model, because some required information is missing in the original model. This effect occurs only if item constraints of the type ’including’ were applied during the generation of the original model using Intelligent Miner for Data Version 6 or earlier. To avoid this warning, do not use item constraints when generating or regenerating your model with a newer version of Intelligent Miner for Data.

8402 Error while opening file ″%1″ (″%2″).

8405 The Intelligent Miner detected an unknown attribute ″%1″ because the result object does not match the specifications in settings object.

User Response: Specify a suitable result object for the settings object (″%2″).

8641 Loading of pruned tree failed (line ″%1″).
8642  Dummy node without a parent detected.

8643  Dummy node whose parent does not have a left child detected.

8644  Boolean operator ″%1″ detected not supported with ″%2″ tag.

8645  Node without ″%1″ detected.

8646  Node with continuous test feature (″%1″) with more than one ″%2″ detected.

8647  Node with continuous test feature (″%1″) with unsupported operator ″%2″ detected.

8648  Node with continuous test feature (″%1″): number expected, but ″%2″ found.

8649  Node with different categorical test features (″%1″, ″%2″) detected.

8650  Node with categorical test feature (″%1″) with unsupported operator ″%2″ detected.

8651  Model used and record to be classified do not match.

8652  More than one class label specified.

8653  Node is not an element node.

8654  Tree model node without score is not feasible.

8655  Node does not have distribution specified.

8656  Invalid node detected: node must not have one single child.

8657  Attribute ″%1″ must not be ″%2″.

8658  ″%1″ is not an attribute of element ″%2″.

8659  No class label specified.

8660  Error reading tree classification model.

8661  The Predicate field ″%1″ is not specified as a MiningField.

8662  The Predicate field ″%1″ of unsupported type is detected.

8663  The Predicate field ″%1″ is continuous, but the value (″%2″) is nonnumeric.

8664  The MiningField name=″%1″ does not occur in the DataDictionary.

8665  The Predicate name=″%1″ is not a MiningField.

8666  The Intelligent Miner Scoring does not support regression tree scoring.

8800  UDF is declared as fenced. It must be declared not fenced.

Explanation: An SQL function needs to work with locators, but cannot do so because it is declared as fenced.

User Response: You must drop the function definition in your DB2 instance and recreate it using the CREATE FUNCTION command, this time declaring the function as not fenced.
8801  Internal error: sqludf_length received a bad input value.

Explanation: DB2 or IM Scoring might not be installed correctly.

User Response: Look for any hints in the DB2 dump file db2dump.log, or contact your IBM representative.

8802  Internal error: sqludf_substr received a bad input value.

Explanation: DB2 or IM Scoring might not be installed correctly.

User Response: Look for any hints in the DB2 dump file db2dump.log, or contact your IBM representative.

8803  Internal error: sqludf_append received a bad input value.

Explanation: DB2 or IM Scoring might not be installed correctly.

User Response: Look for any hints in the DB2 dump file db2dump.log, or contact your IBM representative.

8804  Internal error: sqludf_create received a bad input value.

Explanation: DB2 or IM Scoring might not be installed correctly.

User Response: Look for any hints in the DB2 dump file db2dump.log, or contact your IBM representative.

8805  Internal error: sqludf_free received a bad input value.

Explanation: DB2 or IM Scoring might not be installed correctly.

User Response: Look for any hints in the DB2 dump file db2dump.log, or contact your IBM representative.

8806  Internal error. Locator is already freed, or free is not allowed.

Explanation: DB2 or IM Scoring might not be installed correctly.

User Response: Look for any hints in the DB2 dump file db2dump.log, or contact your IBM representative.

8807  The importing of Intelligent Miner V6 results is not supported in unfenced mode. Use idmxmod to convert the model into the PMML format, and then run the import routine again.

Explanation: Models in Intelligent Miner format must be converted to PMML format before they can be used by IM Scoring. This conversion cannot be done in unfenced mode.

User Response: Use idmxmod to convert the model into the PMML format, and then run the import routine again.

8901  The model is not a clustering model.

Explanation: The model that was specified as input for a DM_applyClusModel or DM_impClusFile function is not a Clustering model.

8902  The model is not a classification model.

Explanation: The model that was specified as input for a DM_applyClasModel or DM_impClasFile function is not a Classification model.

8903  The model is not a regression model.

Explanation: The model that was specified as input for a DM_applyRegModel or DM_impRegFile function is not a Regression model.

8904  The model does not correspond to the XML format.
8905  Internal error. The type of model is unknown.

Explanation: The model is not recognized as a Clustering, Classification, or Regression model. This is an internal error.

8906  File ″%1″ cannot be opened for read.

Explanation: The file you specified as input for an import function cannot be read. Check file name, path, and permissions.

8907  File ″%1″ cannot be opened for write.

Explanation: A temporary file cannot be opened for write.

User Response: Check disk space and permissions in /tmp (AIX), TEMP directory (Windows NT).

8908  The result format in file ″%1″ is wrong.

Explanation: The model that was specified as input for an import function is not in either IM for Data format or PMML format.

8909  The tree classification test model cannot be applied. Use a tree classification training model instead.

Explanation: Tree Classification test models are used to verify the quality of a training run. You cannot use them for scoring.

User Response: Use the Tree Classification training model instead.

8914  The header of the model is not valid.

Explanation: The model was not imported into DB2 with the appropriate function. Therefore it does not contain a valid header.

User Response: Import the model again by using DM_impClasFile, DM_impClusFile, DM_impRegFile, DM_impClasFileE, DM_impClusFileE, DM_impRegFileE, DM_impClasModel, DM_impClusModel, or DM_impRegModel.

8916  The model is not unique.

Explanation: A model that is passed to DM_applyClasModel, DM_applyClusModel, or DM_applyRegModel must have a constant value. It is not possible to apply data to more than one model at the same time.

User Response: Verify your SQL command and make sure that only one model is passed.

8917  Encoding is only allowed for XML files.

Explanation: You specified an encoding when importing a file in V6 format. Encoding is only allowed for XML files. The encoding is ignored.

User Response: To import V6 models, use the function DM_impClasFile, DM_impClusFile, or DM_impRegFile.

8918  The encoding of the XML model is missing.

Explanation: The XML model you want to import does not contain any XML declaration. Therefore the encoding of the model cannot be determined.

User Response: You can add an XML declaration at the beginning of the model, or you can specify an encoding when you import the model.

8919  The model (″%1″ bytes) cannot be stored in a LOB (″%2″ bytes).

Explanation: After internal conversions, the imported model is too big to be stored in a Large Object (LOB).

User Response: You must increase the maximum value for a LOB.
8920  Insufficient memory available to convert and store the model.

User Response: Reconnect to the database and try again. If the problem persists, restart the DB2 instance.

8930  Evaluation period over.

8956  Check the preceding warning messages.

8957  Warning: The type ″%1″ already exists with a different size (″%2″ bytes) from that requested (″%3″ bytes). To specify a new size, first disable the database.

Explanation: The size of UDTs cannot be changed if dependent objects that use these types (for example, tables) already exist. To change the size of UDTs, drop the types by using DROP TYPE, then re-enable the database using idmenabledb.

User Response: To specify a new size, first disable the database.

8958  Warning: The function ″%1″ (special name ″%2″) is used in other database objects, and cannot be updated.

Explanation: If a user-defined function (UDF) or method (UDM) is used in other dependent database objects like triggers or views, the UDF or UDM cannot be changed.

User Response: Drop the dependent object, and then rerun the command.

8961  The file ″%1″ cannot be found. Verify your installation, and check your PATH settings.

8962  The database ″%1″ was not disabled. Check the preceding error messages. Correct the problems, and rerun the command.

Explanation: IM creates database objects like TYPES, PROCEDURES, METHODS, and TABLES when enabling a database. These objects are dropped from the database when the database is disabled. If you created your own database objects (for example, tables or triggers) that use the database objects created by IM, the IM database objects cannot be dropped.

The most common error is to enable the database with sample tables (idmenabledb <dbname> tables), and then to forget the ″tables″ parameter when the database is being disabled. For example:

   E:\im810btest>idmdisabledb testudf
   ........................................
   DROP TYPE IDMMX.DM_MiningData
   E 2303: An SQL error occurred: SQLstate: "42893",
   SQL Error Message: [IBM][CLI Driver][DB2/NT] SQL0478N
   The object type "TYPE" cannot be dropped because
   there is an object "IDMMX.MININGDATA", of type "TABLE",
   that depends on it. SQLSTATE=42893
   See the DB2 User’s Guide or the Message Reference.
   E 8962: The database "testudf" was not disabled.
   Check the preceding error messages. Correct the
   problems, and rerun the command.

   E:\im810btest>idmdisabledb testudf tables
   ......................................
   The database "testudf" was successfully disabled.

User Response: If you enabled the database using the ″tables″ parameter, use idmdisabledb <dbname> tables. Otherwise, manually drop the objects that depend on IM database objects.
8963  The database ″%1″ was not enabled. Check the preceding error messages. Correct the problem, and rerun the command.

Explanation: IM creates database objects like TYPES, PROCEDURES, METHODS, and TABLES when enabling a database. Some of these objects could not be created. The most common reason for this is that database objects with the same name might already exist in the schema IDMMX. Another possible reason is that the database might be already enabled for a different release of IM.

User Response: If the database is already enabled, disable the database first. Otherwise, manually drop the database objects that have a name conflict with the database objects to be created by IM.

8998  Error and trace cannot be initialized.

Explanation: The error file or the trace file could not be opened for writing, or the message catalog library idmmsg could not be read.

User Response: Check the path of your trace file, and verify that you have permissions to write to the trace and error files.

Windows: Check that the message catalog file <install path>\IMinerX\bin\idmmsg.dll exists, and that the directory <install path>\IMinerX\bin is included in the system environment variable PATH.

UNIX: Check that the message catalog file <install path>/IMinerX/lib/libidmmsg.* exists.
Appendix F. The DB2 REC2XML function

Fixpack 3 of DB2 V7 introduces a new scalar function, REC2XML. This function can be used for easy and fast construction of application data values in XML. This section shows the syntax and gives a few examples of how this function can be used.

Syntax

   REC2XML ( integer constant , format string , row tag string , column name [ , column name ... ] )

The schema is SYSIBM. The REC2XML function returns a string formatted with XML tags and containing column names and column data.

integer constant
   The expansion factor for replacing column data characters. It must be an integer value from 1 to 6. The integer constant value is used to calculate the result length of the function. For every column with a character or graphic data type, the length attribute of the column is multiplied by this expansion factor before it is added to the result length. To specify no expansion, use a value of 1. If the actual length of the result string is greater than the calculated result length of the function, an error is raised (SQLSTATE 22001).

format string
   A string constant that specifies which format the function is to use during execution. The only value that is supported is ’COLATTVAL’. The format string is case-sensitive, and only uppercase values will be recognized.

© Copyright IBM Corp. 2001, 2002 199
This format returns a string with each column as an attribute:

   <row tag string>
      <column name="column name">column value</column>
      <column name="column name" null="true"/>
      ...
   </row tag string>

with one <column> element per column; the self-closing form with null="true" is used for NULL values.

row tag string
   A string constant that specifies the tag used for each row. If an empty string is specified, the value row is assumed. When using REC2XML in IM Scoring you should always use the empty string.

column name
   A qualified or unqualified name of a table column. The column must have one of the following data types:
   v Numeric (SMALLINT, INTEGER, BIGINT, DECIMAL, NUMERIC, REAL, DOUBLE)
   v Character string (CHAR, VARCHAR, LONG VARCHAR, CLOB)
   v Graphic string (GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, DBCLOB)
   v Datetime (DATE, TIME, TIMESTAMP)
   v A user-defined type based on one of the above types.
   Character strings with a subtype of BIT DATA are not allowed. The same column name cannot be specified more than once, or an error will result (SQLSTATE 42734).

Depending on the value specified for the format string, certain characters in column names and column values are replaced to ensure that they are valid XML values. These characters and their replacement values are as follows:

   <  is replaced by  &lt;
   >  is replaced by  &gt;
   "  is replaced by  &quot;
   &  is replaced by  &amp;
   '  is replaced by  &apos;
A single character can be replaced by up to six characters in XML. That is the reason why REC2XML uses a parameter for the expansion factor.

The following examples show how the REC2XML function is used and the strings that this function returns:

v Using the DEPARTMENT table in the sample database, format the department table row, except the DEPTNAME and LOCATION columns, for department ’D01’ into an XML string. Because the data does not contain any of the characters that require replacement, the expansion factor will be 1 (no expansion). Also note that the MGRNO value is NULL for this row.

      SELECT REC2XML (1, ’COLATTVAL’, ’’, DEPTNO, MGRNO, ADMRDEPT)
      FROM DEPARTMENT
      WHERE DEPTNO = ’D01’

   This example returns the following string:

      <row><column name="DEPTNO">D01</column><column name="MGRNO" null="true"/><column name="ADMRDEPT">A00</column></row>

v The example:

      SELECT REC2XML (2, ’COLATTVAL’, ’’, CLASS_CODE, DAY, STARTING)
      FROM CL_SCHED
      WHERE CLASS_CODE = ’&43<FIE’

   returns the following string:

      <row><column name="CLASS_CODE">&amp;43&lt;FIE</column><column name="DAY">5</column><column name="STARTING">06:45:00</column></row>

v This example shows characters that are replaced in a column name:

      SELECT REC2XML (2, ’COLATTVAL’, ’’, Class, "time<noon")
      FROM (SELECT Class_code, Starting
            FROM Cl_sched
            WHERE Starting < ’12:00:00’) AS Early (Class, "time<noon")

   It returns:

      <row><column name="CLASS">&amp;43&lt;FIE</column><column name="time&lt;noon">06:45:00</column></row>
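The escaping and row-formatting behavior of the COLATTVAL format described above can be sketched outside the database. The following Python helper is a hypothetical illustration only (REC2XML itself runs inside the DB2 engine); the function names are ours, not part of any DB2 API:

```python
# Sketch of the COLATTVAL escaping and row-formatting rules described
# above. Hypothetical helper for illustration -- not part of DB2.

# The five characters REC2XML replaces, and their XML entities.
XML_ESCAPES = {
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "&": "&amp;",
    "'": "&apos;",
}

def escape(value: str) -> str:
    """Apply the character replacements listed in the table above."""
    return "".join(XML_ESCAPES.get(ch, ch) for ch in value)

def rec2xml_colattval(row: dict, row_tag: str = "row") -> str:
    """Format one row the way REC2XML(..., 'COLATTVAL', ...) does:
    <row><column name="COL">value</column>...</row>, using a
    self-closing element with null="true" for NULL (None) values."""
    parts = [f"<{row_tag}>"]
    for name, value in row.items():
        if value is None:
            parts.append(f'<column name="{escape(name)}" null="true"/>')
        else:
            parts.append(
                f'<column name="{escape(name)}">{escape(str(value))}</column>'
            )
    parts.append(f"</{row_tag}>")
    return "".join(parts)

# A worst case expands one character into six ("&quot;"), which is why
# the expansion factor parameter ranges from 1 to 6.
print(rec2xml_colattval({"DEPTNO": "D01", "MGRNO": None, "ADMRDEPT": "A00"}))
# -> <row><column name="DEPTNO">D01</column><column name="MGRNO"
#    null="true"/><column name="ADMRDEPT">A00</column></row>
```

Running the sketch on the first example's row reproduces the string shown above, including the null="true" element for the NULL MGRNO value.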
Appendix G. IM Scoring conformance to PMML

This appendix outlines how IM Scoring conforms to the PMML standard. In this appendix, the following naming conventions of the PMML 2.0 standard are used:
v Demographic Clustering in IM Scoring is called distribution-based clustering in PMML 2.0
v Neural Clustering in IM Scoring is called center-based clustering in PMML 2.0
v Neural Classification and Neural Prediction in IM Scoring are covered by the term neural networks in PMML 2.0
v Linear Regression, Logistic Regression, and Polynomial Regression are covered by the term regression in PMML 2.0
v Tree Classification in IM Scoring is called decision trees in PMML 2.0

IM Scoring application

IM Scoring provides an SQL interface enabling the application of PMML models to data. For this feature, IM Scoring complies with the PMML consumer conformance clause of the PMML 2.0 standard for several algorithms. These algorithms are listed below with possible restrictions on their consumer conformance to PMML 2.0.

Center-based clustering
v IM Scoring supports all the core features of PMML 2.0 for center-based clustering.
v The handling of missing values for unary or binary categorical fields of models created with Intelligent Miner products is different from the handling of missing values as defined in PMML 2.0 and used in IM Scoring for models from other producers. Therefore, IM Scoring does not deliver the same results that other vendors might deliver with these Intelligent Miner models when the data contains missing values.

Decision trees
v IM Scoring supports all the core features of PMML 2.0 for decision trees except the <SimpleSetPredicate> elements.
v The handling of missing values for models created with Intelligent Miner products is different from, and more powerful than, the handling
of missing values as defined in PMML 2.0 and used in IM Scoring for models from other producers. Therefore, IM Scoring does not deliver the same results that other vendors might deliver with these Intelligent Miner models when the data contains missing values.

Distribution-based clustering
v IM Scoring supports all the core features of PMML 2.0 for distribution-based clustering.
v IM Scoring additionally supports value weighting, which might be used in some models produced with Intelligent Miner products. (For more information about value weighting, see the documentation for the product in question.) For this reason, IM Scoring does not deliver the same results that other vendors might deliver with these non-conforming PMML 2.0 models.

Neural networks
v IM Scoring supports all the core features of PMML 2.0 for neural networks.
v The handling of missing values for unary or binary categorical fields of models created with Intelligent Miner products is different from the handling of missing values defined in PMML 2.0 and used in IM Scoring for models from other producers. Therefore, IM Scoring does not deliver the same results that other vendors might deliver with these Intelligent Miner models when the data contains missing values.

Regression

IM Scoring supports all the core features of PMML 2.0 for regression.

IM Scoring conversion tools

IM Scoring delivers conversion tools that enable models in IM for Data format to be converted to PMML 2.0. IM Scoring conversion tools comply with the PMML producer conformance clause of the PMML 2.0 standard for center-based clustering, decision trees, neural networks, and regression.

IM Scoring conversion tools also comply with the PMML producer conformance clause of the PMML 2.0 standard for distribution-based clustering when value weighting was not used to create the model. (For information, see the documentation for IM for Data.)
However, if value weighting was used, IM Scoring conversion tools produce non-conforming PMML 2.0 models with a specific extension for value weighting. This extension can be read only by IM Scoring, which will use value weighting when scoring data on these models. Other PMML consumer tools, however,
will ignore the extension, though they will read the model successfully. They will not consider value weighting when scoring data on these models.

IM Scoring conversion tools additionally comply with the PMML producer conformance clause of the PMML 2.0 standard for association rules. However, taxonomies and name mappings that might have been used to create the models are not written into PMML 2.0.

Radial-Basis Function prediction

IM Scoring supports RBF prediction in addition to all of the other algorithms that have been listed. However, RBF prediction is not yet part of PMML, and therefore cannot comply with the producer or consumer conformance clause. The RBF prediction models written by IM Scoring conversion tools and used in the IM Scoring application have a proprietary XML format that is very similar to the other PMML formats.
Appendix H. Notices

This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

   IBM Director of Licensing
   IBM Corporation
   North Castle Drive
   Armonk, NY 10504-1785
   U.S.A.

For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

   IBM World Trade Asia Corporation Licensing
   2-31 Roppongi 3-chome, Minato-ku
   Tokyo 106, Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors.
Changes are periodically made to the information herein; these changes will
be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

   IBM Deutschland Informationssysteme GmbH
   Department 3982
   Pascalstrasse 100
   70569 Stuttgart
   Germany

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this information and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurement may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.
All IBM prices shown are IBM’s suggested retail prices, are current and are subject to change without notice. Dealer prices may vary.

This information is for planning purposes only. The information herein is subject to change before the products described become available.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows:

   © (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. © Copyright IBM Corp. _enter the year or years_. All rights reserved.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.

Trademarks

The following terms are trademarks of the IBM Corporation in the United States, other countries, or both:

   AIX
   DB2
   DB2 Universal Database
   IBM
   Intelligent Miner
   SP
   SP2

UNIX is a registered trademark of The Open Group in the United States and other countries.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Other company, product, and service names may be trademarks or service marks of others.
Bibliography and related information

This bibliography lists all publications in the Intelligent Miner library other than this book, related IBM publications that are relevant to the contents of this book, and non-IBM publications that might be useful for reference purposes. Where appropriate, IBM publication numbers are given after the document title. This will assist you in finding the document online at the IBM Publications Center, which is available on the Web at:

   http://www.elink.ibmlink.ibm.com/public/applications/publications/cgibin/pbi.cgi

IBM DB2 Intelligent Miner publications

v Discovering Data Mining, SG24-4839
v Intelligent Miner for Data: Application Programming Interface and Utility Reference, SH12-6751
v Intelligent Miner for Data Applications Guide, SG24-5252
v Intelligent Miner for Data: Enhance Your Business Intelligence, SG24-5422
v Intelligent Miner for Data: Using the Intelligent Miner for Data, SH12-6750
v Intelligent Miner Modeling: Administration and Programming, SH12-6736
v Intelligent Miner Visualization: Using the Intelligent Miner Visualizers, SH12-6737
v Mining Relational and Nonrelational Data with IBM Intelligent Miner for Data Using Oracle, SPSS, and SAS As Sample Data Sources, SG24-5278
v Mining Your Own Business in Banking: Using DB2 Intelligent Miner for Data, SG24-6272
v Mining Your Own Business in Health Care: Using DB2 Intelligent Miner for Data, SG24-6274
v Mining Your Own Business in Retail: Using DB2 Intelligent Miner for Data, SG24-6271
v Mining Your Own Business in Telecoms: Using DB2 Intelligent Miner for Data, SG24-6273
IBM DB2 Universal Database (DB2 UDB) publications

v DB2 UDB Administration Guide: Implementation, SC09-2944
v DB2 UDB Administration Guide: Performance, SC09-2945
v DB2 UDB Administration Guide: Planning, SC09-2946
v DB2 UDB Application Development Guide, SC09-2949
v DB2 UDB Call Level Interface Guide and Reference, SC09-2843
v DB2 UDB Command Reference, SC09-2951
v DB2 UDB Enterprise-Extended Edition for UNIX: Quick Beginnings, GC09-2964
v DB2 UDB Enterprise-Extended Edition for Windows: Quick Beginnings, GC09-2963
v DB2 UDB for OS/2®: Quick Beginnings, GC09-2968
v DB2 UDB for UNIX: Quick Beginnings, GC09-2970
v DB2 UDB for Windows: Quick Beginnings, GC09-2971
v DB2 UDB Message Reference (Volumes 1 and 2), GC09-2978 and GC09-2979
v DB2 UDB SQL Getting Started, SC09-2973
v DB2 UDB SQL Reference, Volumes 1 and 2, SBOF-8933

Related information

v The IBM Software Support Handbook is available on the Web at http://techsupport.services.ibm.com/guides/handbook.html
v Information relating to PMML is available on the Web at http://www.dmg.org
Index

A
AIX systems
   exporting PMML or XML models on 158
   installing IM Scoring on 145
   installing IM Scoring Java Beans on 161
   prerequisites for installing IM Scoring on 145
   prerequisites for installing IM Scoring Java Beans on 161
   registering the model conversion facility on 159
   uninstalling IM Scoring on 148
   uninstalling IM Scoring Java Beans on 162
application functions 53, 54
   purpose of 50
application mode 7
application results, getting 56

B
bankingApplyModeling1.db2 script
   contents 32
bankingApplyModeling2.db2 script
   contents 32
bankingApplyTable1.db2 script
   contents 28
   description 24
   use of 28
bankingApplyTable2.db2 script
   contents 30
   description 24
bankingApplyView.db2 script
   contents 27
   description 24
   for applying a model 27
bankingExtract.db2 script
   contents 31
bankingImport.db2 script
   contents 26
   description 24
bankingInsert.db2 script
   contents 26
   description 24
bankingScoring.data
   description of flat file 24
bibliography 211

C
Classification 17
client registration tool facility 11
CLOB values, importing mining models from 47
clusDemoBanking.dat
   description of sample mining model 24
cluster IDs
   practice exercise in computing 29, 30
Clustering 17
code samples
   for applying models 55
   for scoring with IM Scoring Java Beans 63
commands
   idmcheckdb 14, 43, 132
   idmdisabledb 14, 42, 132, 133
   idmenabledb 14, 41
   idminstfunc 135
   idmlevel 68, 135
   idmlicm 14, 135
   idmmkSQL 13, 136
   idmuninstfunc 138
   idmxmod 139
   practice exercise in using idmmkSQL 38
   shared between IM Modeling and IM Scoring 172
components, sample 23
compression, exporting and importing models with the use of 168
computed results
   accessing for IM Scoring Java Beans 62
CONCAT
   using to specify data 53
configuration, verifying 22
configuring
   database environments 21
   DBMS on Windows systems 158
   IM Scoring, quick-start guide to 19
   system environments on UNIX systems 157
   system environments on Windows systems 158
conformance to standards 13
conversion facility 11
conversion utilities
   client 155
   server 154
converting exported models 44
CRM xii
Customer Relationship Management xii

D
data, specifying
   by means of CONCAT 53
   by means of DM_applData 52
   by means of REC2XML 51
data file, sample
   bankingScoring.data 24
data importing, in practice exercises 25
data mining functions 17
data mining markup language, PMML 11
data records, specifying for IM Scoring Java Beans 62
data types
   DM_ApplicationData, functions for working with 78
   DM_LogicalDataSpec, methods for working with 77
   overview 75, 76, 77
   purpose 9
   shared between IM Modeling and IM Scoring 171
   specifying 9
   user-defined 10
database objects
   creating 22, 41
   overview 75
   shared between IM Modeling and IM Scoring 171
databases
   checking 43
   disabling 42
   enabling 41
DB2
   getting diagnostic information about 71
DB2 instances
   disabling on UNIX systems 157
   enabling on UNIX systems 157
   enabling on Windows systems 158
DB2 SQL states 173
DB2 Utilities
   exporting and importing models by means of 168
DBMS
   configuring on Windows systems 158
diagnostic information, DB2, getting 71
disabling
   DB2 instances 157
DM_applData 27, 28, 29, 31, 55, 78
   description 92
   specifying data by means of 52
DM_ApplicationData data type 75, 78
DM_applyClasModel 50, 54, 55, 78
   description 94
DM_applyClusModel 27, 28, 29, 30, 31, 50, 54, 79
   description 95
DM_applyRegModel 50, 54, 80
   description 96
DM_Categorical mining field type 86
DM_ClasModel data type 46, 47, 48, 75, 78
   functions for working with 78
DM_ClasResult data type 54, 76, 78, 79
DM_ClusResult data type 54, 76, 79
DM_ClusteringModel data type 46, 48, 76, 79, 80
DM_expClasModel 78
   description 97
DM_expClusModel 79
   description 98
DM_expDataSpec 77
   description 84
DM_expRegModel 80
   description 99
DM_getClasCostRate 78
   description 100
DM_getClasMdlName 78
   description 101
DM_getClasMdlSpec 49, 78
   description 102
DM_getClasTarget 78
   description 103
DM_getClusConf 56, 79
   description 104
DM_getClusMdlName 79
   description 105
DM_getClusMdlSpec 49, 79
   description 106
DM_getClusScore 27, 28, 30, 56, 79
   description 107
DM_getClusterID 27, 28, 29, 30, 56, 79, 170
   description 108
DM_getClusterName 79
   description 109
DM_getConfidence 56, 78
   description 110
DM_getFldName 49, 77
   description 85
DM_getFldType 49, 77
   description 86
DM_getNumClusters 79
   description 111
DM_getNumFields 49, 77
   description 87
DM_getPredClass 56, 79
   description 112
DM_getPredValue 56, 80
   description 113
DM_getQuality 56, 79
   description 114
DM_getQuality(clusterid) 56, 79
   description 115
DM_getRBFRegionID 56, 80
   description 116
DM_getRegMdlName 80
   description 117
DM_getRegMdlSpec 49, 80
   description 118
DM_getRegTarget 80
   description 119
DM_impApplData 78
   description 120
DM_impClasFile 46, 78
   description 121
DM_impClasFileE 47, 78
   description 122
DM_impClasFileE data type 47
DM_impClasModel 48, 78
   description 123
DM_impClusFile 27, 46, 79
   description 124
DM_impClusFileE 47, 79
   description 125
DM_impClusModel 48, 80
   description 127
DM_impDataSpec 77
   description 88
DM_impRegFile 46, 80
   description 128
DM_impRegFileE 47, 80
   description 129
DM_impRegModel 48, 80
   description 130
DM_isCompatible 77
   description 89
DM_LogicalDataSpec data type 76, 77
DM_Numerical mining field type 86
DM_RegressionModel data type 46, 47, 48, 76, 80
DM_RegResult data type 54, 77, 80

E
enabling
   DB2 instances 157
   DB2 instances on Windows systems 158
   IM for Data to export PMML or XML models 158
environment variables
   IDM_MX_TRACEFILE 69
   IDM_MX_TRACELEVEL 69
   setting for IM Scoring Java Beans 59
environments, system
   configuring on UNIX systems 157
   configuring on Windows systems 158
error information, getting 65
error messages 173
   IM Scoring 174
exception classes for RecordScorer and base class Scorer 64
exported models, converting 44
exporting
   mining models by means of DB2 Utilities 168
   mining models from IM for Data 43
   mining models with the use of compression 168
   PMML models, configuring IM for Data for 20
   PMML or XML models on AIX systems 158
   PMML or XML models on Sun Solaris systems 159
   PMML or XML models on Windows systems 160

F
features, installable
   PMML conversion utilities, client 155
   PMML conversion utilities, server 154
   scoring samples 154
   user-defined functions for DB2 154
field names in models, querying 49
function syntax 10, 11
function types
   application functions 50
   import functions, purpose of 45
   results functions 56
functions
   application functions 53, 54
   for working with mining model type DM_ClasModel 78
   for working with mining model type DM_ClusteringModel 79, 80
   for working with mining model type DM_RegressionModel 80
   for working with scoring data type DM_ApplicationData 78
   for working with scoring result type DM_ClasResult 78, 79
   for working with scoring result type DM_ClusResult 79
   for working with scoring result type DM_RegResult 80
   shared between IM Modeling and IM Scoring 171

G
getting error information 65
getting product information 68
getting support 66
GUI xii

I
ICU xii
IDM_MX_TRACEFILE environment variable 69
IDM_MX_TRACELEVEL environment variable 69
idmcheckdb command 14, 43, 132
idmdisabledb command 14, 42, 132, 133
idmenabledb command 14, 41
idminstfunc command 135
idmlevel command 68, 135
idmlicm command 14, 135
idmmkSQL command 13, 136
   practice exercise in using 38
IDMMX schema 10, 171
IDMMX.ClusterModels
   use of sample table 25
idmrlnconv
   removing links with script 160
idmuninstfunc command 138
idmxmod command 139
IM for Data
   application mode 7
   client registration tool facility 11
   configuring to export PMML models 20
   enabling to export PMML or XML models 158
   exporting models from 43
   using to produce models 8
IM Modeling
   applying models created with 31
   providing models by means of 9, 48
IM Scoring 7
   administrative tasks 65
   coexistence with IM Modeling 171
   commands shared with IM Modeling 172
   conformance to PMML 203
   data types shared with IM Modeling 171
   database objects overview 75
   e-business enhancements 13
   functional enhancements in 13
   functions shared with IM Modeling 171
   infrastructure enhancements in 14
   installable features on Windows systems 154
   introduction to 7
   limitations in 14
   methods shared with IM Modeling 172
   migration from IM Scoring V7.1 167
   new features in version 8.1 12
   quick-start guide to installing and configuring 19, 20
   standards conformance 13
   using 41
   using in a multilanguage environment 65
   working with versions V7.1 and V8.1 in parallel 167
IM Scoring Java Beans
   accessing computed results 62
   accessing model metadata for 61
   applying scoring 62
   code sample 63
   online scoring with 12
   practice exercises in using 33
   sample components for 25
   setting environment variables for 59
   specifying data records 62
   specifying the mining model to be used with 60
   using 58
IMinerX.symblnk file set
   on AIX systems 158
import functions
   purpose of 45
importing
   data, practice exercises in 25
   mining models 26
   mining models by means of DB2 Utilities 168
   mining models from a file 45
   mining models from CLOB values 47
   mining models in unfenced mode 169
   mining models using a specific XML encoding 47
   mining models with the use of compression 168
installation and configuration, verifying 22
installing IM Scoring
   on AIX systems 145
   on Linux systems 149
   on Sun Solaris systems 150
   on Windows systems 153
   quick-start guide to 19, 20
installing IM Scoring Java Beans
   on AIX systems 161
   on Linux systems 162
   on Sun Solaris systems 164
   on Windows systems 165
Intelligent Miner for Data 5
Intelligent Miner Modeling 4
Intelligent Miner product family, introducing 3
Intelligent Miner Scoring 3
Intelligent Miner Visualization 5

L
limitations, IM Scoring 14
Linux systems
   installing IM Scoring on 149
   installing IM Scoring Java Beans on 162
   prerequisites for installing IM Scoring on 149
   prerequisites for installing IM Scoring Java Beans on 163
   uninstalling IM Scoring on 150
   uninstalling IM Scoring Java Beans on 163

M
mandatory steps in installing and configuring IM Scoring 19
markup language for data mining, PMML 11
messages, error 173
   IM Scoring 174
method syntax 10
methods
   for working with data type DM_LogicalDataSpec 77
   shared between IM Modeling and IM Scoring 172
methods, user-defined 10
mining functions 17
   Classification 17
   Clustering 17
   Regression/Prediction 18
   supported by IM Scoring 7
mining models
   applying 49
   applying models and computing cluster IDs in one SQL query 29, 30
   applying models created with IM Modeling 31
   clusDemoBanking.dat, sample 24
   code sample for applying 55
   converting exported 44
   exporting and importing by means of DB2 Utilities 168
   exporting and importing with the use of compression 168
   exporting from IM for Data 43
   exporting on AIX systems 158
   exporting on Sun Solaris systems 159
   exporting on Windows systems 160
   generating SQL scripts from 23, 44
   importing 26, 45
   importing by using a specific XML encoding 47
   importing from a file 45
   importing from CLOB values 47
   importing in unfenced mode 169
   practice exercise in applying 27
   providing by means of IM for Data 8
   providing by means of IM Modeling 9, 48
   querying field names 49
   specifying the model to be used with IM Scoring Java Beans 60
   working with 43
missing values, handling 57
model conversion facility 11
model metadata, accessing for IM Scoring Java Beans 61

N
Neural models 169

O
optional steps in installing and configuring IM Scoring 20
overview
   data types 75, 76, 77
   Intelligent Miner for Data 5
   Intelligent Miner Modeling 4
   Intelligent Miner Scoring 3
   Intelligent Miner Visualization 5

P
PMML xii, 13
   configuring IM for Data to export PMML models 20
   conversion utilities, client 155
   conversion utilities, server 154
   exporting models on AIX systems 158
   exporting models on Sun Solaris systems 159
   exporting models on Windows systems 160
   IM Scoring conformance to 203
   markup language for data mining 11
practice exercises 25
Predictive Model Markup Language xii
prerequisites
   for installing IM Scoring on AIX systems 145
   for installing IM Scoring on Linux systems 149
   for installing IM Scoring on Sun Solaris systems 150
   for installing IM Scoring on Windows systems 153
   for installing IM Scoring Java Beans on AIX systems 161
   for installing IM Scoring Java Beans on Linux systems 163
   for installing IM Scoring Java Beans on Sun Solaris systems 164
   for installing IM Scoring Java Beans on Windows systems 165
problem identification worksheet 67
product information, getting 68
publications 211

Q
quick-start guide 19
   to installing IM Scoring 20

R
RBF xii
README files 66
reason codes
   IM Scoring 174
REC2XML 199
   specifying data by means of 51
Redhat Package Manager xii
registering the model conversion facility
   on AIX systems 159
registration, client tool 11
Regression/Prediction 18
result type
   DM_ClasResult, functions for working with 78, 79
   DM_ClusResult, functions for working with 79
   DM_RegResult, functions for working with 80
results data 53
results functions
   purpose of 56
results values
   practice exercise in getting 27
RPM xii

S
sample table
   IDMMX.ClusterModels 25
samples
   applications, executing 22
   components 23
   for IM Scoring Java Beans, listed 25
schema IDMMX 10, 171
scoring features
   installable user-defined functions for DB2 154
scoring functions
   purpose 9
   specifying 9
scoring methods
   purpose 9
scripts
   bankingApplyModeling1.db2 32
   bankingApplyModeling2.db2 32
   bankingApplyTable1.db2 28
   bankingApplyTable2.db2 30
   bankingApplyView.db2 27
   bankingExtract.db2 31
   bankingImport.db2 26
   bankingInsert.db2 26
   idmrlnconv, removing links with 160
   SQL, generating from mining models 23
SQL xii
   code sample for applying mining models 55
   DB2 states 173
   generating scripts from mining models 23, 44
   query to apply models and compute cluster IDs 29, 30
   statement to define views 29
   states 174
steps in installing and configuring IM Scoring
   mandatory 19
   optional 20
Sun Solaris systems
   exporting PMML or XML models on 159
   installing IM Scoring on 150
   installing IM Scoring Java Beans on 164
   prerequisites for installing IM Scoring on 150
   prerequisites for installing IM Scoring Java Beans on 164
   uninstalling IM Scoring on 152
support, getting 66
supported mining functions 7
syntax
   function 10, 11
   method 10
system environments
   configuring on UNIX systems 157
   configuring on Windows systems 158

T
table
   IDMMX.ClusterModels 27
tables, creating in practice exercises 25
trace facility 69
   using on UNIX systems 70
   using on Windows systems 70

U
UDF xii
   definition 9
UDM xii
   definition 9
UDT xii
   definition 9
unfenced mode
   importing models in 169
uninstalling IM Scoring
   on AIX systems 148
   on Linux systems 150
   on Sun Solaris systems 152
   on Windows systems 156
uninstalling IM Scoring Java Beans
   on AIX systems 162
   on Linux systems 163
   on Sun Solaris systems 164
UNIX systems
   configuring system environments on 157
   disabling DB2 instances on 157
   enabling DB2 instances on 157
   using the trace facility on 70
user-defined data types xii, 10
user-defined functions xii, 11
user-defined methods xii, 10
using
   an application function (SQL command) 55
   IM for Data to produce models 8

W
Windows systems
   configuring system environments on 158
   configuring the DBMS on 158
   enabling DB2 instances on 158
   exporting PMML or XML models on 160
   installing IM Scoring on 153
   installing IM Scoring Java Beans on 165
   prerequisites for installing IM Scoring on 153
   prerequisites for installing IM Scoring Java Beans on 165
   uninstalling IM Scoring on 156
   using the trace facility on 70
worksheet for problem identification 67

X
XML xii
   exporting XML models on AIX systems 158
   exporting XML models on Sun Solaris systems 159
   exporting XML models on Windows systems 160
   importing mining models using a specific encoding 47
Part Number: CT16INA
Program Number: 5765-F36
Printed in Denmark by IBM Danmark A/S
SH12-6745-00
Spine information: IBM DB2 Intelligent Miner Scoring, Administration and Programming for DB2, Version 8.1