IBM DB2 Intelligent Miner Scoring


Administration and Programming
for DB2
Version 8.1




SH12-6745-00
Note

Before using this information and the product it supports, be sure to read the information in Appendix H, “Notices”.

First Edition, October 2002

This edition applies to Version 8.1 of IBM DB2 Intelligent Miner Scoring, program number 5765–F36, and to all subsequent releases and modifications until otherwise indicated in new editions.

© Copyright International Business Machines Corporation 2001, 2002. All rights reserved. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents

Figures
Tables

About this book
  Who should use this book
  Conventions and terminology used in this book
  How this book is structured
  How to read the syntax diagrams
  How to send your comments

Part 1. Guide

Chapter 1. Introducing the Intelligent Miner products
  IBM DB2 Intelligent Miner Scoring
  IBM DB2 Intelligent Miner Modeling
  IBM DB2 Intelligent Miner Visualization
  IBM DB2 Intelligent Miner for Data

Chapter 2. Introducing IM Scoring
  IM Scoring
  Mining functions supported by IM Scoring
  Using IM for Data to produce models
  Using IM Modeling to produce models
  Scoring data types, methods, and functions
  PMML: A markup language for data mining
  Converting models
  Online scoring with IM Scoring Java Beans
  What is new in version 8.1
  Ease of use
  E-business enhancements
  Functional enhancements
  Standards conformance
  Platform support
  Shared infrastructure with IM Modeling
  Limitations

Chapter 3. Data mining functions
  Classification
  Clustering
  Regression/Prediction

Chapter 4. Getting started
  Quick start
  Installation
  Configuring IM for Data to export PMML models
  Configuring the database environment
  Creating database objects
  Verifying the installation and configuration
  Executing sample applications
  Generating SQL scripts from your own mining models
  Sample components
  Completing the practice exercises
  Creating a table and importing data
  Importing a mining model
  Applying a model and getting results values
  Extracting information from a model
  Applying models created with IM Modeling
  Using IM Scoring Java Beans to score records
  Using idmmkSQL to work with your own mining models

Chapter 5. Using IM Scoring
  Creating database objects
  Enabling databases
  Disabling databases
  Checking databases
  Working with mining models
  Exporting models from IM for Data
  Converting exported models
  Generating SQL statements from models
  Importing mining models
  Providing models by means of IM Modeling
  Applying mining models
  Querying model field names
  Using the application functions
  Specifying data by means of REC2XML
  Specifying data by means of DM_applData
  Specifying data by means of CONCAT
  Results data
  Code sample for applying models
  Getting application results
  Handling missing values
  Using IM Scoring Java Beans
  Setting environment variables
  Specifying the mining model to be used
  Accessing model metadata
  Specifying a data record
  Applying scoring
  Accessing computed results
  Scoring example
  ScoringException classes

Chapter 6. Administrative tasks
  Using IM Scoring in a multilanguage environment
  Getting error information
  Getting support
  Product README
  'Frequently asked questions' and 'Hints and tips'
  Problem identification worksheet
  Getting product information
  Getting trace information
  Getting DB2 diagnostic information

Part 2. Reference

Chapter 7. Overview of IM Scoring database objects
  Data types provided by IM Scoring
  Methods provided by IM Scoring
  Functions provided by IM Scoring
  Parameter sizes

Chapter 8. IM Scoring methods reference
  DM_expDataSpec
  DM_getFldName
  DM_getFldType
  DM_getNumFields
  DM_impDataSpec
  DM_isCompatible

Chapter 9. IM Scoring functions reference
  DM_applData
  DM_applyClasModel
  DM_applyClusModel
  DM_applyRegModel
  DM_expClasModel
  DM_expClusModel
  DM_expRegModel
  DM_getClasCostRate
  DM_getClasMdlName
  DM_getClasMdlSpec
  DM_getClasTarget
  DM_getClusConf
  DM_getClusMdlName
  DM_getClusMdlSpec
  DM_getClusScore
  DM_getClusterID
  DM_getClusterName
  DM_getConfidence
  DM_getNumClusters
  DM_getPredClass
  DM_getPredValue
  DM_getQuality
  DM_getQuality(clusterid)
  DM_getRBFRegionID
  DM_getRegMdlName
  DM_getRegMdlSpec
  DM_getRegTarget
  DM_impApplData
  DM_impClasFile
  DM_impClasFileE
  DM_impClasModel
  DM_impClusFile
  DM_impClusFileE
  DM_impClusModel
  DM_impRegFile
  DM_impRegFileE
  DM_impRegModel

Chapter 10. IM Scoring command reference
  The idmcheckdb command
  The idmdisabledb command
  The idmenabledb command
  The idminstfunc command
  The idmlevel command
  The idmlicm command
  The idmmkSQL command
  The idmuninstfunc command
  The idmxmod command

Chapter 11. IM Scoring Java Beans reference

Part 3. Appendixes

Appendix A. Installing IM Scoring
  Installing IM Scoring on AIX systems
  Prerequisites for AIX systems
  Installing IM Scoring
  Uninstalling IM Scoring
  Installing IM Scoring on Linux systems
  Prerequisites for Linux systems
  Installing IM Scoring
  Uninstalling IM Scoring
  Installing IM Scoring on Sun Solaris systems
  Prerequisites for Sun Solaris systems
  Installing IM Scoring
  Uninstalling IM Scoring
  Installing IM Scoring on Windows systems
  Prerequisites for Windows systems
  Installing IM Scoring
  Uninstalling IM Scoring
  Configuring the database management system on UNIX systems
  Enabling the DB2 instance on UNIX systems
  Disabling the DB2 instance on UNIX systems
  Configuring the database management system on Windows systems
  Enabling the DB2 instance on Windows systems
  Enabling IM for Data to export PMML or XML models
  On AIX systems
  On Sun Solaris systems
  On Windows systems

Appendix B. Installing IM Scoring Java Beans
  Installing IM Scoring Java Beans on AIX systems
  Prerequisites for AIX systems
  Installing IM Scoring Java Beans
  Uninstalling IM Scoring Java Beans
  Installing IM Scoring Java Beans on Linux systems
  Prerequisites for Linux systems
  Installing IM Scoring Java Beans
  Uninstalling IM Scoring Java Beans
  Installing IM Scoring Java Beans on Sun Solaris systems
  Prerequisites for Sun Solaris systems
  Installing IM Scoring Java Beans
  Uninstalling IM Scoring Java Beans
  Installing IM Scoring Java Beans on Windows systems
  Prerequisites for Windows systems
  Installing IM Scoring Java Beans

Appendix C. Migration from IM Scoring V7.1
  Working with IM Scoring V7.1 and V8.1 in parallel
  Exporting and importing models with the use of compression
  Exporting and importing models by means of DB2 Utilities
  Importing models in unfenced mode
  Applying Neural models
  Using the function DM_getClusterID

Appendix D. Coexistence with IM Modeling
  Shared schema
  Shared data types
  Shared functions
  Shared methods
  Shared commands

Appendix E. Error messages
  DB2 SQL states
  IM Scoring SQL states
  IM Scoring error events

Appendix F. The DB2 REC2XML function

Appendix G. IM Scoring conformance to PMML
  IM Scoring application
  IM Scoring conversion tools
  Radial-Basis Function prediction

Appendix H. Notices
  Trademarks

Bibliography and related information
  IBM DB2 Intelligent Miner publications
  IBM DB2 Universal Database (DB2 UDB) publications
  Related information

Index
Figures

  1. The IM Scoring process
  2. Architecture sample to realize a call-center scenario
  3. Model import processes
  4. Applying a model to data
Tables

   1. Formatting conventions
   2. Abbreviations
   3. PMML model types
   4. Sample components for the Clustering mining function of IM Scoring
   5. Sample components for IM Scoring Java Beans
   6. Import functions and related data types and tables
   7. Import functions using a specific XML encoding
   8. Import functions using CLOB values
   9. Functions for applying models
  10. Application functions and their data types and results data
  11. Results functions and their purpose
  12. IM Scoring Java Beans methods for accessing model metadata
  13. IM Scoring Java Beans methods for accessing computed results
  14. Data types specific to IM Scoring
  15. Methods for type DM_LogicalDataSpec
  16. Functions for working with scoring data type DM_ApplicationData
  17. Functions for working with data mining model type DM_ClasModel
  18. Functions for working with scoring result type DM_ClasResult
  19. Functions for working with scoring result type DM_ClusResult
  20. Functions for working with data mining model type DM_ClusteringModel
  21. Functions for working with data mining model type DM_RegressionModel
  22. Functions for working with scoring result type DM_RegResult
  23. Mining field types
  24. The idmcheckdb messages
About this book

IBM DB2® Intelligent Miner™ Scoring is an application that integrates the model application functionality of Intelligent Miner for Data Version 6.1 or higher with the DB2 Universal Database™. Intelligent Miner Scoring enables you to import and apply mining models, and to access the results.

Throughout this book, the following abbreviations are used:
  • IBM DB2 Intelligent Miner Scoring V8.1 is referred to as IM Scoring.
  • IBM DB2 Intelligent Miner Scoring V7.1 is referred to as IM Scoring V7.1.
  • IBM DB2 Intelligent Miner Modeling V8.1 is referred to as IM Modeling.
  • IBM DB2 Intelligent Miner Visualization V8.1 is referred to as IM Visualization.
  • IBM DB2 Intelligent Miner for Data is referred to as IM for Data.

This book describes how to install and use IM Scoring and IM Scoring Java Beans. It also provides a full reference to the database objects provided by IM Scoring.

References in this book to DB2 refer to DB2 UDB Version 7.2 or higher.

Who should use this book

This book is intended for the following users:
  • DB2 database administrators who are familiar with DB2 administration concepts, tools, and techniques
  • Users of IM for Data who are familiar with the concepts underlying the different data mining functions that IM for Data provides
  • DB2 application programmers who are familiar with SQL and with one or more programming languages that can be used for DB2 applications

Conventions and terminology used in this book

In DB2, the names of the scoring methods, functions, data types, tables, and table columns are created in uppercase letters, even if you entered them in lowercase letters. In this book, these names are shown in mixed case for better readability.
The following table shows the formatting conventions used in this book.

Table 1. Formatting conventions

  Convention used: Interface elements, for example, menu bars, buttons, and labels are shown in boldface.
  How it is used:  Click OK.

  Convention used: Menu instructions are shown in boldface and sequential instructions are separated by arrows.
  How it is used:  Click File -> Export.

  Convention used: Command syntax is shown in a monospaced font.
  How it is used:  db2 -stf idmtab.db2

  Convention used: The names of the following are shown in a monospaced font:
                     • Files and directories
                     • Database tables and columns
                     • SQL methods, functions, and data types
  How it is used:  The SQL INSERT command inserts the model into a column of the table ClusterModels, which is configured for the data type DM_ClusteringModel.

  Convention used: Variables within command syntax, which you should replace by a real value, are shown in italics between angle brackets.
  How it is used:  idmdisabledb <db name>

  Convention used: Italics are used to highlight the introduction of a new term.
  How it is used:  These functions are also referred to as user-defined functions.

The following table shows the abbreviations used in this book.

Table 2. Abbreviations

  Abbreviation   Full form
  CRM            Customer Relationship Management
  GUI            Graphical user interface
  ICU            International Classes for Unicode
  PMML           Predictive Model Markup Language
  RBF            Radial Basis Function
  RPM            Red Hat Package Manager
  SQL            Structured Query Language
  UDF            User-defined function
  UDM            User-defined method
  UDT            User-defined data type
  XML            Extensible Markup Language
How this book is structured

This book is divided into the following parts:

Part 1. Guide
  Contains the following:
  • An overview of the functionality available with IM Scoring
  • Instructions on how to get started with IM Scoring
  • Guidance on how to use IM Scoring and how to perform administrative tasks

Part 2. Reference
  Provides a reference resource to all the IM Scoring database objects and utilities.

Part 3. Appendixes
  Contains the following:
  • Instructions on how to install, configure, and uninstall IM Scoring and IM Scoring Java Beans
  • Information on migration issues from IM Scoring V7.1 and on conformance with PMML
  • Instructions on using the DB2 function REC2XML
  • Information about the error messages produced by IM Scoring

How to read the syntax diagrams

In the reference part of this book, the syntax for IM Scoring’s functionality is described using the following structure:
  • Read the syntax diagrams from left to right and top to bottom, following the path of the line.
    The ►►─── symbol indicates the beginning of a statement.
    The ───► symbol indicates that the statement syntax is continued on the next line.
    The ►─── symbol indicates that a statement is continued from the previous line.
    The ───►◄ symbol indicates the end of a statement.
  • Required items appear on the horizontal line (the main path).
    [Syntax diagram: required item on the main path]
  • Optional items appear below the main path.
    [Syntax diagram: optional item below the main path]
  • If you can choose from two or more items, they appear in a stack.
    If you must choose one of the items, one item of the stack appears on the main path.
    [Syntax diagram: required choice1 on the main path, with required choice2 stacked below it]
    If choosing none of the items is an option, the entire stack appears below the main path.
    [Syntax diagram: optional choice1 and optional choice2 stacked below the main path]
    A repeat arrow above a stack indicates that you can make more than one choice from the stacked items.
    [Syntax diagram: repeat arrow above the stack of optional choice1 and optional choice2]
  • Keywords must be spelled exactly as shown. Variables appear in lowercase letters (for example, encoding name). They represent names or values that you must supply.
  • If punctuation marks, parentheses, arithmetic operators, or other such symbols are shown, you must enter them as part of the syntax.

How to send your comments

Your feedback is important in helping us to provide you with the most accurate and high-quality information possible. If you have any comments about this book:
  • Send your comments by e-mail to swsdid@de.ibm.com. Be sure to include the name and part number of the book, and to say which version of IM Scoring you are using. If applicable, include the specific location of the text you are commenting on. For example, give a page number or table number.
  • Fill out the Readers’ Comments form at the back of this book. Return it by mail, by fax, or by giving it to an IBM representative. The mailing address is on the back of the form. The fax number is +49-(0)7031-16-4892.
Part 1. Guide

This part introduces you to IM Scoring and gives you instructions for its use.

• For an overview of the Intelligent Miner family of products, see Chapter 1, “Introducing the Intelligent Miner products” on page 3.
• For an overview of IM Scoring, see Chapter 2, “Introducing IM Scoring” on page 7.
• For a quick overview of what you need to do to get up and running with IM Scoring, see Chapter 4, “Getting started” on page 19. This chapter also contains a tutorial in the form of sample exercises.
• For full instructions in the use of IM Scoring, see Chapter 5, “Using IM Scoring” on page 41.
• For instructions on doing a number of administrative tasks connected with IM Scoring, see Chapter 6, “Administrative tasks” on page 65.

© Copyright IBM Corp. 2001, 2002
Chapter 1. Introducing the Intelligent Miner products

The IBM DB2 Intelligent Miner Version 8.1 is a set of the following products:
• Intelligent Miner Scoring
• Intelligent Miner Modeling
• Intelligent Miner Visualizing

These products support rapid enablement of Intelligent Miner analytics embedded in Business Intelligence (BI), eCommerce, or traditional OLTP application programs.
• You can use IM Scoring to deploy PMML models that were created by one of the Intelligent Miner products or by other applications and tools that support interoperability through the use of PMML models.
• You can use IM Modeling to build data mining models.
• You can use IM Visualizing to browse PMML models that are created by one of the Intelligent Miner products or by other applications and tools that support interoperability through the use of PMML models.

PMML is a standard format for data mining models. Based on XML, PMML provides a standard that enables data mining models to be shared between the applications of different vendors. The intention is to provide a vendor-independent method of defining models. In this way, proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. You can find more information about PMML on the Web site of the Data Mining Group (DMG) at http://www.dmg.org.

IBM DB2 Intelligent Miner Scoring

IM Scoring provides scoring technology as database extenders, DB2 extenders, and Oracle cartridges. It enables application programs to apply PMML models to large databases, subsets of databases, or single rows or cases. Application programs use the SQL API, which consists of user-defined functions (UDFs) and user-defined methods (UDMs), to perform the scoring operation. The PMML models might have been created by one of the Intelligent Miner products or by other applications and tools that support interoperability through the use of PMML models.
The following table shows the different PMML models that can be applied by different mining algorithms.
Table 3. PMML model types

PMML model type                 Mining algorithm
Center-based clustering         Neural Clustering algorithm
Distribution-based clustering   Demographic Clustering algorithm
Neural networks                 Neural Classification algorithm,
                                Neural Prediction algorithm
Decision tree                   Tree Classification algorithm
Regression                      Logistic Regression algorithm,
                                Polynomial Regression algorithm,
                                Linear Regression algorithm

Additionally, IM Scoring supports models that are built by the RBF Prediction algorithm of the Intelligent Miner for Data. These models are not yet part of PMML. You can export these models in XML format from the Intelligent Miner for Data and use them with IM Scoring.

Mining models that are applied by the SQL API of IM Scoring must be contained in database tables. If the mining models are created by means of IM Modeling, they can be directly applied because IM Modeling writes the models into database tables. If the mining models are created by means of the Intelligent Miner for Data, they must be exported from the Intelligent Miner for Data and imported into database tables. IM Scoring provides UDFs to import the models. You can also apply PMML V1.1 or PMML V2.0 models that are created with tools from different vendors.

IM Scoring provides a feature called Single Record Scorer. The Single Record Scorer consists of a Java API. You can use this feature to score single or multiple data records against a mining model that is contained in a flat file. The Single Record Scorer is designed for applications where the online scoring of data records is the main task.

IBM DB2 Intelligent Miner Modeling

IM Modeling provides IM modeling technology as DB2 extenders. It enables SQL application programs to call associations discovery, clustering, and classification operations to develop analytic models based on data accessed by DB2 Universal Database Version 7 or Version 8 SQL. The resulting models are in PMML V2.0 format. They can be processed by IM Scoring or IM Visualizing.
IM Modeling consists of an SQL API. By using this SQL API, you can build Associations, Demographic Clustering, and Tree Classification PMML models that are stored in DB2 tables. The data mining functions are based on the mining functions included in the Intelligent Miner for Data.

IBM DB2 Intelligent Miner Visualizing

IM Visualizing provides the following Java visualizers to present data modeling results for analysis:
• Associations Visualizer
• Classification Visualizer
• Clustering Visualizer

You can use the Intelligent Miner Visualizers to visualize PMML-conforming mining models. Applications can call these visualizers to present model results, or you can deploy the visualizers as applets in a Web browser for ready dissemination. The models might have been developed by using IM Modeling or other applications and tools that support interoperability through the use of PMML models, or models of the Intelligent Miner for Data might have been exported as PMML models. The Intelligent Miner Visualizers are included in Intelligent Miner for Data Version 8.1.

IBM DB2 Intelligent Miner for Data

Intelligent Miner for Data Version 8.1 is an independent product that provides the following mining functions to build and apply mining models based on database or flat file data:
• Associations mining function
• Classification mining function including the following algorithms:
  - Neural Classification
  - Tree Classification
• Clustering mining function including the following algorithms:
  - Demographic Clustering
  - Neural Clustering
• Prediction mining functions including the following algorithms:
  - Neural Prediction
  - Polynomial Regression
  - RBF Prediction
• Processing functions
  The Processing functions can be used only on database tables.
• Sequential Patterns mining function
• Similar Sequences mining function
• Statistics functions

Version 8 of the Intelligent Miner for Data includes the Intelligent Miner Visualizers. It also includes the PMML conversion component of IM Scoring, which allows you to export mining models in PMML format.
Chapter 2. Introducing IM Scoring

This chapter introduces IM Scoring. It describes the functionality provided by IM Scoring, and provides information about PMML and model conversion. This chapter also describes what is new in IM Scoring V8.1.

IM Scoring

IM Scoring is an add-on service to DB2 that extends the capabilities of DB2 to include data mining functions. Mining models continue to be built through the use of the following tools:

IM for Data
    This produces models that can be exported as PMML models.
IM Modeling
    This provides mining models in PMML 2.0 format.
Other tools
    Any other tool that provides mining models in PMML 1.1 or PMML 2.0 format.

You can use the IM Scoring functionality to import certain types of mining models into a DB2 table, to apply the models to data within DB2, and to access the results. This functionality comprises the scoring functions of IM Scoring.

The results of applying the model are referred to as scoring results. These results differ in content according to the type of model applied. IM Scoring includes functions to retrieve the values of scoring results.

IM Scoring is available on the following operating systems:
• AIX®
• Linux
• Sun Solaris
• Windows NT®, Windows® 2000, Windows XP

Mining functions supported by IM Scoring

IM Scoring supports the application mode for the following IM for Data mining and statistical functions:
• Demographic and Neural Clustering
• Tree and Neural Classification
• RBF and Neural Prediction
• Polynomial Regression

For a short introduction to these mining functions, see Chapter 3, “Data mining functions” on page 17.

IM Scoring supports the application of the following models created by IM Modeling:
• Demographic Clustering
• Tree Classification

For descriptions of these mining models, see the IM Modeling documentation, IM Modeling Administration and Programming. In this guide, Chapter 3, “Data mining functions” on page 17 also contains brief introductory information about mining models.

In addition, IM Scoring supports the application of Logistic Regression models.

Within IM Scoring, the mining functions are grouped into the mining types Clustering, Classification, and Regression as follows:
• Clustering includes Demographic and Neural Clustering
• Classification includes Tree and Neural Classification
• Regression includes RBF Prediction, Neural Prediction, Polynomial Regression, and Logistic Regression

Scoring functions are provided to work with each of these types. Each scoring function includes different algorithms to deal with the different mining functions included within a type. For example, the Clustering type includes Demographic and Neural Clustering; thus, scoring functions for Clustering include algorithms for demographic and neural clustering.

Using IM for Data to produce models

For all the mining functions that are supported, except Logistic Regression, you can build and store the models by using IM for Data, which supports PMML models. A model must then be exported to an external file.

To use the IM Scoring mining functions:
• Import the mining model into a DB2 table, where it is stored as a large object
• Apply the model to data stored in DB2 tables
• Store scoring results in a DB2 table
• Extract information about the results, for example, the cluster ID and score
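The four steps listed above can be sketched as a small shell function that prints a template SQL script for a Clustering model. This is a minimal sketch under stated assumptions, not the shipped sample scripts. DM_impClusFile, DM_getClusterID, REC2XML, the type DM_ClusResult, and the table IDMMX.ClusterModels are named in this guide. The column names MODELNAME and MODEL, the results table SCORE_RESULTS, the key column ID, and the functions DM_applyClusModel, DM_impApplData, and DM_getClusScore are assumptions that should be checked against Part 2, “Reference”.

```shell
#!/bin/sh
# Print a template SQL script for the four steps: import, apply, store, extract.
# ASSUMPTIONS (not confirmed by this guide): columns MODELNAME and MODEL,
# results table SCORE_RESULTS, key column ID, and the UDFs DM_applyClusModel,
# DM_impApplData, and DM_getClusScore.
emit_scoring_sql() {
    model_file=$1   # path of the exported model file
    model_name=$2   # name under which the model is stored
    data_table=$3   # table that holds the records to score
    columns=$4      # comma-separated input columns (hypothetical names)
    cat <<EOF
-- Step 1: import the model into a DB2 table
INSERT INTO IDMMX.ClusterModels (MODELNAME, MODEL)
  VALUES ('${model_name}', IDMMX.DM_impClusFile('${model_file}'));

-- Steps 2 and 3: apply the model and store the scoring results
CREATE TABLE SCORE_RESULTS (ID INTEGER, RESULT IDMMX.DM_ClusResult);
INSERT INTO SCORE_RESULTS
  SELECT d.ID,
         IDMMX.DM_applyClusModel(m.MODEL,
           IDMMX.DM_impApplData(REC2XML(1.0, 'COLATTVAL', '', ${columns})))
  FROM ${data_table} d, IDMMX.ClusterModels m
  WHERE m.MODELNAME = '${model_name}';

-- Step 4: extract the cluster ID and score from the stored results
SELECT ID, IDMMX.DM_getClusterID(RESULT), IDMMX.DM_getClusScore(RESULT)
  FROM SCORE_RESULTS;
EOF
}
```

A possible use is to write the template to a file, review it, and then run it from a session that is connected to an enabled database, for example `emit_scoring_sql /tmp/clusDemoBanking.dat DemoBanking BANKING_SCORING "AGE,INCOME" > score.sql` followed by `db2 -tvf score.sql`. The column names AGE and INCOME are placeholders only.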
Figure 1 shows the process by which a mining model that was built with IM for Data is exported from IM for Data, imported into a DB2 database, and applied to selected data.

Figure 1. The IM Scoring process

Using IM Modeling to produce models

You can use IM Modeling to create models for the mining functions that it supports; these are Demographic Clustering and Tree Classification. The models that IM Modeling creates reside in a DB2 table. These models are in a format that enables IM Scoring to apply them directly.

Scoring data types, methods, and functions

The database objects supplied with IM Scoring consist of the following:
• User-defined data types (UDTs)
• User-defined functions (UDFs)
• User-defined methods (UDMs)
These database objects are grouped together in the schema IDMMX. To access a UDT, UDF, or UDM, you must specify its fully-qualified name, for example, data type IDMMX.DM_ClusteringModel.

Part 2, “Reference” on page 73 supplies overview lists and full descriptions of all the database objects supplied with IM Scoring.

User-defined data types

The user-defined data types are used for identifying and storing mining models and results in DB2 tables. User-defined data types are also referred to as user-defined types or UDTs. The user-defined data types provided by IM Scoring consist of distinct types and structured types.

Distinct types
    The following user-defined types are distinct types in IM Scoring:
    • DM_ApplicationData
    • DM_ClasModel, DM_ClusteringModel, DM_RegressionModel
    • DM_ClasResult, DM_RegResult, DM_ClusResult

Structured type
    The following user-defined type is a structured type in IM Scoring: DM_LogicalDataSpec

User-defined methods

Use user-defined methods to create or modify user-defined structured types. You can call the methods that are defined for a type by using either a method syntax or a function syntax.

Method syntax
    To call, or invoke, a method using the method syntax:
    • In an appropriate context, specify the method name preceded by both a reference to a structured type instance, and the double dot operator.
    • Follow this with the list of arguments enclosed in parentheses.

    Example:
        select IDMMX.DM_getClusMdlSpec(modelcolumn)..DM_getNumFields()...

Function syntax
    To call, or invoke, a method using the function syntax: In an appropriate context, specify the method name followed by, in parentheses, the structured type instance and the list of arguments.

    Example:
        select IDMMX.DM_getNumFields( IDMMX.DM_getClusMdlSpec(modelcolumn) )...
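The two call styles can be compared side by side by completing the elided examples with a FROM clause. The following is a minimal sketch: the function name emit_numfields_queries is hypothetical, and the table and model column are supplied by the caller because this section does not name them.

```shell
#!/bin/sh
# Print the same query in method syntax and in function syntax.
# $1 = table that holds the models, $2 = model column (both supplied by the
# caller; the names are not taken from this section).
emit_numfields_queries() {
    table=$1
    column=$2
    cat <<EOF
-- Method syntax: instance, double dot operator, method name, argument list
SELECT IDMMX.DM_getClusMdlSpec(${column})..DM_getNumFields() FROM ${table};
-- Function syntax: the instance becomes the first argument
SELECT IDMMX.DM_getNumFields(IDMMX.DM_getClusMdlSpec(${column})) FROM ${table};
EOF
}
```

For example, `emit_numfields_queries IDMMX.ClusterModels MODEL > numfields.sql` writes both statements to a file that can be run with `db2 -tvf numfields.sql`. The column name MODEL is an assumption. Both statements are expected to return the same value for each row, because they invoke the same method.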
If the structured type instance is NULL, the method is not called, and NULL is returned.

User-defined functions

IM Scoring provides scoring functions, also referred to as user-defined functions (UDFs), which enable you to:
• Import and export mining models, and access the properties of the models.
• Apply these models to data held in DB2 tables.
• Retrieve the results.

Function syntax
    The function syntax is described in “Function syntax” in “User-defined methods” on page 10.

PMML: A markup language for data mining

PMML is a standard format for data mining models. Based on XML, the PMML format provides a standard that enables data mining models to be shared between the applications of different vendors. The intention is to provide a vendor-independent method for defining models. In this way, proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. You can find more information on PMML on the Web site of the Data Mining Group (DMG) at http://www.dmg.org.

Converting models

IM Scoring provides a model conversion facility, which converts mining models from IM for Data format to PMML 2.0 format. The model conversion facility respects the current server locale and writes the appropriate XML encoding into the PMML model.

Additionally, IM Scoring provides the features that are required to register the model conversion facility with IM for Data by using the client tool registration facility of IM for Data. You can use the model conversion facility by selecting the PMML format in the Export dialog of the IM for Data GUI when you export the model.

If you import models created by IM for Data into DB2, you do not need to convert the models to PMML 2.0. The model import functions read models in PMML 1.1, PMML 2.0, or Intelligent Miner format. Importing V6 models works only in fenced mode; for further details, see “Importing models in unfenced mode” on page 169.
Online scoring with IM Scoring Java Beans

IM Scoring Java Beans can be used to score single or multiple data records using a specified mining model. IM Scoring Java Beans is designed to be used for applications where the online scoring of data records is the main task.

A possible application area of IM Scoring Java Beans might be the realization of an Internet-based call center scenario. In this scenario, the required business logic (in this case, the scoring functions) runs on a Web or application server. Clients can connect to the server and send to it a data record that was specified by a call-center operator by means of a user interface on the client. The data record is scored on the server, and the result is passed back to the client in real time.

Figure 2 shows a simplified design, illustrating how such a scenario could be realized using IM Scoring Java Beans. Here, IM Scoring Java Beans is integrated into a J2EE implementation using, for example, servlets or Enterprise Java™ Beans.

Figure 2. Architecture sample to realize a call-center scenario

Note: To get optimum performance throughput, you might decide to run each mining model in a separate process. In this case, you would pass only the new records to the appropriate scoring process. This results in a considerable performance improvement. The reason for the improvement is that the model-loading step, which is very time-consuming, is done only once.

What is new in version 8.1

This section introduces you to the new features in IM Scoring.
Ease of use

idmmkSQL
    This new command enables you to generate a sample SQL script from a PMML model. You can then use this sample script as a template to invoke IM Scoring on the model.
Improved samples
    The IM Scoring samples have been enhanced and reworked, and they demonstrate how to use the new DB2 built-in function REC2XML. This simplifies SQL statements and improves performance.

E-business enhancements

Java support for Realtime Scoring
    The new Java interface, IM Scoring Java Beans, enables you to integrate real-time scoring into e-business applications, for example, those used in CRM.

Functional enhancements

Model compression
    Models are now compressed when they are imported into the database. This results in reduced resource consumption (database size) and improved performance. Models that were imported by means of IM Scoring V7 can be compressed through the use of export and import functions. For details, see “Exporting and importing models with the use of compression” on page 168.
New methods to work with mining fields
    The new structured type DM_LogicalDataSpec contains information about the mining fields that are part of the input data used to apply models. This information includes the field name and field type definitions of the mining fields. A number of new methods are supported for DM_LogicalDataSpec: for details, see “Methods provided by IM Scoring” on page 77.
Additional functions
    Additional functions have been added to extract properties from a data mining model and from a scoring result.

Standards conformance

PMML 2.0 support
    The IM Scoring conversion utilities now generate PMML 2.0 models. For more information about PMML, see http://www.dmg.org. IM Scoring now accepts PMML 2.0 models in addition to the PMML 1.1 models generated by IM Scoring V7.1. For detailed information about how IM Scoring conforms to PMML, see Appendix G, “IM Scoring conformance to PMML” on page 203.
Platform support

There is now support for Windows XP.

Shared infrastructure with IM Modeling

IM Scoring shares common infrastructure like XML parsing, error handling, tracing, licensing, and diagnostics with IM Modeling. This causes some changes in administrative interfaces.

Installation directory
    IM Scoring uses a new default installation directory prefix, IMinerX, instead of IMinerSc as used in IM Scoring V7.1.
The utilities idmenabledb, idmdisabledb, and idmcheckdb
    The commands to enable and disable a database for IM Scoring have been improved, and are shared between IM Scoring and IM Modeling. The idmcheckdb utility is a new tool that checks the enablement status of a database.
Collecting diagnostic information
    The tracing infrastructure has been improved. New environment variables enable you to customize the degree of tracing information. A new tool, idmlevel, enables you to check which version of IM Scoring you are using.
License use management
    IM Scoring uses nodelock keys to check whether a valid license is available. The 'Try and Buy' version installs a temporary key. This key allows you to use the product for a limited period of time in accordance with the EULA valid for the 'Try and Buy' version. The new command idmlicm lets you check the license status.

Limitations

The following limitations exist in version 8.1.

• Importing models from IM for Data
  Models in Intelligent Miner format
      It is no longer possible to import these models in unfenced mode. To import models of this kind, do one of the following:
      - Enable the database in fenced mode.
      - Use the model conversion utility idmxmod to convert the model to PMML before importing it.
• Neural PMML models generated by IM Scoring V7.1
  You might have existing models that were generated using the neural kernels of IM for Data V6 or higher in an IM Scoring database. Models of this kind must be migrated by importing them again.
• The cluster position in the PMML file
  The function DM_getClusterID returns the position of the cluster in the PMML file. This is different from the behavior in IM Scoring V7.1. For details, see “Using the function DM_getClusterID” on page 170.
Chapter 3. Data mining functions

This chapter provides a general introduction to the data mining functions that can be used with IM Scoring. The generation and application of mining models is described. Note that IM Scoring supports only the application of these models.

The mining functions are described in the following sections:
• “Classification”
• “Clustering”
• “Regression/Prediction” on page 18

Classification

Classification is the process of automatically creating a model of classes from a set of records that contain class labels. The classification technique analyzes records that are already known to belong to a certain class, and creates a profile for a member of that class from the common characteristics of the records. You can then use a data mining application tool to apply this model to new records, that is, records that have not yet been classified. This enables you to predict if the new records belong to that particular class.

When a model is applied, IM Scoring assigns a class label and a confidence value to each individual record being scored.

Clustering

The clustering technique consists of a range of algorithms that group data records on the basis of how similar they are. For example, a data record might consist of a description of a customer. In this case, clustering would group similar customers together, and at the same time it would maximize the differences between the different customer groups formed in this way.

The groups that are found are known as clusters. Each cluster tells a specific story about customer identity or behavior, for example, about their demographic background, or about their preferred products or product combinations. In this way, customers can be grouped into homogeneous groups whose members are very similar to each other.

When a model is applied, IM Scoring assigns a cluster ID, a cluster score, a quality value, and a confidence value to each individual record being scored.
The cluster score, quality value, and confidence value are different measures that indicate how well the record fits into the assigned cluster.
Regression/Prediction

The purpose of predicting values is to discover the dependency and the variation of one field's value on the values of the other fields within the same record. A model is generated that can predict a value for that particular field in a new record of the same form, based on other field values.

For example, a retailer wants to use historical data to estimate the sales revenue for a new customer. A mining run on this historical data creates a model. This model can be used to predict the expected sales revenue for a new customer, based on the new customer's data. The model might also show that, for some customers, incentive campaigns improve sales. In addition, it might reveal that frequent visits by sales representatives lead to a lower revenue if the customer is young.

When a model is applied, IM Scoring assigns a predicted value and, for an RBF model, a region ID to each individual record being scored.
Chapter 4. Getting started

The aim of this chapter is to get you up and running quickly in using IM Scoring.

• First, there is a quick-start guide. Here, you review the tasks that you need to complete to get started. See “Quick start”.
• This is followed by sections that help you to gain confidence in using the IM Scoring mining functions. These sections guide you through a tutorial of practice exercises on sample data. By using the data and scripts provided with the IM Scoring package and the instructions given in these sections, you can do the following:
  - Import and store a sample mining model
  - Apply it to sample data
  - Obtain results
  - Extract information from a model
  - Apply models created with IM Modeling
  - Use IM Scoring Java Beans to score records

  All the tasks in the practice exercises are completed by means of sample scripts. The scripts include standard SQL commands, such as INSERT, and scoring functions such as DM_impClusFile. The contents of the scripts are given in this chapter so that you can see how the SQL statements are structured. You can use these sample scripts as a basis for your own scripts. See “Sample components” on page 23 and “Completing the practice exercises” on page 25.

Quick start

This chapter guides you through the steps necessary to install and configure IM Scoring successfully. It gives you brief hints on what to do, and points you to the appropriate sections in this guide that describe each step in detail. Some steps are mandatory, and some steps are optional.

Mandatory steps:
1. Installation
2. Configuration
3. Creating database objects
Optional steps:
1. Verifying the installation and configuration
2. Executing sample applications

If you have IM Scoring V7.1 installed and configured, first check any migration issues. For further information, see Appendix C, “Migration from IM Scoring V7.1” on page 167.

Installation

Install IM Scoring by using the usual installation tools. The IM Scoring CD-ROM contains subdirectories for each platform that is supported. To install IM Scoring, insert the CD-ROM into your CD-ROM drive, and change to the appropriate subdirectory. For each platform, different setup programs (Windows), installp images (AIX), or installation scripts (SUN and Linux) are provided. These enable you to install the various components of IM Scoring (Scoring, Conversion, IM Scoring Java Beans).

For full instructions on installing all the components of IM Scoring, configuring the database management system, and uninstalling IM Scoring, see:
• Appendix A, “Installing IM Scoring” on page 145
• “Installing IM Scoring Java Beans” on page 164

The conversion component and the Scoring component need additional configuration steps before they are ready to use.

For information about the mandatory steps needed to configure the conversion component, see “Configuring IM for Data to export PMML models”.

For information about the other mandatory steps that you must follow before you can use the Scoring component, see:
• “Configuring the database environment” on page 21
• “Creating database objects” on page 22

For information about the optional steps that you can perform for the Scoring component, see:
• “Verifying the installation and configuration” on page 22
• “Executing sample applications” on page 22

Configuring IM for Data to export PMML models

After you have installed the conversion component on the AIX or SUN Solaris platform, you need to register the conversion utilities. To do this:
1. Add the contents of the file idmcsctr.add to the idmcsctr.dat file of the IM for Data client
2. Add the contents of the file idmcsstr.add to the idmcsstr.dat file of the IM for Data server

On the Windows platform, these steps are done automatically during installation. They must be done manually only if you install IM for Data after you have installed the conversion component.

For more information, see “Enabling IM for Data to export PMML or XML models” on page 158 in Appendix A, “Installing IM Scoring”. The information there also helps you if you are running an IM for Data client in a language other than English.

Configuring the database environment

After you have installed the Scoring component, you need to configure your DB2 instance and the databases that you want to use with IM Scoring.

To configure the DB2 instance as a user with SYSADM authority:
• On UNIX® platforms, call the idminstfunc script. This is available in the bin directory of your IM Scoring installation.
• On all platforms, increase the database manager configuration parameter UDF_MEM_SZ. A recommended value is 60000, which is the highest possible.
  Syntax
      db2 update dbm cfg using UDF_MEM_SZ 60000
• On Windows platforms, increase the DB2 registry parameter DB2NTMEMSIZE to a value that matches your UDF_MEM_SZ value.
  Syntax
      db2set DB2NTMEMSIZE=APLD:240000000
• Restart the DB2 instance.
• For further information, see:
  - For UNIX systems: “Enabling the DB2 instance on UNIX systems” on page 157
  - For Windows systems: “Enabling the DB2 instance on Windows systems” on page 158
  - “The idminstfunc command” on page 135

To configure the databases as a user with SYSADM or DBADM authority:
1. If you do not have an existing database, create a database by using the command DB2 CREATE DATABASE <DBNAME>.
2. Increase the database transaction log size LOGFILSIZ. A recommended value is 2000.
   Syntax
       db2 update db cfg for <database name> using logfilsiz 2000
3. Increase the database parameter APP_CTL_HEAP_SZ. A recommended value is 10000.
   Syntax
       db2 update db cfg for <database name> using APP_CTL_HEAP_SZ 10000
4. Increase the database parameter APPLHEAPSZ. A recommended value is 1000.
   Syntax
       db2 update db cfg for <database name> using APPLHEAPSZ 1000

Creating database objects

The UDTs, UDFs, and UDMs provided with IM Scoring must be created in the databases that you want to use with IM Scoring. To do this, call the idmenabledb command, which is available in the bin directory of your IM Scoring installation. A mandatory parameter to the command is the database name. Some optional parameters are available. If you want to execute the sample applications provided with IM Scoring, call the command by means of the fenced and the tables options.

Syntax
    idmenabledb <database name> fenced tables

For more information and a detailed description of idmenabledb, see:
• “Enabling databases” on page 41
• “The idmenabledb command” on page 133

Verifying the installation and configuration

You can quickly verify your installation and configuration, and make sure that the appropriate database objects have been created. To do so, follow these steps:
1. Call the command idmcheckdb <database name>, which is available in the bin directory of your installation. The command returns the enablement status of the database.
2. Connect to a database that you have enabled.
3. Use the following command:
       db2 "values( IDMMX.DM_applData('Test',4))"
4. The command must return without error. If you get any error messages, check your installation and configuration for completeness.

Executing sample applications

IM Scoring provides a set of samples to help you to get familiar with the UDFs and UDTs. For descriptions of the samples and instructions on how to use them as practice exercises, see:
• “Sample components”
• “Completing the practice exercises” on page 25

Generating SQL scripts from your own mining models

If you already have PMML models available as flat files, you can generate SQL scripts from them by using the idmmkSQL tool. These scripts contain template SQL statements that import and apply the model. Each script contains placeholders that you replace with the names of concrete database objects to obtain an executable SQL script.

For more information, see:
• “Generating SQL statements from models” on page 44
• “The idmmkSQL command” on page 136

You can find a practice exercise in using the idmmkSQL tool at “Using idmmkSQL to work with your own mining models” on page 38.

Sample components

The IM Scoring package includes sample components consisting of a series of practice exercises in using IM Scoring. This tutorial material enables you to:
• Use the Clustering mining function of IM Scoring
  For an introduction to the Clustering mining function, see Chapter 3, “Data mining functions” on page 17.
• Score records using IM Scoring Java Beans
  For an introduction to IM Scoring Java Beans, see “Online scoring with IM Scoring Java Beans” on page 12.

The IM Scoring sample components reside in a samples directory. This directory contains the mining model, data, and scripts that you require to complete the exercises in this chapter.
• On the AIX platform, the samples directory is:
      /usr/lpp/IMinerX/samples/ScoringDB2
• On the Linux and Sun Solaris platforms, the samples directory is:
      /opt/IMinerX/samples/ScoringDB2
• On the Windows platform, the samples directory is:
      <install path>\samples\ScoringDB2
  where <install path> is the directory where IM Scoring is installed. You can also use the shortcut IBM DB2 Intelligent Miner Scoring 8.1 -> Scoring - Samples in the program folder.

The IM Scoring Java Beans examples are available in samples/ScoringBean.
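On UNIX platforms, the platform-specific samples directories listed above can be resolved in a script instead of being typed by hand. The following is a minimal sketch: the function name samples_dir is hypothetical, and it assumes that `uname -s` reports AIX, Linux, or SunOS (the name that Sun Solaris reports). The Windows location is not covered, because it depends on the installation path.

```shell
#!/bin/sh
# Map the operating system name to the default IM Scoring samples directory.
# $1 is optional; when it is omitted, the name reported by uname -s is used.
samples_dir() {
    os=${1:-$(uname -s)}
    case "$os" in
        AIX)          echo /usr/lpp/IMinerX/samples/ScoringDB2 ;;
        Linux|SunOS)  echo /opt/IMinerX/samples/ScoringDB2 ;;
        *)            echo "unsupported platform: $os" >&2
                      return 1 ;;
    esac
}
```

For example, `cd "$(samples_dir)"` changes to the samples directory of the current platform, provided that IM Scoring was installed in its default location.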
Table 4 and Table 5 on page 25 list the files that are included in the samples directory and explain the purpose of each.

Table 4. Sample components for the Clustering mining function of IM Scoring

Sample component          Description

clusDemoBanking.dat       An exported Demographic Clustering model. The model
                          was built from data for a bank’s customers who have
                          a particular type of account. Customers are grouped
                          according to similarities of age, income, number of
                          siblings, gender, and account type.

bankingScoring.data       A flat file containing records relating to the
                          customers of a bank. This is the data to which you
                          will apply the model.

bankingImport.db2         A script that creates the DB2 table BANKING_SCORING,
                          imports the file bankingScoring.data, and inserts
                          the data into the new table.

bankingInsert.db2         A script that imports the model, which is stored in
                          the file clusDemoBanking.dat, and then inserts the
                          model into the table IDMMX.ClusterModels.

bankingApplyTable1.db2    A script that:
                          1. Creates a results table
                          2. Applies the imported Clustering model to the
                             specified data from the table BANKING_SCORING
                          3. Stores the calculated results in the table
                          4. Obtains results values from the table

bankingApplyTable2.db2    A script that uses nested calls to DM_applData
                          instead of calling the REC2XML function to apply
                          the imported Clustering model to the specified data
                          from the table BANKING_SCORING. The script:
                          1. Creates a results table
                          2. Applies the imported Clustering model to the
                             specified data from the table BANKING_SCORING
                          3. Stores the calculated results in the table
                          4. Obtains results values from the table

bankingApplyView.db2      A script that applies the imported Clustering model
                          to the specified data from the table
                          BANKING_SCORING. The script then obtains values
                          from the calculated results using a common table
                          expression.
Table 5. Sample components for IM Scoring Java Beans

Sample component          Description

93er_cars.pmml            A Polynomial Regression model
Sample93erCars.java       A sample Java program
readme.txt                A README file

Note: The script bankingInsert.db2 uses the table IDMMX.ClusterModels. This is one of the sample tables delivered with IM Scoring. Before you perform the tasks described in this chapter, ensure that you have enabled the database by means of the tables option. For instructions on installing the sample tables, see “The idmenabledb command” on page 133.

Completing the practice exercises

Before you can complete the practice exercises, you must install IM Scoring and configure the system environment. For guidance on the procedure of installing and configuring IM Scoring, see “Quick start” on page 19.

The tutorial consists of the following tasks:
v Creating a DB2 table and importing data into it
v Importing a mining model into a DB2 table
v Applying the model to data and obtaining results values, without storing the results in a DB2 table
v Applying the model to data, storing the results in a DB2 table, and obtaining results values from the table
v Using IM Scoring Java Beans to score records

Note: Before starting the tasks, you must be connected to a database that is enabled for the use of IM Scoring. To run the scripts, you must have SELECT and INSERT privileges on the IDMMX.ClusterModels table.

Go to the directory where the samples are installed. See “Sample components” on page 23 for information on the directories where the sample files are stored.

Creating a table and importing data

In this exercise, you create a table and import the banking data. You will later apply the mining model to this data.
First, you must connect to the database. To do this, use the following command:

   db2 connect to <dbname>

To create a table and import the sample data contained in the file bankingScoring.data, run the sample script bankingImport.db2 by using the following command:

   db2 -stf bankingImport.db2

Contents of the script bankingImport.db2

   CREATE TABLE BANKING_SCORING (
     TYPE     CHAR(7),
     GENDER   CHAR(6),
     AGE      DOUBLE,
     PRODUCT  CHAR(1),
     SIBLINGS DOUBLE,
     INCOME   DOUBLE
   );

   IMPORT FROM bankingScoring.data OF DEL
     INSERT INTO BANKING_SCORING
     ( TYPE, GENDER, AGE, PRODUCT, SIBLINGS, INCOME );

In the first part of the script, the DB2 table BANKING_SCORING is created, its columns are specified, and data types are defined for each column. In the second part, the flat file bankingScoring.data is imported and inserted into the new table. Data from the flat file populates the columns, which are specified by their names.

Importing a mining model

In this exercise, you import the sample mining model into the DB2 database and store it in a DB2 table, which has a column configured for mining models.

To import the sample mining model, which is stored in the file clusDemoBanking.dat, run the script bankingInsert.db2 by using the following command:

   db2 -stf bankingInsert.db2

Contents of the script bankingInsert.db2 for AIX
   insert into IDMMX.ClusterModels values
     ( 'DemoBanking',
       IDMMX.DM_impClusFile(
         '/usr/lpp/IMinerX/samples/ScoringDB2/clusDemoBanking.dat' ) );

This script uses the function DM_impClusFile, which is specific to IM Scoring, to import the mining model contained in the file clusDemoBanking.dat. The SQL INSERT command inserts the mining model into a column in the table ClusterModels, and sets the name of the model to DemoBanking. The table IDMMX.ClusterModels is configured for the data type DM_ClusteringModel.

Note: On Windows, the absolute path is automatically modified at installation time to be consistent with the chosen install path.

Applying a model and getting results values

You can use different scripts to apply mining models and obtain the results of applying the mining model. In the following exercises, a Demographic Clustering model is used.

Using the script 'bankingApplyView.db2'

In this exercise, you apply the Demographic Clustering model to the banking data and get the values of the calculated results.

To apply the sample mining model and obtain the results of applying the model, run the script bankingApplyView.db2 by using the following command:

   db2 -stf bankingApplyView.db2

Contents of the script bankingApplyView.db2

   WITH clusterView( clusterResult ) AS
   (
     SELECT IDMMX.DM_applyClusModel( C.MODEL,
              IDMMX.DM_impApplData(
                rec2xml( 1.0, 'COLATTVAL', '',
                         B.TYPE, B.AGE, B.SIBLINGS, B.INCOME ) ) )
     FROM BANKING_SCORING B, IDMMX.ClusterModels C
     WHERE C.MODELNAME='DemoBanking'
   )
   SELECT IDMMX.DM_getClusterID( clusterResult ),
          IDMMX.DM_getClusScore( clusterResult )
   FROM clusterView;

This script defines a common table expression, clusterView(clusterResult), to hold the results of applying a model. The script then applies the DemoBanking model to selected data from the BANKING_SCORING table by using the DM_applyClusModel function. The data values are obtained by means of a call to the DB2 function REC2XML.
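The exercises up to this point can be chained in one shell function. This is a minimal sketch and not part of the product: the function name run_banking_samples and its optional second argument are hypothetical. It assumes that you call it from the samples directory, that the database is already enabled with the tables option, and that the db2 command returns a non-zero exit status when a script run with the -s option fails.

```shell
#!/bin/sh
# Hypothetical wrapper: run the tutorial scripts in order and stop at
# the first failure.
#   $1  database name
#   $2  command used for each step (default: db2). Passing "echo"
#       prints the arguments of each step instead of executing it.
run_banking_samples() {
  dbname=$1
  runner=${2:-db2}
  if [ -z "$dbname" ]; then
    echo "usage: run_banking_samples <dbname> [runner]" >&2
    return 2
  fi
  "$runner" connect to "$dbname" || return 1
  for script in bankingImport.db2 bankingInsert.db2 bankingApplyView.db2; do
    "$runner" -stf "$script" || return 1
  done
}
```

Calling `run_banking_samples <dbname> echo` shows the steps without touching the database. Note that bankingImport.db2 creates the table BANKING_SCORING, so a second run against the same database is expected to stop at that script unless the table is dropped first.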
Note: The column names specified in the call to REC2XML must exactly match the names of the fields that are used in the mining model. For information on how to retrieve the names of the fields in a mining model, see “Querying model field names” on page 49.

Finally, the script obtains the cluster ID and the Clustering score from the clusterResult column of the common table expression by means of the functions DM_getClusterID and DM_getClusScore.

Using the script 'bankingApplyTable1.db2'

In this exercise, you:
1. Apply the Demographic Clustering model to the banking data.
2. Store the calculated results in a DB2 table.
3. Obtain results values for any customer who is older than 50.

To apply the sample mining model, store results, and obtain results values, run the script bankingApplyTable1.db2 by using the following command:

   db2 -stf bankingApplyTable1.db2

Contents of the script bankingApplyTable1.db2

   CREATE TABLE BANKING_APPLY (
     TYPE           CHAR(7),
     GENDER         CHAR(6),
     AGE            DOUBLE,
     PRODUCT        CHAR(1),
     SIBLINGS       DOUBLE,
     INCOME         DOUBLE,
     CLUSTER_RESULT IDMMX.DM_ClusResult
   );

   INSERT INTO BANKING_APPLY
     SELECT B.TYPE, B.GENDER, B.AGE, B.PRODUCT, B.SIBLINGS, B.INCOME,
            IDMMX.DM_applyClusModel( C.MODEL,
              IDMMX.DM_impApplData(
                rec2xml( 1.0, 'COLATTVAL', '',
                         B.TYPE, B.AGE, B.SIBLINGS, B.INCOME ) ) )
     FROM BANKING_SCORING B, IDMMX.ClusterModels C
     WHERE C.MODELNAME='DemoBanking';

   SELECT AGE,
          IDMMX.DM_getClusterID( CLUSTER_RESULT ),
          IDMMX.DM_getClusScore( CLUSTER_RESULT )
   FROM BANKING_APPLY
   WHERE AGE > 50;

   DROP TABLE BANKING_APPLY;

This script creates a DB2 table for the mining results by defining the names and the data types of the columns. The last column, CLUSTER_RESULT, is designated for the results that are calculated. The column is configured for the
data type DM_ClusResult. The script then applies the DemoBanking model to selected data from the BANKING_SCORING table by using the DM_applyClusModel function. Finally, it obtains the cluster ID and the Clustering score from the CLUSTER_RESULT column of the new table by using the functions DM_getClusterID and DM_getClusScore.

You can also apply models and compute cluster IDs in a single SQL query. The following example shows an SQL query of this kind:

   select b.type, b.gender, b.age, b.product, b.siblings, b.income,
          IDMMX.DM_getClusterID(
            IDMMX.DM_applyClusModel( c.model,
              IDMMX.DM_impApplData(
                REC2XML( 1, 'COLATTVAL', '',
                         b.type, b.age, b.siblings, b.income ) ) ) )
   from BANKING_SCORING b, IDMMX.ClusterModels c
   where c.modelname = 'DemoBanking';

Tip: You can use the application functions to define SQL views that are similar to the output tables created by IM for Data Version 6. The SQL statement would look similar to the template in the following example:

   CREATE VIEW ApplyOut ( ID, NAME, AGE, ClusterID ) AS
     SELECT I.ID, I.NAME, I.AGE,
            IDMMX.DM_getClusterID(
              IDMMX.DM_applyClusModel( C.model,
                IDMMX.DM_impApplData(
                  REC2XML( 1, 'COLATTVAL', '', ... ) ) ) )
     FROM InputTable I, IDMMX.ClusterModels C
     WHERE C.modelName = .....

Afterwards, you can access the view by using any SELECT statement, such as the following:

   SELECT ID, NAME, AGE, ClusterID FROM ApplyOut WHERE ClusterID = 3

Using the script 'bankingApplyTable2.db2'

The script bankingApplyTable2.db2 has the same functionality as the script bankingApplyTable1.db2, but it uses nested calls to DM_applData instead of a call to REC2XML. For information on the advantages and possible inconveniences of using DM_applData, see “Specifying data by means of DM_applData” on page 52.

Alternatively, you can use a CONCAT expression. For further information about this possibility, see “Specifying data by means of CONCAT” on page 53.
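The nested calls to DM_applData follow a regular pattern: the innermost call takes a field name and a value, and each further call wraps the expression built so far and adds one more field name and value. The following sketch generates such an expression from a list of field and column pairs. The function name nested_appl_data is hypothetical and is not part of IM Scoring. The helper only produces SQL text and does not check that the field names match the mining model.

```shell
#!/bin/sh
# Hypothetical helper: print a nested IDMMX.DM_applData expression.
# Arguments: field1 column1 field2 column2 ...
# A trailing argument without a partner is ignored.
nested_appl_data() {
  acc=""
  while [ "$#" -ge 2 ]; do
    if [ -z "$acc" ]; then
      # Innermost call: field name and value only.
      acc="IDMMX.DM_applData('$1', $2)"
    else
      # Outer call: wrap the expression built so far.
      acc="IDMMX.DM_applData($acc, '$1', $2)"
    fi
    shift 2
  done
  printf '%s\n' "$acc"
}
```

For example, `nested_appl_data TYPE b.type AGE b.age` prints `IDMMX.DM_applData(IDMMX.DM_applData('TYPE', b.type), 'AGE', b.age)`. Generating the expression in this way keeps the parentheses balanced when the number of fields grows.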
To apply the sample mining model, store results, and obtain results values, run the script bankingApplyTable2.db2 by using the following command:

   db2 -stf bankingApplyTable2.db2

Contents of the script bankingApplyTable2.db2

   CREATE TABLE BANKING_APPLY (
     TYPE           CHAR(7),
     GENDER         CHAR(6),
     AGE            DOUBLE,
     PRODUCT        CHAR(1),
     SIBLINGS       DOUBLE,
     INCOME         DOUBLE,
     CLUSTER_RESULT IDMMX.DM_ClusResult
   );

   INSERT INTO BANKING_APPLY
     SELECT B.TYPE, B.GENDER, B.AGE, B.PRODUCT, B.SIBLINGS, B.INCOME,
            IDMMX.DM_applyClusModel( c.model,
              IDMMX.DM_applData(
                IDMMX.DM_applData(
                  IDMMX.DM_applData(
                    IDMMX.DM_applData( 'TYPE', b.type ),
                    'AGE', b.age ),
                  'SIBLINGS', b.siblings ),
                'INCOME', b.income ) )
     FROM BANKING_SCORING B, IDMMX.ClusterModels C
     WHERE C.MODELNAME='DemoBanking';

   SELECT AGE,
          IDMMX.DM_getClusterID( CLUSTER_RESULT ),
          IDMMX.DM_getClusScore( CLUSTER_RESULT )
   FROM BANKING_APPLY
   WHERE AGE > 50;

   DROP TABLE BANKING_APPLY;

This script creates a DB2 table for the mining results by defining the names and the data types of the columns. The last column, CLUSTER_RESULT, is designated for the calculated results. It is configured for the data type DM_ClusResult. The script then applies the DemoBanking model to selected data from the banking table by using the function DM_applyClusModel. Finally, the script obtains the cluster ID and the Clustering score from the CLUSTER_RESULT column of the new table. To do this, it uses the functions DM_getClusterID and DM_getClusScore.

You can also apply models and compute cluster IDs in a single SQL query. The following example shows an SQL query of this kind:

   select b.type, b.age, b.product, b.siblings, b.income,
          IDMMX.DM_getClusterID(
            IDMMX.DM_applyClusModel( c.model,
              IDMMX.DM_applData(
                IDMMX.DM_applData(
                  IDMMX.DM_applData(
                    IDMMX.DM_applData( 'TYPE', b.type ),
                    'AGE', b.age ),
                  'SIBLINGS', b.siblings ),
                'INCOME', b.income ) ) )
   from BANKING_SCORING b, IDMMX.ClusterModels c
   where c.modelname='DemoBanking';

Extracting information from a model

In this exercise, you extract information from a model. The model from which you extract information is the one that you inserted into the database as part of the exercise in the section “Importing a mining model” on page 26. The information that you extract is:
v The name of the model
v The number of clusters
v The names of the mining fields

To extract the information, run the script bankingExtract.db2 by using the following command:

   db2 -tf bankingExtract.db2

Contents of the script bankingExtract.db2

   WITH MODELCONTENT( CLUSMODELNAME, NOCLUSTERS, MODELFIELDS ) AS
   (
     SELECT IDMMX.DM_getClusMdlName( MODEL ),
            IDMMX.DM_getNumClusters( MODEL ),
            IDMMX.DM_getClusMdlSpec( MODEL )
     FROM IDMMX.ClusterModels
     WHERE MODELNAME='DemoBanking'
   )
   SELECT CLUSMODELNAME,
          NOCLUSTERS,
          MODELFIELDS..DM_getFldName(1) AS FIELDNAME1,
          MODELFIELDS..DM_getFldName(2) AS FIELDNAME2,
          MODELFIELDS..DM_getFldName(3) AS FIELDNAME3,
          MODELFIELDS..DM_getFldName(4) AS FIELDNAME4
   FROM MODELCONTENT;

Applying models created with IM Modeling

In this exercise, you apply models created with IM Modeling. A prerequisite for executing these samples is that you have installed and configured IM Modeling and executed the banking samples provided with IM Modeling. Executing the IM Modeling samples before executing the sample
files provided with IM Scoring has a further advantage. It helps you to understand which UDFs and UDMs belong to IM Modeling, which belong to IM Scoring, and which belong to both.

To apply the models, run the scripts bankingApplyModeling1.db2 and bankingApplyModeling2.db2 by using the following commands:

   db2 -tf bankingApplyModeling1.db2
   db2 -tf bankingApplyModeling2.db2

In each script, the first set of statements extracts information from the model, and the second set of statements applies the model to new data. The difference between the two scripts is that the first one uses rec2xml to build the record and the second uses DM_applData.

Contents of the script bankingApplyModeling1.db2

   WITH MODELCONTENT( CLUSMODELNAME, NOCLUSTERS, MODELFIELDS ) AS
   (
     SELECT IDMMX.DM_getClusMdlName( MODEL ),
            IDMMX.DM_getNumClusters( MODEL ),
            IDMMX.DM_getClusMdlSpec( MODEL )
     FROM IDMMX.ClusterModels
     WHERE MODELNAME='BankingClusColumnModel'
   )
   SELECT CLUSMODELNAME,
          NOCLUSTERS,
          MODELFIELDS..DM_getFldName(1) AS FIELDNAME1,
          MODELFIELDS..DM_getFldName(2) AS FIELDNAME2,
          MODELFIELDS..DM_getFldName(3) AS FIELDNAME3,
          MODELFIELDS..DM_getFldName(4) AS FIELDNAME4
   FROM MODELCONTENT;

   WITH clusterView( clusterResult ) AS
   (
     SELECT IDMMX.DM_applyClusModel( C.MODEL,
              IDMMX.DM_impApplData(
                rec2xml( 1, 'COLATTVAL', '',
                         B.TYPE, B.AGE, B.SIBLINGS, B.INCOME ) ) )
     FROM BANKING_SCORING B, IDMMX.ClusterModels C
     WHERE C.MODELNAME='BankingClusColumnModel'
   )
   SELECT IDMMX.DM_getClusterID( clusterResult ),
          IDMMX.DM_getClusScore( clusterResult )
   FROM clusterView;

Contents of the script bankingApplyModeling2.db2

   WITH MODELCONTENT( CLUSMODELNAME, NOCLUSTERS, MODELFIELDS ) AS
   (
     SELECT IDMMX.DM_getClusMdlName( MODEL ),
            IDMMX.DM_getNumClusters( MODEL ),
            IDMMX.DM_getClusMdlSpec( MODEL )
     FROM IDMMX.ClusterModels
     WHERE MODELNAME='BankingClusAliasModel'
   )
   SELECT CLUSMODELNAME,
          NOCLUSTERS,
          MODELFIELDS..DM_getFldName(1) AS FIELDNAME1,
          MODELFIELDS..DM_getFldName(2) AS FIELDNAME2,
          MODELFIELDS..DM_getFldName(3) AS FIELDNAME3,
          MODELFIELDS..DM_getFldName(4) AS FIELDNAME4
   FROM MODELCONTENT;

   WITH clusterView( clusterResult ) AS
   (
     SELECT IDMMX.DM_applyClusModel( C.MODEL,
              IDMMX.DM_applData(
                IDMMX.DM_applData(
                  IDMMX.DM_applData(
                    IDMMX.DM_applData( 'N_TYPE', B.TYPE ),
                    'N_AGE', B.AGE ),
                  'N_SIB', B.SIBLINGS ),
                'N_INC', B.INCOME ) )
     FROM BANKING_SCORING B, IDMMX.ClusterModels C
     WHERE C.MODELNAME='BankingClusAliasModel'
   )
   SELECT IDMMX.DM_getClusterID( clusterResult ),
          IDMMX.DM_getClusScore( clusterResult )
   FROM clusterView;

Using IM Scoring Java Beans to score records

In this exercise, you use IM Scoring Java Beans to score records. To do this, you use the sample program Sample93erCars.java, which you can find in the DB2 IM Scoring installation directory under samples/ScoringBean.

In the example, the minimum price of a car is predicted, given the basic characteristics for a car. The data used to train and generate the mining model contained a large number of fields, including:

   Horsepower
   Engine Size
   City MPG
   Highway MPG
   Passenger capacity
   Weight (pounds)

The training data also contained the actual Minimum Price (in $1000), which was used as the predicted field when the mining model was generated. When IM Scoring Java Beans is used with this model, the scorer predicts the minimum car price for new, previously unseen data.
