SELECTING CLASSIFICATION AND CLUSTERING TOOLS FOR ACADEMIC SUPPORT

Manying Qiu, Virginia State University

Issues in Information Systems, Volume VIII, No. 2, 2007, pp. 265-272

ABSTRACT

Classification and clustering are powerful and popular data mining techniques. Organizations use them to capture information, retain customers, and improve business performance. This paper presents a method for selecting data mining software for an academic environment based on its classification and clustering tools. The research applies the data mining software evaluation framework to three major commercial data mining packages: SAS Enterprise Miner, Clementine from SPSS, and IBM DB2 Intelligent Miner. We added to the framework a criterion that became important in the Internet age. After ranking the packages on the relevant criteria in the framework, one can purchase the best package that is affordable for academic support.

Keywords: Data mining, classification, clustering, software evaluation

INTRODUCTION

Data mining techniques have been successfully applied in many different fields, including marketing, manufacturing, process control, fraud detection, and network management [1]. Data mining is supported by a variety of software packages that employ basically the same tools. Because the implementations differ, it can be difficult to select appropriate software for a particular environment. This paper focuses on how to select appropriate data mining software for an academic environment based on the implementations of classification and clustering tools. First we explain why these tools are useful and how they work. Then we use a software evaluation framework to evaluate three major commercial data mining packages: SAS Enterprise Miner, Clementine from SPSS, and IBM DB2 Intelligent Miner. The SAS and SPSS statistical packages are commonly used for academic support, while IBM DB2 is widely used in industry.

A typical motivation for data mining is to develop an in-depth knowledge of customers and prospects, which is essential for businesses to stay competitive in today's marketplace. Companies in every industry are trying to move towards the one-to-one ideal of understanding each individual customer and using that understanding to make it easier for the customer to do business with them rather than with a competitor. Companies are learning to look at the lifetime value of each customer. Organizations want to know which customers are worth investing money and effort to hold on to and which ones are at risk of being lost. Companies could also use knowledge about customers to give each customer individualized attention. For example, a company could allow a customer to create his or her own personalized home page at the company's Website.

To learn about customers, a company must gather data from various sources and organize it in a consistent and useful way. Usually a data warehouse stores the tremendous amount of data needed for doing business. This data must be analyzed, understood, and turned into actionable information. Data mining combs through the records to discover patterns, devise rules, come up with new ideas for doing business, and make predictions about the future [2].

Data mining employs one or more computer learning techniques to automatically analyze and extract knowledge from data contained within a database [6].

One of the most powerful and popular machine learning methods is classification, which constructs a decision tree from examples. A decision tree (consisting of nodes and branches) represents a collection of rules, with each terminal node (i.e. leaf) corresponding to a specific decision rule. Decision trees are usually constructed beginning with the root of the tree at the top and proceeding down to the leaves at the bottom. This technique may be used directly for predictive or descriptive purposes, and there are a variety of algorithms for building decision trees.

Another popular technique is clustering. Clustering divides a heterogeneous population into a number of more homogeneous segments, or clusters. Businesses frequently use clustering to carry out market segmentation studies: they group customers into clusters with similar buying behaviors and then develop the marketing strategy that best promotes sales for each cluster. Clustering is undirected knowledge discovery: no target variable is defined. Clustering techniques assign records to the same cluster if they have something in common, in the hope that this will make it easier for data mining techniques to discover meaningful patterns in the dataset. Clustering often serves as a
prelude to some other technique of data mining or modeling.

Data mining continues to grow in importance, and corporations need to use the best products available for their tasks. No single classification or clustering tool satisfies all user purposes. Instead, we must consider tools with respect to the particular environment and analysis needs. Collier et al. [3] provide a framework for evaluating data mining tools. The framework groups the criteria for evaluating data mining software into four areas: performance, functionality, usability and ancillary task support. Let us assume that a group of university professors is looking for classification and clustering tools for their academic support: teaching, research and consulting activities. We added to the framework a criterion, "text miner availability", that became important in the Internet age. The framework will be applied to evaluate the classification and clustering techniques in three major commercial data mining packages, namely SAS Enterprise Miner, Clementine from SPSS and IBM DB2 Intelligent Miner. Although many data mining packages include classification and clustering techniques, these three representative packages were selected to investigate how they cover the components of the classification and clustering process and whether they differ on criteria such as software architecture, data access, algorithm variety, prescribed methodology, visualization, user types and so on. The evaluation process should provide an example that helps organizations compare the characteristics of classification and clustering tools and select data mining software that meets their needs.

DATA MINING SOFTWARE EVALUATION FRAMEWORK

The data mining software evaluation framework [3] is applied to evaluate the classification and clustering tools in different data mining packages. The framework employs the following categories, each with criteria that facilitate evaluation in the category and are relevant to academic teaching, research and consulting activities.

1. Performance: the ability to handle a variety of data sources in an efficient manner.
   Criteria: (1) Software Architecture; (2) Heterogeneous Data Access
2. Functionality: the inclusion of a variety of capabilities, techniques, and methodologies for data mining.
   Criteria: (1) Algorithmic Variety; (2) Prescribed Methodology
3. Usability: accommodation of different levels and types of users without loss of functionality or usefulness.
   Criteria: (1) User Types; (2) Data Visualization
4. Ancillary Task Support: allows the user to perform data cleansing, manipulation, transformation, visualization and other tasks that support data mining.
   Criteria: (1) Data Filtering; (2) Deriving Attributes

We added the following criteria to the framework to include organizational and management issues.

5. Text Miner Availability: core data mining tools can access and process only structured, essentially numerical data to perform decision tree induction and clustering. In the Internet age there is an increasing volume of unstructured text data, including documents, email and web pages. We want to make sure the selected data mining tool can categorize, search or personalize textual data with its text miner.
   Criterion: Text Miner Availability
6. Cost: the cost of licenses for an academic server and workstations.
   Criterion: Academic license price

ROI is extremely important to organizations when purchasing software. Because the "return" is usually measured in dollars, we did not include it in the framework for an academic environment, where profit is generally not the goal. However, a similar measure of payoff could be designed for an academic environment. One could measure value to academic constituents (e.g. students) by how prevalent each tool is in industry (market share). This prevalence indicates how likely it is that students can use their skills without having to learn a different tool when they take an internship or get a job. Professors may want to consider this ROI-like perspective when deciding on a purchase.

DESCRIPTION OF DM TOOLS

We shall discuss each tool with respect to the framework criteria and estimate the cost for an academic environment.

SAS Enterprise Miner

SAS Institute's software started off in the mid 1970s as a 4GL-based statistical package for financial and economic analysis. Over the years the SAS system has expanded to become a multi-faceted product providing organizations with a complete information delivery system. Enterprise Miner addresses the entire data mining process, all through an intuitive point-and-click graphical user interface. Combined with SAS data warehousing and on-line analytic processing (OLAP) technologies, it creates a synergistic, end-to-end solution that addresses the full spectrum of knowledge discovery [7].
Software Architecture. Enterprise Miner sits on top of a large, bundled collection of SAS statistical products. It is available in a standalone workstation configuration or in a client/server configuration. In the latter case, you can perform analysis on both the workstation and the server simultaneously [9]. Enterprise Miner is a distributed client/server system that integrates fully with the rest of the SAS software, including OLAP and a tool that allows applications to be deployed via the Internet. The client/server enablement distributes data-intensive processing to the most appropriate machine, maintains a central data source, and allows access to diverse data sources from DBMSs located on different servers [7]. SAS software also supports management by an administrator.

Heterogeneous Data Access. SAS software has built its reputation on the ability to access, manage and analyze data from any source, and provides a range of drivers that allow full access to data stores. Enterprise Miner can access data in a warehouse or data mart and in over 50 different file structures. Supported data sources include all the major relational databases as well as non-SQL data sources, PC sources and ODBC-compliant databases. SAS software has an MDDB server which provides a platform-independent, specialized OLAP storage facility for rapid access by end users through various tools [7] [8].

Algorithmic Variety. Enterprise Miner supports various models and algorithms for classification. With Enterprise Miner, you can access an integrated suite of advanced models and algorithms, including clustering, decision trees, and linear and logistic regression, to achieve analytical depth. Enterprise Miner supports classification and regression trees and the CHAID and C4.5 algorithms for building decision trees; enhanced methods to evaluate a tree based on profit or lift objectives and prune accordingly; interactive growing and pruning of trees; new C-based algorithms; and an interactive thin-client VC++ Windows tree results viewer [8]. Enterprise Miner supports clustering and segmentation of databases using k-means clustering, self-organizing maps and Kohonen network techniques [8]. The results of clustering can be passed to other nodes, such as the Decision Tree node, to explain the clusters formed. They can also be passed as a group variable that enables the user to automatically construct separate models for each cluster.

Prescribed Methodology. Enterprise Miner provides a guiding, yet flexible, framework for conducting decision tree induction that encompasses five primary steps (SEMMA):
• Sample the data by extracting a portion of a data set.
• Explore the data by searching for unanticipated trends and anomalies.
• Modify the data by creating, selecting, and transforming the variables to focus the model selection process.
• Model the data by formalizing the patterns, from which predictions can be made.
• Assess the data by evaluating the usefulness and reliability of the findings from the decision tree induction process and estimating how well it performs.
An Enterprise Miner user builds a process flow diagram in the workspace by choosing nodes from the tool bar and menus and linking them by drag and drop. A wide range of node types is available; each represents an operation the user performs to carry out a specific data mining step, such as sampling or data extraction. The software supports both automatic and interactive training: it can build the entire tree automatically, or build the tree node by node with user input, in which case users apply their business intuition to choose the variables used in subsequent tree development from a list provided by the software.

User Types. Enterprise Miner is designed for a combination of beginning, intermediate, and advanced users. A rich set of analysis and modeling capabilities is provided for business users and professional users with some necessary statistical insight and business knowledge.

Data Visualization. Enterprise Miner supports visual analysis and reporting. In addition to graphs showing outputs from the data mining techniques, there are novel plots that show business users the financial benefits and risks associated with a particular model. Each node has its own list of properties, parameters and values, which the user can view by clicking on the object and edit if desired. Data and statistics associated with a decision tree node can be examined by clicking on that object. Graphics, including 3D rotating charts and histograms, are used to view the desired data. Users can watch a model being built in real time through a window, and can stop the operation at any time.

Data Filtering. Data filtering is handled by the Filter Outliers node, which enables the user to identify and remove outliers from data sets. Users can eliminate rare values in class variables and/or extreme values in interval variables.

Deriving Attributes. Users can create new variables from existing ones in the Transform Variables node. Enterprise Miner provides default functions, such
as square, inverse, exponential, standardization, square root and logarithm, but also allows users to enter their own computational formulae and build expressions.

Text Miner Availability. Text mining includes clustering algorithms, document categorization and data extraction.

Cost. An academic server license costs from $40,000 to $100,000, and a mainframe license from $47,000 to $222,000. Data Mining for the Classroom costs $8,000 for 100 workstations.

Clementine from SPSS

Clementine was originally developed by Integral Solutions Ltd (ISL), which was formed in 1989 by Dr. Alan Montgomery and five colleagues. Clementine was one of the very first products to bring machine learning to business intelligence while providing a user interface that was intelligible to non-experts [4]. SPSS acquired ISL in 1998. SPSS itself was founded in 1968 and earned its reputation primarily as a provider of statistical software, plus graphical applications to represent those statistics [4].

Software Architecture. Historically, Clementine was a client-only product in which all processing took place on the client itself. While external databases could be accessed, they were treated by the software as local files; this approach necessarily put a heavy burden on the client platform. In 1999, SPSS introduced a server-based version of the product, with middleware running on a middle-tier server that takes the load off the client and uses the superior performance of back-end databases to support in situ data mining [4].

Heterogeneous Data Access. Because Clementine is an SPSS package, importing SPSS data is straightforward, as is importing data in SAS or CSB format. Clementine can read a variety of file types, including data stored in spreadsheets and databases. Data can be read from ASCII files in either free-field or fixed-field format. Data can be imported from a variety of other packages, including Excel, MS Access, dBase, FoxPro and Paradox, using ODBC [10].

Algorithmic Variety. Rule induction: SPSS supports the C5.0 and C&RT decision tree algorithms. SPSS has introduced a graphical decision tree browser so that the user can select the most intuitive way to view decision trees. Rule sets can also be produced from C5.0, while C&RT can be used to predict numeric outputs. Regression modeling: linear regression is used to generate an equation and a plot, and logistic regression is also available [4]. For clustering, Clementine offers the traditional Kohonen and K-means techniques along with a new TwoStep technique. TwoStep is a hierarchical clustering algorithm that begins by automatically generating a set of low-level subclusters, then recursively merges them into larger, more generalized clusters until no further merge is possible without sacrificing the internal cohesiveness of the higher-level cluster [5].

Prescribed Methodology. SPSS (and ISL before it) has espoused the use of a formal methodology for data mining, and it has been a member of the CRISP-DM (Cross Industry Standard Process for Data Mining) group since its foundation in 1996. This methodology defines a six-stage process for data mining projects [4]:
• Business Understanding: a number of Clementine Application Templates (CATs) are available as add-on modules that encapsulate, at least to some extent, business understanding within specific sectors. The CATs available include crime, fraud, microarray, CRM, Web mining and Telco.
• Data Understanding: there are two aspects to data understanding, the extraction of data and the examination of that data for its utility in the proposed data mining operation.
• Data Preparation: data preparation (and retrieval) is performed through 'data manipulation' operations.
• Modeling: modeling (and visualization) is at the heart of Clementine. Clementine is a fully graphical end-user tool based on a simple paradigm of selecting and connecting icons from a palette to form what SPSS calls a 'stream' to analyze the data under consideration.
• Evaluation: evaluation is about visualizing the results of the data mining process to understand the data.
• Deployment: output data, together with the relevant predictions, can be written to files or exported to ODBC-compliant databases as new tables or appended to existing ones.

User Types. From the outset, the aim of ISL was to build an integrated environment for data mining operations that could be used and understood by business people without the need to rely on technical experts [4].

Data Visualization. The generic visualization capabilities include tables, distribution displays, plots and multiplots, histograms, and webs in which different line thicknesses show the strength of a connection. Some visualization techniques are available in 3D as well as 2D, and the user can overlay multiple attributes onto a single chart in order to get more information. For example, the user might have a 3D scatter diagram and then
add color, shape, transparency or animation options to make patterns more easily discernible.

Data Filtering. The Filter node is used to rename fields or to remove unwanted fields from the analysis, for example those with invariant values or those with a high proportion of missing information.

Deriving Attributes. The Derive node allows the user to modify data values or create new fields as functions of others. The Clementine Language for Expression Manipulation (CLEM) allows the user to derive new values.

Text Miner Availability. Text Mining for Clementine uses the LexiQuest solution's linguistic extraction technology to access and process unstructured data. Text Mining for Clementine can extract concepts from text data stored in a database, determine frequencies and categories, link the results to structured data, and display the document or documents selected through text mining.

Cost. A teaching license for 100 users costs $4,000 per year.

IBM DB2 Intelligent Miner

IBM proposed many of the principles that now constitute mainstream data warehouse technology. It introduced the term Information Warehouse in 1991 and has developed data mining software in various forms over the years [11]. IBM believes that doing business in the New Economy requires personalization and timing: getting the right messages to the right people at the most opportune time. The business analysis results must be applicable in real time. With Intelligent Miner, the user can have an end-to-end solution in which the results of data mining are driven back into operational applications [12].

Software Architecture. IBM sees data mining as a key component of an information warehouse framework. It therefore normally implements data mining in an organization as part of a data warehousing architecture. The main processing component, the Mining Kernel, is a client/server system [11].

Heterogeneous Data Access. Intelligent Miner is not specific to IBM data warehouses or databases. It can process information stored in DB2 or in flat files on AS/400, AIX or MVS. If the data to be mined has been downloaded to a flat file, the data mining processor can access this file directly. Decision tree induction operations can also be performed directly on Oracle or Teradata databases. In addition, a high-speed extract is provided to import data into DB2 Universal Database from Oracle, Sybase, or DB2 for OS/390 databases [11] [13].

Prescribed Methodology. IBM provides minimal user guidance. There is no single set way of using Intelligent Miner: different users use the tool set differently, applying the operations alone or in combination as best meets the needs of the business. However, a custom interface for selecting pre-defined subsets of the functions is designed for business analysts.

Algorithmic Variety. IBM offers two techniques for decision tree induction:
• Classification explores synergies between two different types of entity (e.g. between products and the occupations of purchasers). It enables the user to profile customers based on a desired outcome, such as propensity to buy high-end clothing. Tree induction creates models that are represented either as decision trees or as IF-THEN rules.
• Prediction explores changes in established patterns (often used to identify opportunities for cross-selling). Linear regression is used for value prediction.
IBM DB2 Intelligent Miner Demographic Clustering automatically determines the number of clusters based on a measure of how similar the records within the individual clusters should be.

User Types. Users include expert mining analysts and business analysts. IBM has provided a set of technologies that allows developers who understand the issues facing decision tree induction and clustering to build their own solutions, based on the algorithms and functions that IBM provides. Typically, new users are assisted by IBM consultants. IBM expects this software to appeal particularly to independent software vendors and large customers [11].

Data Visualization. The main client window is an Explorer-like view showing a directory of the available data by type. After the data has been mined, IBM provides a range of viewing options, including charts and graphs, histograms and tree diagrams. Intelligent Miner grows its decision trees interactively with the user, asking the 'best question' at each node.

Data Filtering. Data derived from some sources, such as operational systems, may not be of suitable quality for classification and clustering, and sometimes needs to be pre-treated. Additionally, some databases may hold information of a type that is not useful for decision tree induction and clustering. IBM has developed
techniques that allow data to be filtered and reduced, for example by filtering variables out of input records, filtering records out of the input database, and filtering records using a value set.

Deriving Attributes. New variables can be derived from the original input data using SQL expressions. For example, one may derive aggregate values using SQL column functions such as AVG, SUM, and COUNT, or calculate new values using SQL expressions.

Text Miner Availability. IBM Intelligent Miner for Text includes a set of tools that enrich business intelligence solutions, including text analysis, full text search, a Web crawler, and Web search.

Cost. Academic users can obtain this software at no cost through the IBM Scholars program.

COMPARATIVE ANALYSIS

The comparative analysis of the software focuses on the relevant evaluation criteria. The criteria within each category are assigned weights with respect to the target needs, and the weights within each category total 1.00 (100%). Each category is also assigned a weight on the same basis, and the category weights likewise total 1.00 (100%). Next the tools can be scored for comparison. We select SAS Enterprise Miner as the reference software because the professors are using other components of the SAS statistical package for their teaching, research and consulting activities. Enterprise Miner receives a rating of 3 for each criterion, and the other two packages are rated against Enterprise Miner on each criterion using the following discrete rating scale:

Relative Performance                 Rating
Much worse than Enterprise Miner       1
Worse than Enterprise Miner            2
Same as Enterprise Miner               3
Better than Enterprise Miner           4
Much better than Enterprise Miner      5

Classification and Clustering Tools Evaluation: Scoring by University Professors

Criteria                       Weight   Enterprise Miner   Clementine        Intelligent Miner
                                        (reference)
                                        Rating   Score     Rating   Score    Rating   Score
Performance (0.20)
  Software Architecture         0.5       3       1.5        3       1.5       2       1.0
  Heterogeneous Data Access     0.5       3       1.5        2       1.0       2       1.0
  Category Score                                  3.0                2.5               2.0
Functionality (0.20)
  Algorithmic Variety           0.5       3       1.5        3       1.5       2       1.0
  Prescribed Methodology        0.5       3       1.5        4       2.0       2       1.0
  Category Score                                  3.0                3.5               2.0
Usability (0.20)
  User Types                    0.7       3       2.1        2       1.4       1       0.7
  Data Visualization            0.3       3       0.9        4       1.2       3       0.9
  Category Score                                  3.0                2.6               1.6
Ancillary Task Support (0.15)
  Data Filtering                0.6       3       1.8        3       1.8       3       1.8
  Deriving Attributes           0.4       3       1.2        2       0.8       2       0.8
  Category Score                                  3.0                2.6               2.6
Text Miner Availability (0.25)  1.0       3       3.0        3       3.0       3       3.0
  Category Score                                  3.0                3.0               3.0
Weighted Average                                  3.0                2.9               2.3

Software Architecture. IBM Intelligent Miner gets 2 for software architecture because it uses only a client-server architecture. Although almost all organizations have implemented computer
networking, a stand-alone software architecture certainly benefits students or professors who want to do analysis on their own stand-alone computers. Like SAS Enterprise Miner, Clementine allows users a choice between stand-alone and client-server versions. Therefore, Clementine gets 3.

Heterogeneous Data Access. Because Clementine's access to files other than SPSS files is not straightforward, Clementine gets 2. Intelligent Miner's access to data sources other than DB2 is not direct, so it also gets 2 for this criterion.

Algorithmic Variety. Like Enterprise Miner, Clementine supports multiple models and algorithms for classification and clustering, so Clementine scores 3. Intelligent Miner (IM) does not support as many algorithms as Enterprise Miner (EM), so it gets 2.

Prescribed Methodology. Enterprise Miner's SEMMA (sample, explore, modify, model and assess) provides a methodology that clarifies the data mining process. Clementine adopts the CRISP-DM (Cross Industry Standard Process for Data Mining) method, which has more components than SEMMA, so Clementine gets 4 for this criterion. Intelligent Miner's method has fewer components, so it scores 2 for this criterion.

User Types. Enterprise Miner can be used by a wide range of users with different skill levels. Clementine has a restricted focus on non-technical business users, so it gets 2. Intelligent Miner aims at technical users, so it gets 1 for this criterion.

Data Visualization. Clementine is a fully graphical end-user tool based on a simple paradigm of selecting and connecting icons from a palette to form a 'stream'. A stream may consist of any number of different attempts to analyze the data under consideration and the 'model', so Clementine gets 4. Intelligent Miner is as good as Enterprise Miner, so it scores 3.

Data Filtering. All three products have similar capabilities and are rated 3.

Deriving Attributes. Clementine and IM do not have as many predefined methods available for deriving attributes (e.g. statistical functions, mathematical functions, financial functions), so both are rated 2.

Text Miner Availability. Because all three data mining packages have a text miner available, we score each a 3.

CONCLUSION

According to the weighted average scores, SAS Enterprise Miner is the best (3.0), closely followed by Clementine (2.9), with Intelligent Miner (2.3) third. One could use this ranking to select the best software whose cost is within the budget. IBM Intelligent Miner is the cheapest, followed by Clementine and then SAS Enterprise Miner.

The data mining framework is a valuable instrument for evaluating data mining software based on its classification and clustering tools, and the framework can be tailored to meet specific mining needs. For example, in academic environments the integration of numerical and textual data mining is becoming increasingly important in the Internet age. One must also interpret the criteria carefully so that tools can be evaluated properly against them. The evaluation results in this paper do not necessarily apply to other environments, which may require different criteria in the framework. For example, an issue such as software security may not be so important in an academic environment, while in a business it is very important. Defining the user environment is a very important step before scoring the tools.

REFERENCES

1. Barbara, D. & Jajodia, S. Applications of Data Mining in Computer Security, Boston, MA: Kluwer Academic Publishers, 2002.
2. Berry, M.J.A. & Linoff, G.S. Mastering Data Mining, New York, NY: Wiley Computer Publishing, 2000.
3. Collier, K., Carey, B., Sautter, D. and Marjaniemi, C. "A Methodology for Evaluating and Selecting Data Mining Software," Proceedings of the 32nd Hawaii International Conference on System Sciences, 1999, IEEE.
4. Howard, P. "Clementine from SPSS," Bloor Research, June 2003.
5. James, G. "Analysts' Darling," Intelligent Enterprise, Aug 31, 2001, 413products1_1.jhtml
6. Roiger, R.J. and Geatz, M.W. Data Mining: A Tutorial-Based Primer, Boston, MA: Addison Wesley Publishing, 2003.
7. "SAS Enterprise Miner" (1998) http://www.bloor-
8. "SAS Enterprise Miner: Unearthing the truth – profitable data mining results with less time and effort",
amining.html
9. "Hard Core Mining (SAS Institute's Enterprise Miner 4.1) (Evaluation)," Intelligent Enterprise, Oct 4, 2001, 4(5), p. 46, ml
10. "Introduction to Clementine," SPSS Inc., Training Department e5.pdf
11. "IBM Intelligent Miner," (1998) http://www.bloor-
12. "IBM DB2 Intelligent Miner," http://www-
13. "DB2 Intelligent Miner for Data: Technical Detail," bin/software/webtools/print/
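
The weighted scoring used in the comparative analysis can be reproduced in a few lines. The sketch below is illustrative only and is not part of the original study: it assumes Python (the paper names no language), copies the weights and ratings from the scoring table, and uses hypothetical names (`FRAMEWORK`, `RATINGS`, `category_scores`, `weighted_average`) chosen for this example.

```python
# Illustrative sketch: all names are hypothetical; the numbers are copied
# from the scoring table in the comparative analysis.

# category -> (category weight, {criterion: weight within the category})
FRAMEWORK = {
    "Performance": (0.20, {"Software Architecture": 0.5,
                           "Heterogeneous Data Access": 0.5}),
    "Functionality": (0.20, {"Algorithmic Variety": 0.5,
                             "Prescribed Methodology": 0.5}),
    "Usability": (0.20, {"User Types": 0.7, "Data Visualization": 0.3}),
    "Ancillary Task Support": (0.15, {"Data Filtering": 0.6,
                                      "Deriving Attributes": 0.4}),
    "Text Miner Availability": (0.25, {"Text Miner Availability": 1.0}),
}

CRITERIA = [c for _, crits in FRAMEWORK.values() for c in crits]

# Ratings on the 1-5 scale, relative to Enterprise Miner (reference = 3).
RATINGS = {
    "Enterprise Miner": dict.fromkeys(CRITERIA, 3),
    "Clementine": dict(zip(CRITERIA, [3, 2, 3, 4, 2, 4, 3, 2, 3])),
    "Intelligent Miner": dict(zip(CRITERIA, [2, 2, 2, 2, 1, 3, 3, 2, 3])),
}


def category_scores(ratings):
    """Weighted sum of criterion ratings within each category."""
    return {
        category: sum(weight * ratings[criterion]
                      for criterion, weight in criteria.items())
        for category, (_, criteria) in FRAMEWORK.items()
    }


def weighted_average(ratings):
    """Category scores combined using the category weights."""
    scores = category_scores(ratings)
    return sum(FRAMEWORK[category][0] * score
               for category, score in scores.items())


if __name__ == "__main__":
    for tool, ratings in RATINGS.items():
        print(f"{tool}: {weighted_average(ratings):.1f}")
```

Run as a script, this prints 3.0, 2.9 and 2.3 for Enterprise Miner, Clementine and Intelligent Miner, which matches the weighted averages reported in the table. Keeping the weights in one structure makes it straightforward to re-weight the categories for a different environment, as the conclusion suggests.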