A data mining tool for large relational databases 
DBMiner, a data mining system for interactive mining of multiple-level knowledge in
large relational databases, has been developed based on our years of research. The
system implements a wide spectrum of data mining functions, including
generalization, characterization, discrimination, association, classification, and
prediction. By incorporation of several interesting data mining techniques, including
attribute-oriented induction, progressive deepening for mining multiple-level rules, and
meta-rule guided knowledge mining, the system provides a user-friendly, interactive
data mining environment with good performance.
Project Overview 
A data mining system, DBMiner, has been developed for interactive mining of
multiple-level knowledge in large relational databases. It is based on studies of data
mining techniques and experience in the development of an early system prototype,
DBLearn. The system implements a wide spectrum of data mining functions, including
generalization, characterization, association, classification, and prediction. By
incorporation of several interesting data mining techniques, including attribute-oriented
induction, statistical analysis, progressive deepening for mining multiple-level
knowledge, and meta-rule guided mining, the system provides a user-friendly,
interactive data mining environment with good performance.
Project Description 
Figure: General architecture of DBMiner
The system has the following distinct features:
• It incorporates several interesting data mining techniques,
including attribute-oriented induction, progressive deepening for
mining multiple-level rules and meta-rule guided knowledge
mining, etc., and implements a wide spectrum of data mining
functions including generalization, characterization, association,
classification, and prediction.
• It performs interactive data mining at multiple concept levels on
any user-specified set of data in a database using an SQL-like
Data Mining Query Language, DMQL, or a graphical user
interface. Users may interactively set and adjust various
thresholds, control a data mining process, perform roll-up or
drill-down at multiple concept levels, and generate different
forms of outputs, including generalized relations, generalized
feature tables, multiple forms of generalized rules, visual
presentation of rules, charts, curves, etc.
• Efficient implementation techniques have been explored using
different data structures, including generalized relations and
multi-dimensional data cubes, and have been integrated with
relational database techniques. The data mining process may
utilize user- or expert-defined set-grouping or schema-level
concept hierarchies which can be specified flexibly, adjusted
dynamically based on data distribution, and generated
automatically for numerical attributes.
• Both UNIX and PC (Windows/NT) versions of the system adopt
a client/server architecture. The latter communicates with
various commercial database systems for data mining using the
Major functional modules:
Figure: Knowledge discovery modules of DBMiner
The characterizer generalizes a set of task-relevant data into a generalized relation
which can then be viewed at multiple concept levels from different angles. In
particular, it derives a set of characteristic rules which summarize the general
characteristics of a set of user-specified data (called the target class). For example,
the symptoms of a specific disease can be summarized by a characteristic rule.
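As a minimal sketch of how a set of task-relevant data might be generalized into a
generalized relation, the Python snippet below replaces each value with a
higher-level concept and merges identical generalized tuples while keeping a count.
The concept hierarchy, the age ranges, the sample data, and the function names are
illustrative assumptions, not DBMiner's actual implementation.

```python
from collections import Counter

# Hypothetical concept hierarchy: maps a low-level symptom to a parent concept.
SYMPTOM_HIERARCHY = {
    "cough": "respiratory",
    "wheezing": "respiratory",
    "nausea": "digestive",
    "vomiting": "digestive",
}


def generalize_age(age):
    """Map a numeric age onto a coarse, assumed set of ranges."""
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"


def generalize(tuples):
    """Climb one concept level and merge identical generalized tuples.

    Returns a Counter mapping each generalized tuple to the number of
    original tuples it covers (the count column of a generalized relation).
    """
    relation = Counter()
    for age, symptom in tuples:
        general = (generalize_age(age), SYMPTOM_HIERARCHY.get(symptom, symptom))
        relation[general] += 1
    return relation


# Assumed task-relevant data: (age, symptom) pairs for one target class.
task_relevant = [(8, "cough"), (11, "wheezing"), (40, "nausea"),
                 (35, "vomiting"), (70, "cough")]
generalized = generalize(task_relevant)
```

Each entry of `generalized` can be read as a candidate characteristic description of
the target class, weighted by how many original tuples it covers.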
A discriminator discovers a set of discriminant rules which summarize the features
that distinguish the class being examined (the target class) from other classes (called
contrasting classes). For example, to distinguish one disease from others, a
discriminant rule summarizes the symptoms that discriminate this disease from
DBMiner association rule finder
An association rule finder discovers a set of association rules (in the form of
"A1 ∧ ... ∧ Ai → B1 ∧ ... ∧ Bj") at multiple concept levels from the relevant set(s)
of data in a database. For example, one may discover a set of symptoms frequently
occurring together with certain kinds of diseases and further study the reasons behind
DBMiner data classifier
A classifier analyzes a set of training data (i.e., a set of objects whose class label is
known) and constructs a model for each class based on the features in the data. A set
of classification rules is generated by such a classification process, which can be
used to classify future data and develop a better understanding of each class in the
database. For example, one may classify diseases and provide the symptoms which
describe each class or subclass of diseases.
A predictor predicts the possible values of some missing data or the value
distribution of certain attributes in a set of objects. This involves finding the set of
attributes relevant to the attribute of interest (by some statistical analysis) and
predicting the value distribution based on the set of data similar to the selected
object(s). For example, an employee's potential salary can be predicted based on the
salary distribution of similar employees in the company.
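The salary example can be sketched as follows, assuming the relevant attributes
have already been chosen (the text says this is done by statistical analysis, which
is not shown here). The function `predict_distribution`, the attribute names, and
the sample records are hypothetical and only illustrate the idea of predicting a
value distribution from similar objects.

```python
from collections import Counter


def predict_distribution(records, target, query, relevant):
    """Return the relative frequency of `target` values among the records
    that match `query` on every attribute listed in `relevant`."""
    similar = [r for r in records if all(r[a] == query[a] for a in relevant)]
    counts = Counter(r[target] for r in similar)
    total = sum(counts.values())
    if total == 0:
        return {}  # no similar objects, so nothing to predict from
    return {value: n / total for value, n in counts.items()}


# Assumed employee data with salaries already generalized into ranges.
employees = [
    {"dept": "sales", "level": "senior", "salary": "60-80K"},
    {"dept": "sales", "level": "senior", "salary": "60-80K"},
    {"dept": "sales", "level": "senior", "salary": "80-100K"},
    {"dept": "sales", "level": "junior", "salary": "40-60K"},
    {"dept": "it", "level": "senior", "salary": "80-100K"},
]

dist = predict_distribution(
    employees,
    target="salary",
    query={"dept": "sales", "level": "senior"},
    relevant=["dept", "level"],
)
```

Here `dist` holds the salary-range distribution of the employees who share the
queried department and level, which serves as the predicted distribution.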
DBMiner meta-rule guided miner
A meta-rule guided miner is a data mining mechanism which takes a user-specified
meta-rule form, such as " " as a pattern to confine the
search for desired rules. For example, one may specify the discovered rules to be in
the form of "" in order to find the relationships between a student's major and his/her
GPA in a university database.
DBMiner evolution evaluator
A data evolution evaluator evaluates the data evolution regularities for certain
objects whose behavior changes over time. This may include characterization,
classification, association, or clustering of time-related data. For example, one may
find the general characteristics of the companies whose stock prices rose by over
20% last year, or evaluate the trend or particular growth patterns of certain stocks.
DBMiner deviation evaluator
A deviation evaluator evaluates the deviation patterns for a set of task-relevant data
in the database. For example, one may discover and evaluate a set of stocks whose
behavior deviates from the trend of the majority of stocks during a certain period of
time. The module contains the following three functions:
1. recognizes or identifies the general trend and/or behavior for data in the
2. detects the set of data which deviates from such a trend or behavior, and
3. summarizes the general characteristics of deviation data.
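The three functions above can be sketched in a few lines of Python. This is only
one possible reading: it assumes the general trend is the median of a single
numeric measure, and it flags a value as deviating when its distance from the
median exceeds `k` times the median absolute deviation. The function name, the
parameter `k`, and the stock symbols are hypothetical.

```python
from statistics import mean, median


def evaluate_deviation(returns, k=3.0):
    """Identify a trend, detect deviating items, and summarize them."""
    values = list(returns.values())

    # 1. General trend: the median value over all items (an assumption).
    trend = median(values)

    # Spread around the trend, measured by the median absolute deviation.
    mad = median(abs(v - trend) for v in values)

    # 2. Items whose distance from the trend exceeds k times the spread.
    deviating = {name: v for name, v in returns.items()
                 if abs(v - trend) > k * mad}

    # 3. A simple summary of the deviating items.
    summary = {
        "count": len(deviating),
        "mean_return": mean(deviating.values()) if deviating else None,
    }
    return trend, deviating, summary


# Assumed yearly returns for five made-up stock symbols.
stock_returns = {"AAA": 0.02, "BBB": 0.03, "CCC": 0.01, "DDD": 0.02, "EEE": 0.25}
trend, deviating, summary = evaluate_deviation(stock_returns)
```

With these sample values only "EEE" lies far enough from the median to be reported
as deviating; a smaller `k` would flag more items.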
DBMiner user interfaces
Three user interfaces (UNIX-based, Windows/NT-based, and WWW/Netscape-based
GUIs) have been developed to allow users to interactively discover multiple-level
knowledge in large relational databases. The system integrates well with existing
commercial database systems with high performance, and is robust at handling noise and
Further Development of DBMiner 
The DBMiner system is currently being extended in several directions, as illustrated
• Further enhancement of the power and efficiency of data mining in relational
database systems, including the improvement of system performance and rule
discovery quality for the existing functional modules, and the development of
techniques for mining new kinds of rules, especially on time-related data.
• Integration, maintenance and application of discovered knowledge, including
incremental update of discovered rules, removal of redundant or less interesting
rules, merging of discovered rules into a knowledge-base, intelligent query
answering using discovered knowledge, and the construction of multiple layered
• Extension of data mining techniques towards advanced and/or special-purpose
database systems, including extended-relational, object-oriented, text, spatial,
temporal, and heterogeneous databases. Currently, two such data mining systems,
GeoMiner and WebMiner, for mining knowledge in spatial databases and the
Internet information-base respectively, are under design and construction.
We have developed a list of 14 criteria for evaluating DBMiner. These criteria can be put into
four categories: Capability, Learnability/Usability, Interoperability, and Flexibility. Capability
measures what a desktop tool can do, and how well it does it; Learnability/Usability means
how easy a tool is to learn and use; Interoperability means a tool’s ability to interface with
other computer applications; and Flexibility is the ease with which one can alter critical
guiding parameters, or create a customized environment.
We used the FoodMart database, which comes with MS SQL Server, for testing.
FoodMart is made up of two cubes, Sales and Warehouse. The Sales cube consists of 13
dimensions, such as "Customers" and "Educational Level", and 7 fact tables (measurements),
such as “Profit”, “Sales Average”, and “Sales Count”. The Warehouse cube consists of 7
dimensions and 7 measurements. The database is loaded with data sufficient for our
We ran DBMiner on a 166 MHz Pentium machine with 64 MB of RAM, running Windows 2000.
Table 1 summarizes the results.
The criteria we selected for capability are whether the tool is scalable to larger databases, has a
programming language for automation, provides useful output reports, and whether it has
Given the training data set, we found that the software scaled efficiently.
Furthermore, the software does not provide a programming language for automation; however, it
has many wizards that guide the end user through the tasks. The software uses DMQL
(Data Mining Query Language) for its own tasks; however, the user is not able to manipulate the
The visualization part of the software uses many graphics, including ball graph, ball chart, grid,
and frequent item sets, for visualizing association, classification, and clustering results; however, pie
charts and correlation plots are missing. In addition, tree browsing is shown in graph view, which
we found confusing. A second visualization component, the OLAP browser, relies on the
visualization capabilities of MS Excel 2000 and cannot function without it.
DBMiner shows a statistics report, which it calls “mining results statistics.” The report
gives the number of items identified; however, it neither describes the characteristics of the
results nor analyzes them statistically. In addition, we were not able to print any of the
results from associations, classifications, and clustering, or the statistics results: the
printed page was blank.
There are six criteria for this category, namely tutorials, wizards, easy to learn, user’s manual,
online help, and interface.
DBMiner is not a complex program for people familiar with data mining. However, if you are
new to data mining, the software does not include a tutorial to walk you through with an
Wizards are built in for automating the tasks of data mining. The wizards let the user select
appropriate options for the tasks.
The user interface is very simple and standard. The menus are appropriate and so are the tool
bars. However, we found that some tool bars did not perform very well when enabled, such as
the tools in the visualization pane and the magnifier. In addition, some of the commands under
menus do not have any function associated with them, such as the “Export” command under
the file menu.
The user’s manual is well organized and helps the user find an appropriate way to explore the
tool; however, its style is dated and not web-oriented. Furthermore, the user’s manual does
not contain links to other relevant topics. In addition, DBMiner’s online help is average.
Overall, we think the software is easy to learn and interact with, provided the user is familiar
with data mining techniques.
We use three criteria for this category: importing data, exporting data, and whether it has links
to other applications.
DBMiner does not support importing and exporting of data. However, it communicates with
MS OLAP Server and has MS Excel 2000 embedded as a visualization tool for OLAP
Two criteria define the flexibility of the application: whether the work
environment is customizable and whether it is possible to write or change the code.
DBMiner uses DMQL for its internal functionality; however, it is not possible to change or
DBMiner has the flexibility to let the user change the values of settings after each task is done.
For example, it is possible to increase/decrease the support threshold or the confidence
threshold if the user is not happy with the current level.
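To make the two thresholds concrete, here is a small sketch, under our own
assumptions, of how support and confidence are commonly computed for a candidate
association rule and checked against user-set minimums. The function names and the
sample transactions are hypothetical and are not taken from DBMiner or FoodMart.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with `antecedent` that also hold `consequent`."""
    covered = count_containing(transactions, antecedent)
    if covered == 0:
        return 0.0  # the rule never applies, so it gets no confidence
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / covered


def passes(transactions, antecedent, consequent, min_support, min_confidence):
    """True if the rule antecedent -> consequent meets both thresholds."""
    rule_items = set(antecedent) | set(consequent)
    return (support(transactions, rule_items) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


# Made-up market-basket data: each transaction is a set of items.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
    {"milk", "bread"},
]

kept = passes(baskets, {"milk"}, {"bread"}, min_support=0.5, min_confidence=0.7)
dropped = passes(baskets, {"milk"}, {"bread"}, min_support=0.5, min_confidence=0.9)
```

Raising the confidence threshold from 0.7 to 0.9 removes the rule from the result
set, which is the kind of adjustment the user can make after a task has run.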
DBMiner depends only on MS SQL Server as its back end and uses MS Excel 2000 as its
visualization tool for OLAP browsing. Other unavailable functional modules are the data
dispersion module, the time-series analysis module, and the prediction module.
DBMiner is a good data mining tool, as it provides a user-friendly environment for users of all
categories. The discussion above substantiates our evaluation of the
software, though there is wide scope for improvement in the commercial version.
 Copied and pasted from “DBMiner: A data mining tool for large relational databases,”
 Bhavani Thuraisingham, Data Mining: technologies, techniques, tools, and trends, CRC
 John F. Elder IV & Dean W. Abbott Elder Research, A Comparison of Leading Data Mining Tools, 1998
Table 1: Capability, Learnability/Usability, Interoperability, and Flexibility
Excellent Good Average Needs Improvement Poor Does Not Exist
Has programming language √
Provides useful output reports √
Easy to learn √
User’s manual √
Online help √
Importing data √
Exporting data √
Links to other applications √
Customizable work environment √
Ability to write or change codes √