An annotated bibliography
Thesis statement. Data mining means searching for patterns within large sets of
data, which creates many opportunities for business managers and decision makers. By
analyzing those patterns, managers can make better business decisions and help their
businesses achieve greater financial and entrepreneurial success.
Keywords: data mining, knowledge discovery in databases (KDD), data mining
technologies (DMT), decision support systems (DSS).
Association for Computing Machinery. (2007). SIGKDD. ACM Special Interest Group on
Knowledge Discovery and Data Mining. Retrieved February 12, 2007, from
http://www.acm.org/sigs/sigkdd/webcasts.php. [Authoritative website].
This website belongs to one of the special interest groups (SIGs) of the ACM, one of the first
academic societies to promote computational research. The website is rather simple
looking. It provides the necessary information about the group and lists the people involved in it,
including their affiliations, most of which are academic organizations. The links all work.
There is also a newsletter available from this website which contains the latest news in the
field of data mining and knowledge discovery. Overall, this is a very useful website.
Bramer, M.A. (1999). Knowledge discovery and data mining. London, UK: The Institution of
Electrical Engineers. [Book]
This book addresses issues of data mining within different research subjects such as
chemistry, medical diagnosis, and electric load prediction. Part 1 examines a broad spectrum
of technical issues in knowledge discovery and data mining; part 2 contains articles on the
practical applications of knowledge discovery and data mining. These practical applications
are within such fields as health-information analysis, meteorology, chemistry and the
electricity-supply industry. This book can be helpful to researchers within the fields it
discusses, as well as to knowledge professionals in general. However, the editor emphasizes
that knowledge of the discipline in which data mining is applied is required in order to
conduct a successful data mining experiment.
Ganguly, A.R., Gupta, A., & Khan, S. (2006). Data mining and decision support for
business and science. In Encyclopedia of data warehousing and mining (Vol. I,
pp. 233-238). Hershey, PA; London, UK: Idea Group Reference. [Reference
This article introduces the field of data mining for business and science. The authors are
affiliated with academic and research institutions in the United States, such as the
University of Arizona and the University of South Florida, which leads the reader to
believe that they have extensive knowledge of data mining. The article begins with an
introduction, where readers can get acquainted with the subject of data mining and the
technologies and applications that are involved in data processing. The article’s main
idea is introduced in the “Main thrust” section (a standard part of entries in this
reference source), where the authors discuss scientific and business applications, present
an overview of emerging technologies and previous approaches, and discuss common
features of data mining for science and technology. This source contains many
references to scientific and technical literature, including journals, books, and
authoritative websites (NASA’s, for example). The article also has an extensive list of
references at the end. The intended audience is academics and business professionals
who are interested in data mining applications and want to find quick information that
will direct them to further resources on this subject. The article contains two tables that
present analytical information technologies (data mining and decision support systems)
and examples of their applications.
Guernsey, L. (2003, October 16). Digging for nuggets of wisdom. The New York Times, p.
G1. [Popular print article].
Written by a journalist, this article is very informative and explains data mining applications
in various fields of science. Emphasizing that the amount of information available on the web
and in print is overwhelming and difficult to analyze, the author turns to the practitioners who
have already figured out how to search through vast amounts of data. For example, Dr.
Liebman uses a statistical software package called SPSS for text mining, which is derived
from the idea of data mining. The main idea of the article is that it is possible to deal with the
vast amounts of information that are out there as long as one approaches it intelligently. The
language of the article is popular, easily understood by general readers. The New York Times
is one of the most widely read newspapers in the country, so the article can be of use to
readers who have never heard of data or text mining. Those readers should find the subject
practical and interesting, if not fascinating.
Hu, J., & Zhong, N. (2006). Organizing multiple data sources for developing intelligent e-
business portals. Data Mining and Knowledge Discovery, 12(2-3), 127-150. [Print
Both authors are affiliated with Maebashi Institute of Technology in Japan. This article
addresses applications of data mining in business. It is organized into several parts, beginning
with the introduction of its main subject – creating and managing e-portals that serve as
gateways to personalized information. The authors present a three-tier work-flow model
whose levels are data-flow, mining-flow, and knowledge-flow. All three contribute to
the model of a multi-layered grid, which is essential for creating an e-portal. The article
begins with a literature review and previous experience, and then proceeds to a discussion of
the main subject with graphs, tables and computations. It is a scholarly article, written for
professionals in data mining and knowledge discovery, so its language is fairly technical.
However, the average person can make sense of the concept by reading the introduction. It is
a useful article for those involved in scientific research of data mining for business
Kohavi, R. & Provost, F. (2001). Applications of data mining to electronic commerce. Data
Mining and Knowledge Discovery, 5, 5-10. [Secondary source].
This article offers a rather critical analysis of the current state of data mining in e-commerce.
The authors discuss at length the problems and issues in this field, paying particular attention to its
utilization. “High potential reward, accompanied by high risk” seems to be the main theme of
this article. Written in clear, understandable language, it could be useful to business managers
and information specialists with broad interests. It is also a literature review that tries to
summarize what has been written and what the current problems are. (One theme that seems to
be present in every reviewed paper is problem-specific knowledge and how to incorporate it
into the knowledge discovery process.) At the same time, it is a philosophical essay rather
than simply a technical article. There are no formulas, or graphs, or charts, just analysis and
critical opinions – this is what differentiates this article from the majority of others written by
scientists. At the end, the authors express cautious optimism about future studies of data
mining in e-commerce, pointing out that many issues remain to be solved. This is a
secondary source because it addresses previous research instead of proposing new and
original methods or ideas.
Kutz, G.D. (2003). Data mining: Results and challenges for government program audits and
investigations. Testimony before the Subcommittee on Technology, Information
Policy, Intergovernmental Relations and the Census, Committee on Government
Reform, House of Representatives. Washington, D.C. : United States General
Accounting Office. Retrieved January 30, 2007, from
http://www.gao.gov/new.items/d03591t.pdf. [Government document]
This document covers the issues of internal control within certain government agencies, such
as the Department of Defense (DOD). The use of government credit cards was traced using
data mining techniques in order to scrutinize the vendors and the appropriateness of the
expenses incurred by government employees. This process helped uncover many cases of abuse and
waste of government funds and helped improve control over travel spending. Even though this
document is written in somewhat bureaucratic language, it is easy to understand for the
student or lay person. A summary at the beginning of the document and conclusions at the
end help readers to get a clear picture of the problem and its solution. A list of related
publications by the General Accounting Office (GAO) is available at the end of the paper.
This source is very helpful for those who want to learn about the practical applications of data
Lee, J.H., & Park, S.C. (2003). Agent and data mining based decision support system and its
adaptation to a new customer-centric electronic commerce. [Electronic version].
Expert Systems with Applications, 25(4), 619-635. [Online journal]
Electronic commerce (e-commerce, EC) is a rapidly developing means of conducting
business. In order to be competitive, manufacturing companies use the Internet not only for
promotion, but also to buy and sell. This article is devoted to a customer-centric e-commerce
model using a concept called process transparency. According to the authors, “Transparency
is a knowledge-based concept that implies participants have intelligence about market around
them”. It is crucial for manufacturers to learn about their potential customers’ buying
behaviors and preferences in order to market their products. The data mining process was
successfully integrated into the proposed EC model for the generation of an optimal sampling
Mukherjee, S., Chen, Z., & Gangopadhyay, A. (2006). A Privacy-preserving technique for
Euclidean distance-based mining algorithms using Fourier-related transforms.
[Electronic version]. VLDB Journal, 15, 293-315. [Primary source].
This article is a good example of a primary source. It is written by researchers from the
University of Maryland, Baltimore. The authors propose their own algorithm for
improving data mining methods while preserving privacy, which is an important issue,
especially when dealing with large amounts of data. The problem is that data is often
stored in one place and analyzed in another, sometimes by a third party. This means that data
should be stripped of personal characteristics in order to preserve the privacy of customers’
information. The authors build their original idea on existing
Fourier-related transforms. The article is intended for professional researchers in the
field of data mining, so its language is technical. There are many charts and mathematical
algorithms that support and illustrate the proposed method. Published in an
academic journal, this article is a good example of a primary source in the sciences.
SPSS, Inc. (2007). SPSS. Data mining. Retrieved February 11, 2007, from
http://www.spss.com/data_mining/. [Authoritative website].
This site presents information about data mining provided by a company that introduced
pioneering software for statistical analysis (SPSS stands for Statistical Package for the Social
Sciences). SPSS is now considered one of the leading companies in data mining research.
Its product, Clementine, was one of the first data mining tools back in 1994. This web site
is well organized, and the address and contact information are clearly shown. The list of
related business problems that can be addressed by SPSS products makes the search clear and
straightforward. The links all work and the only advertisement present is within the links to
the company’s products.