DATA MINING
An annotated bibliography

Thesis statement. Data mining means searching for patterns within large sets of data, which opens up many possibilities for business managers and decision makers. By analyzing those patterns, they can make better business decisions and steer their businesses toward financial and entrepreneurial success.

Keywords: data mining, knowledge discovery in databases (KDD), data mining technologies (DMT), decision support systems (DSS)

Association for Computing Machinery. (2007). SIGKDD. ACM Special Interest Group on Knowledge Discovery and Data Mining. Retrieved February 12, 2007, from http://www.acm.org/sigs/sigkdd/webcasts.php. [Authoritative website].
This website belongs to one of the special interest groups (SIGs) of the ACM, one of the first academic societies to promote computational research. The website looks rather simple. It has all the necessary information about the group and lists the people involved along with their affiliations, most of which are academic organizations. The links work. A newsletter is also available from the website, and it carries the latest news in the field of data mining and knowledge discovery. Overall, this is a very useful website.

Bramer, M.A. (1999). Knowledge discovery and data mining. London, UK: The Institution of Electrical Engineers. [Book]
Huge volumes of data are stored in data warehouses around the world. Without examination, this data could be lost and never used for the needs of humanity. This book addresses data mining issues in different research subjects: chemistry, medical diagnosis, electric load prediction, and many others. Part 1 examines a broad spectrum of technical issues of knowledge discovery and data mining; Part 2 contains articles on practical applications of knowledge discovery and data mining.
These practical applications are in the fields of health-information analysis, meteorology, chemistry, and the electricity-supply industry. The book can be helpful to researchers in those fields and to knowledge professionals in general. However, the editor emphasizes that knowledge of the discipline of application is required in order to conduct a successful data mining experiment.

Ganguly, A. R., Gupta, A., & Khan, S. (2006). Data mining and decision support for business and science. In Encyclopedia of data warehousing and mining (Vol. I, pp. 233-238). Hershey, PA; London, UK: Idea Group Reference. [Reference book]
This article introduces the field of data mining for business and science. The authors are affiliated with research and academic institutions in the United States, such as the University of Arizona and the University of South Florida, which suggests that they have good background knowledge of data mining. The article
begins with an introduction, where readers can get acquainted with the subject of data mining and with the technologies and applications involved in data processing. The article uses many abbreviations. Its main idea is presented in the "Main thrust" section (a typical entry structure in this reference source), where the authors discuss scientific and business applications, give an overview of emerging technologies and previous approaches, and discuss common features of data mining for science and technologies, while presenting particularities specific to either science or business. This source contains many references to the scientific and technical literature, including journals, books, and authoritative websites (NASA, for example), and it has an extensive list of references at the end. The intended audience is academic and business readers who are interested in data mining applications and want quick information that will direct them to further resources on the subject. The article contains two tables that present analytical information technologies (data mining and decision support systems) and examples of their applications.

Guernsey, L. (2003, October 16). Digging for nuggets of wisdom. The New York Times, p. G1. [Popular print article].
Written by a journalist, this article is very informative and explains the uses of data mining in various fields of science. Emphasizing that the amount of information available on the web and in print is overwhelming and difficult to analyze, the author turns to practitioners who have already figured out how to search through vast amounts of data. For example, Dr. Liebman uses the statistical software SPSS for text mining, which is derived from the idea of data mining.
The main idea of the article is that it is possible to deal with this amount of information, but it takes an intelligent human being to make sense of the results. The language of the article is popular and easily understood by general readers. The New York Times is one of the most reputable newspapers in the country, so the article can be of use to many readers who have never heard of the concept of data or text mining. Those readers will find the idea very practical and interesting, if not fascinating.

Hu, J., & Zhong, N. (2006). Organizing multiple data sources for developing intelligent e-business portals. Data Mining and Knowledge Discovery, 12(2-3), 127-150. [Print scholarly article].
Both authors are affiliated with the Maebashi Institute of Technology in Japan. This article addresses applications of data mining in business enterprises. It is organized into separate parts, beginning with an introduction to the subject: creating and managing e-portals that serve as gateways to personalized information. The authors present a three-tier workflow model whose levels are data-flow, mining-flow, and knowledge-flow. All three contribute to the model of a multi-layered grid, which is essential for creating an e-portal. The article proceeds logically from the literature review and previous experience to discussions of the main subject, supported by graphs, tables, and computations. It is a scholarly article directed toward professionals in data mining and knowledge discovery, written in technical language that is best understood by specialists. However, the average person can
make sense of the concept by reading the introduction. It is a useful article for those involved in scientific research on data mining for business applications.

Kohavi, R., & Provost, F. (2001). Applications of data mining to electronic commerce. Data Mining and Knowledge Discovery, 5, 5-10. [Secondary source].
This is a rather critical analysis of the state of data mining in e-commerce. The authors talk at length about problems and issues in this field and take a rather cautious view of its use. "High potential reward, accompanied by high risk" seems to be the main theme of the article. Written in clear, understandable language, it could be useful to business managers and information specialists with very broad interests. It is also a literature review that tries to summarize what has been written before and what the current problems are. One theme seems to be present in every reviewed paper: problem-specific knowledge and how to incorporate it into the knowledge discovery process. At the same time, it is a philosophical essay rather than a technical article. It contains no formulas, graphs, or charts, only analysis and critical opinions, which sets it apart from the majority of articles written by scientists. At the end, the authors express cautious optimism about future studies of data mining in e-commerce, pointing out that many issues remain to be solved. This is a secondary source because it addresses previously done research instead of proposing a new original method or idea.

Kutz, G.D. (2003). Data mining: Results and challenges for government program audits and investigations. Testimony before the Subcommittee on Technology, Information Policy, Intergovernmental Relations and the Census, Committee on Government Reform, House of Representatives. Washington, D.C.: United States General Accounting Office. Retrieved January 30, 2007, from http://www.gao.gov/new.items/d03591t.pdf.
[Government document]
This document covers issues of internal control within certain government agencies, such as the Department of Defense (DOD). The use of government credit cards was tracked with data mining techniques in order to scrutinize the vendors and the appropriateness of expenses incurred by government employees. This process helped to uncover many cases of abuse and waste of government funds and helped to improve control over travel spending. Even though the document is written in official language, it is easy for a student or a layperson to understand. A summary at the beginning of the document and conclusions at the end help readers form a clearer picture of the problem and its solution. A list of related GAO publications is available at the end of the paper. This source is certainly very helpful for those who would like to learn about practical applications of data mining.

Lee, J.H., & Park, S.C. (2003). Agent and data mining based decision support system and its adaptation to a new customer-centric electronic commerce. [Electronic version]. Expert Systems with Applications, 25(4), 619-635. [Online journal]
Electronic commerce (e-commerce, EC) is a new, fast-developing way of conducting business. In order to be competitive, manufacturing companies use the Internet not only for promotion but also to buy and sell. It is crucial for manufacturers to learn about their potential buyers' buying behaviors and preferences in order to market their products. This article is devoted to a new customer-centric e-commerce model that uses a concept called process transparency. According to the authors, "Transparency is a knowledge-based concept that implies participants have intelligence about market around them." The data mining process was successfully integrated into the proposed EC model to generate an optimal sampling method.

Mukherjee, S., Chen, Z., & Gangopadhyay, A. (2006). A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms. [Electronic version]. VLDB Journal, 15, 293-315. [Primary source].
This article is a good example of a primary source. It is written by researchers from the University of Maryland, Baltimore County. The authors propose their own algorithm for improving data mining methods. This is an important issue, especially when dealing with large amounts of data. The problem is that data is often stored in one place and analyzed in another, with a third party responsible for the analysis. This means that the data should be stripped of some personal characteristics in order to preserve the privacy of customers' information. The authors developed their original idea using existing Fourier-related transforms. The article addresses professional researchers in the field of data mining, so its language is very technical and oriented specifically to people working in that field. There are many charts and mathematical algorithms that prove and illustrate the idea of the proposed method.
Published in an academic journal, this article is a good example of a primary source in the sciences.

SPSS, Inc. (2007). SPSS. Data mining. Retrieved February 11, 2007, from http://www.spss.com/data_mining/ [Authoritative website].
This website presents information about data mining provided by a company that introduced pioneering software for statistical analysis (SPSS stands for Statistical Package for the Social Sciences). SPSS is now considered one of the leading companies in data mining research. Its product, Clementine, was one of the first data mining tools back in 1994. The website is well organized, and the address and contact information are clearly shown. The list of related business problems that can be addressed by SPSS products makes the search clear and straightforward. The links work, and the only advertising present is within the links to the company's products.