AN OVERVIEW OF KNOWLEDGE DISCOVERY AND DATA MINING
APPROACH IN DATABASES
Mr. Kartik N. Kalpande¹, Student, CO-5G
Kartikkalpande5@gmail.com
Mr. Shubham N. Ugale², Student, CO-5G
itsshubhamhere01@gmail.com
Department of Computer Engg.
Dr.Panjabrao Deshmukh Polytechnic, Shivaji Nagar, Amravati-444603
Abstract: Knowledge Discovery in Databases (KDD) is the process of finding knowledge in massive
amounts of data, and data mining is the core of this process. KDD is the automatic, exploratory
analysis and modeling of large data repositories. Data mining is the step of the KDD process that
extracts understandable, meaningful patterns from large databases; these patterns may then be
converted into knowledge that supports crucial decision-making. Data mining typically works with a
data warehouse, and the overall process is divided into a sequence of actions performed on the data:
selection, transformation, mining, and interpretation of results. In this paper, we present an
overview of knowledge discovery and data mining and consolidate the different areas of data mining
together with its techniques and methods.
Keywords: knowledge discovery in databases, data mining, data mining applications, knowledge
management
I. INTRODUCTION
Knowledge discovery in databases is a rapidly growing field whose development is driven by strong
research interest as well as urgent practical, social, and economic needs. Knowledge Discovery in
Databases (KDD) is the process of finding useful knowledge in large datasets. Its steps are data
preparation, pattern search, knowledge evaluation, and refinement [3]. In this paper, we provide an
overview of common knowledge discovery tasks and approaches to solving them. Data mining (DM) is
the process by which data is analyzed and summarized into useful information. In short, data mining
is the process of deriving patterns from large databases [13].
Knowledge Discovery in Databases is the organized process of identifying valid, novel, useful, and
understandable patterns in large and complex data sets. Data Mining (DM) is the core of the KDD
process, involving the inference of algorithms that explore the data, develop the model, and
discover previously unknown patterns. The accessibility and abundance of data today make knowledge
discovery and data mining a matter of considerable importance and necessity. Given the recent growth
of the field, it is not surprising that a wide variety of methods is now available to researchers
and practitioners. The Data Mining and Knowledge Discovery Handbook aims to organize all significant
methods developed in the field into a coherent and unified catalog, presents performance evaluation
approaches and techniques, and explains the use of the different methods with cases and software
tools [1].
Data mining is one of the most important steps of the KDD process and is considered a significant
subfield of knowledge management.
II. KNOWLEDGE DISCOVERY IN DATABASE PROCESS
Knowledge Discovery in Databases is the process of finding useful knowledge in large datasets, with
data preparation, pattern search, knowledge evaluation, and refinement as its broad steps. The
knowledge discovery process (Figure 1) is iterative and interactive and consists of nine steps [1].
The process has many “artistic” aspects, in the sense that no single formula or complete taxonomy
can prescribe the right choices for each step and application type. It is therefore necessary to
understand the process and the different needs and possibilities at each step [3].
A simple definition of KDD is as follows: Knowledge discovery in databases is the nontrivial process
of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
Figure 1: An overview of the steps comprising the KDD process
The KDD process is interactive and iterative, involving numerous steps with many decisions being
made by the user.
1. Developing an understanding of the application domain. This is the initial preparatory step. It
sets the scene for understanding what should be done with the many decisions made in the later
steps, and it is where the goals of the discovery effort are defined.
2. Selecting and creating a data set on which discovery will be performed. Having defined the
goals, the data that will be used for the knowledge discovery should be determined. This includes
finding out what data is available, obtaining additional necessary data, and then integrating all the
data for the knowledge discovery into one data set, including the attributes that will be considered for
the process.
3. Preprocessing and cleansing. In this stage, data reliability is enhanced. It includes data
cleaning, such as handling missing values and removing noise or outliers.
4. Data transformation. In this stage, better data for data mining is generated and prepared.
Methods here include dimension reduction (such as feature selection and extraction, and record
sampling) and attribute transformation.
5. Choosing the appropriate Data Mining task. We are now ready to decide on which type of Data
Mining to use, for example, classification, regression, or clustering. This mostly depends on the KDD
goals, and also on the previous steps. There are two major goals in Data Mining: prediction and
description. Prediction is often referred to as supervised Data Mining, while descriptive Data Mining
includes the unsupervised and visualization aspects of Data Mining.
6. Choosing the Data Mining algorithm. Having the strategy, we now decide on the tactics. This
stage includes selecting the specific method to be used for searching patterns.
7. Employing the Data Mining algorithm. Finally, the chosen Data Mining algorithm is applied to the
prepared data.
8. Evaluation. In this stage we evaluate and interpret the mined patterns (rules, reliability, etc.)
with respect to the goals defined in the first step.
9. Using the discovered knowledge. We are now ready to incorporate the knowledge into another
system for further action. The knowledge becomes active in the sense that we may make changes to
the system and measure the effects [1].
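The middle steps of the process can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions, not the procedure of [1]: the records, the attribute names, the choice of median imputation, min-max scaling, and a nearest-centroid classifier are all hypothetical choices made here for brevity.

```python
from statistics import median

# Hypothetical records: (hours_studied, attendance_pct, passed).
# None marks a missing value. The data set is an assumption for illustration.
RAW = [
    (2.0, 60.0, 0),
    (9.0, 95.0, 1),
    (None, 80.0, 1),
    (1.0, 50.0, 0),
    (8.0, 90.0, 1),
    (3.0, 65.0, 0),
    (7.5, 85.0, 1),
    (2.5, 55.0, 0),
]


def clean(rows):
    """Step 3: replace each missing feature value with that column's median."""
    n = len(rows[0]) - 1
    med = [median(r[j] for r in rows if r[j] is not None) for j in range(n)]
    return [
        tuple(med[j] if r[j] is None else r[j] for j in range(n)) + (r[-1],)
        for r in rows
    ]


def transform(rows):
    """Step 4: min-max scale every feature to the range [0, 1]."""
    n = len(rows[0]) - 1
    lo = [min(r[j] for r in rows) for j in range(n)]
    hi = [max(r[j] for r in rows) for j in range(n)]
    return [
        tuple((r[j] - lo[j]) / (hi[j] - lo[j]) for j in range(n)) + (r[-1],)
        for r in rows
    ]


def mine(train):
    """Steps 5-7: classification task, nearest-centroid algorithm."""
    centroids = {}
    for label in {r[-1] for r in train}:
        members = [r[:-1] for r in train if r[-1] == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def predict(centroids, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def evaluate(centroids, test):
    """Step 8: fraction of held-out records classified correctly."""
    hits = sum(predict(centroids, r[:-1]) == r[-1] for r in test)
    return hits / len(test)


prepared = transform(clean(RAW))
train, test = prepared[:6], prepared[6:]   # simple hold-out split
model = mine(train)
accuracy = evaluate(model, test)
print(f"held-out accuracy: {accuracy:.2f}")
```

For brevity the scaling ranges are computed on all records before the split; a more careful pipeline would estimate them on the training records only, so that the evaluation in step 8 is not influenced by the held-out data.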
III. THE DATA MINING STEP OF THE KDD PROCESS
The data mining component of the KDD process often involves repeated, iterative application of
particular data mining methods. Data mining is an essential step in the
knowledge discovery in databases (KDD) process that produces useful patterns or models from data.
The terms KDD and data mining are not interchangeable. KDD refers to the overall process of
discovering useful knowledge from data. Data mining refers to discovering new patterns in the wealth
of data held in databases, focusing on the algorithms that extract useful knowledge [2]. At the core
of the KDD process are the data mining methods for extracting patterns from data. These methods can
have different goals, depending on the intended outcome of the overall KDD process. Various data
mining techniques are used to extract information from a data set and transform it into an
understandable format for further use. Most data mining goals fall under the following categories:
Data Processing: Depending on the goals and requirements of the KDD process, analysts may select,
filter, aggregate, sample, clean and/or transform data.
Prediction: Given a data item and a predictive model, predict the value for a specific attribute of the
data item.
Regression: Given a set of data items, regression is the analysis of the dependency of some attribute
values upon the values of other attributes in the same item, and the automatic production of a model
that can predict these attribute values for new records.
Classification: Given a set of predefined categorical classes, determine to which of these classes a
specific data item belongs.
Clustering: Given a set of data items, partition this set into a set of classes such that items with
similar characteristics are grouped together. Clustering is best used for finding groups of items that
are similar.
Link Analysis (Associations): Given a set of data items, identify relationships between attributes
and items, such as the presence of one pattern implying the presence of another pattern.
Model Visualization: Visualization plays an important role in making the discovered knowledge
understandable and interpretable by humans. Besides, the human eye-brain system itself still remains
the best pattern-recognition device known.
Exploratory Data Analysis (EDA): Exploratory data analysis (EDA) is the interactive exploration of
a data set without heavy dependence on preconceived assumptions and models, thus attempting to
identify interesting patterns [7].
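As an illustration of the link analysis category, the following sketch computes the two measures commonly used to judge an association rule "A implies C": support (how often A and C occur together) and confidence (how often C occurs when A does). The transactions and item names are hypothetical and serve only as an example.

```python
# Hypothetical market-basket transactions, assumed for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule under test: butter implies bread.
rule_support = support({"butter", "bread"}, TRANSACTIONS)
rule_confidence = confidence({"butter"}, {"bread"}, TRANSACTIONS)
print(rule_support, rule_confidence)
```

In this toy data every transaction containing butter also contains bread, so the presence of one pattern implies the presence of the other in the sense described above.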
Data mining has two primary objectives: prediction and description. Prediction involves using
some variables in a data set to predict unknown values of other relevant variables, while
description focuses on finding patterns in the data that humans can interpret.
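The prediction objective can be illustrated with a minimal regression sketch: a straight line is fitted by ordinary least squares to known (x, y) pairs and then used to predict y for an unseen x. The data values and the function name fit_line are assumptions introduced here for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records in which y depends on x exactly as y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept   # predict the unknown value at x = 5
print(slope, intercept, predicted)
```

Here the known attribute x plays the role of the variables used for prediction, and y is the variable whose unknown value is predicted for a new record.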
IV. CONCLUSION
Knowledge discovery can be broadly defined as the automated discovery of novel and useful
information from commercial databases. Data mining is one step at the core of the knowledge
discovery process, dealing with the extraction of patterns and relationships from large amounts of
data. Data mining techniques are used to analyze data and extract useful information from large
amounts of data. Knowledge Discovery (KD) is a nontrivial process of identifying valid, novel,
potentially useful, and ultimately understandable patterns in large collections of data. One of the
KD steps is Data Mining (DM). DM is concerned with the actual extraction of knowledge from data,
whereas the overall KD process is also concerned with many other things, such as understanding and
preparing the data and verifying and applying the discovered knowledge. In this paper we have given
an overview of knowledge discovery and data mining techniques in databases.
REFERENCES
[1] Oded Maimon, Lior Rokach, “Introduction to Knowledge Discovery in databases”, Data Mining
and Knowledge Discovery Handbook
[2] Tipawan Silwattananusarn and Kulthida Tuamsuk, “Data Mining and Its Applications for Knowledge
Management: A Literature Review from 2007 to 2012”, International Journal of Data Mining &
Knowledge Management Process (IJDKP), Vol. 2, No. 5, September 2012
[3] Pratiyush Guleria and Manu Sood, “Data Mining in Education: A Review on the Knowledge
Discovery Perspective”, International Journal of Data Mining & Knowledge Management Process
(IJDKP), Vol. 4, No. 5, September 2014
[4] Krzysztof J. Cios and Lukasz A. Kurgan, “Trends in Data Mining and Knowledge Discovery”
[5] Kirsten Wahlstrom, John F. Roddick, “On the Impact of Knowledge Discovery and Data Mining”
[6] Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, “Knowledge Discovery and Data
Mining: Towards a Unifying Framework”
[7] Michael Goebel, Le Gruenwald, “A Survey of Data Mining and Knowledge Discovery Software
Tools”
[8] Zaki, M., Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data
Engineering, 12(3):372–390, 2000
[9] An, X. & Wang, W. (2010). Knowledge management technologies and applications: A literature
review. IEEE, 138-141. doi:10.1109/ICAMS.2010.5553046
[10] Gorunescu, F. (2011). Data Mining: Concepts, Models, and Techniques. India: Springer
[11] Sachin, R.B, Vijay, M.S, “A Survey and Future Vision of Data Mining in Educational Field”,
Second International Conference on Advanced Computing & Communication Technologies (ACCT),
Rohtak, Haryana, ISBN 978-1-4673-0471-9, 7-8 Jan. 2012, pp. 96-100.
[12] Jaideep Srivastava, Prasanna Desikan, Vipin Kumar, “Web Mining - Concepts, Applications &
Research Directions”, AHPCRC Technical Report
[13] Sang Jun Lee, Keng Siau, “A review of data mining techniques”, Industrial Management & Data
Systems
