International Journal of Conceptions on Computing & Information Technology
Vol. 1, Issue. 1, November 2013; ISSN: 2345 - 9808
Fundamentals of data mining and its applications
Sourav Sarangi and Subrat Swain
Dept. of Biotechnology,
MITS Engineering College,
Rayagada, Odisha
sourav@sierraairtraffic.com and inbredstar@gmail.com
Abstract— This paper focuses on data mining and its uses in
everyday life. Simply put, data mining is the essential process
in which intelligent methods are applied to extract patterns from
data. It is the process of discovering new patterns in large data
sets using methods at the intersection of artificial intelligence,
machine learning, statistics and database systems. The overall
goal of the data mining process is to extract knowledge from a
data set and transform it into a human-understandable structure;
besides the raw analysis step, it involves database and data
management aspects. The actual data mining task is the automatic
or semi-automatic analysis of large quantities of data to extract
previously unknown, interesting patterns such as groups of data
records, unusual records and dependencies. The data mining step
might identify multiple groups in the data, which can then be used
to obtain more accurate prediction results by a decision support
system. Data mining has many applications; nowadays it is used in
business, science, visualization, music, telecommunications and
many other fields. This paper discusses data mining and its
applications for the benefit of people.
Keywords- Process, Software, Privacy Concerns & Ethics,
Applications
I. INTRODUCTION
Data mining techniques are the result of a long process of
research and product development. Data mining takes this
evolutionary process beyond retrospective data access and
navigation to prospective and proactive information delivery.
Data mining is the extraction of hidden predictive information
from large databases. It derives its name from the similarities
between searching for valuable business information in a large
database and mining a mountain for a vein of valuable ore. It is
a powerful new technology with great potential to help companies
focus on the most important information in their data warehouses.
Data mining techniques can be implemented rapidly on existing
software and hardware platforms to enhance the value of existing
information resources, and can be integrated with new products
and systems.
Data mining is a promising and relatively new technology. It is
defined as the process of discovering hidden, valuable and useful
knowledge or information by analyzing large amounts of data
stored in databases or data warehouses, using techniques such as
machine learning, artificial intelligence (AI) and statistics.
Data mining is an iterative process that typically involves the
following phases:
1) Problem Definition
2) Data Exploration
3) Data Preparation
4) Modeling
5) Evaluation
6) Deployment
A. Problem Definition
A data mining project starts with the understanding of the
business problem. Data mining experts, business experts, and
domain experts work closely together to define the project
objectives and the requirements from a business perspective.
The project objective is then translated into a data mining
problem definition.
II. PROCESS
Fig.2 Data mining Process
A. Data Exploration
Domain experts, who understand the meaning of the metadata,
collect, describe and explore the data. Data exploration is a
common process in data warehouses, which are characterized by
large volumes of data coming from disparate systems. Data
exploration helps a data consumer focus an information search on
the pertinent aspects of the relevant data before true analysis
begins.
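As a rough illustration of this phase, the Python sketch below profiles a small, hypothetical set of records merged from several source systems. Numeric fields are summarized by count, missing values, range and central tendency, and categorical fields by a frequency table. The field names (`age`, `region`, `balance`) and the helper functions are assumptions made for the example; they do not come from any particular tool.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical records merged from disparate source systems; None marks a missing value.
records = [
    {"age": 34, "region": "north", "balance": 1200.0},
    {"age": 51, "region": "south", "balance": 300.0},
    {"age": None, "region": "north", "balance": 4500.0},
    {"age": 27, "region": "east", "balance": 800.0},
    {"age": 45, "region": "north", "balance": None},
]


def profile_numeric(rows, field):
    """Summarize a numeric field, counting missing values separately."""
    values = [row[field] for row in rows if row[field] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values) if len(values) > 1 else 0.0,
    }


def profile_categorical(rows, field):
    """Build a frequency table for a categorical field."""
    return Counter(row[field] for row in rows)


age_summary = profile_numeric(records, "age")
region_counts = profile_categorical(records, "region")
print(age_summary)
print(region_counts.most_common())
```

The summary reports the missing count alongside the statistics on purpose: gaps like these are what the data preparation phase then has to handle.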
B. Data Preparation
Data preparation normally consumes most of the time in a data
mining project. The outcome of the data preparation phase is the
final data set. Once the available data sources are identified,
they need to be selected, cleaned, constructed and formatted into
the desired form. Data preparation means manipulating data into a
form suitable for further analysis and processing. It is a
process that involves many different tasks and cannot be fully
automated. Many data preparation activities are routine, tedious
and time consuming; it has been estimated that data preparation
accounts for 60%-80% of the time spent on a data mining project.
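The four preparation steps named above (select, clean, construct, format) can be sketched on a tiny, hypothetical raw extract. The policy of dropping rows without a birth date, the `as_of` reference date and the field names are all assumptions for illustration. A real project would choose its own rules together with the domain experts.

```python
from datetime import date

# Hypothetical raw extract; values arrive as untyped, inconsistently formatted strings.
raw_rows = [
    {"id": "1", "name": " alice ", "birth": "1985-04-12", "income": "52,000"},
    {"id": "2", "name": "Bob", "birth": "", "income": "48000"},
    {"id": "3", "name": "carol", "birth": "1990-11-30", "income": "n/a"},
]


def parse_income(text):
    """Strip thousands separators; return None when the value cannot be parsed."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def prepare(rows, as_of=date(2013, 11, 1)):
    """Select, clean, construct and format records into a uniform layout."""
    prepared = []
    for row in rows:
        # Select: this sketch drops records without a birth date (one policy among many).
        if not row["birth"]:
            continue
        # Clean: normalize whitespace and capitalization, parse numbers.
        name = row["name"].strip().title()
        income = parse_income(row["income"])
        # Construct: derive an age attribute from the birth date.
        born = date.fromisoformat(row["birth"])
        age = as_of.year - born.year - ((as_of.month, as_of.day) < (born.month, born.day))
        # Format: emit typed fields in a fixed layout for the modeling phase.
        prepared.append({"id": int(row["id"]), "name": name, "age": age, "income": income})
    return prepared


final_data_set = prepare(raw_rows)
print(final_data_set)
```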
C. Modeling
In the modeling phase, suitable modeling techniques are selected
and applied to the prepared data set. Frequent exchange with the
domain experts from the data preparation phase is required. The
modeling and evaluation phases are coupled: they can be repeated
several times, changing parameters until optimal values are
achieved.
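The repeat-until-optimal loop between modeling and evaluation can be sketched as a simple parameter search. The example below assumes a k-nearest-neighbors classifier and a small hypothetical data set; the classifier is chosen only for illustration, since the paper does not prescribe a technique. It rebuilds the model for each candidate `k`, scores it on held-out validation data and keeps the best value.

```python
import math
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k training points nearest to `point` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation points whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


def select_k(train, validation, candidates):
    """Rebuild and score the model for each k; return the best k and all scores."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best = max(scores, key=scores.get)  # ties go to the candidate listed first
    return best, scores


# Hypothetical two-feature data; the point (2, 2) is a noisy label inside cluster A.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "B"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
validation = [((2, 2.2), "A"), ((6.5, 6.5), "B"), ((1.2, 1.3), "A")]

best_k, scores = select_k(train, validation, candidates=[1, 3, 5])
print(best_k, scores)
```

With `k = 1` the noisy point misleads the model on one validation case, so the search settles on `k = 3`. This is the kind of parameter change the modeling/evaluation loop is meant to find.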
D. Evaluation
The modeling and evaluation phases are interrelated. Data mining
experts evaluate the model; if it does not satisfy their
expectations, they go back to the modeling phase and rebuild the
model by changing its parameters until optimal values are
achieved. In this phase, new business requirements may be raised
because new patterns have been discovered in the model results,
or because of other factors.
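Whether a model "satisfies expectations" is usually judged against explicit metrics. The hypothetical sketch below compares actual and predicted labels for a fraud-style problem, computes precision, recall and accuracy from the confusion counts, and checks them against minimum thresholds. The thresholds and label names are assumptions; they stand in for whatever criteria the business experts agreed on.

```python
def confusion_counts(actual, predicted, positive):
    """Count true positives, false positives, false negatives and true negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def evaluate(actual, predicted, positive, min_precision, min_recall):
    """Score the model and report whether it meets the agreed thresholds."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / len(actual),
        "meets_expectations": precision >= min_precision and recall >= min_recall,
    }


# Hypothetical labels for eight transactions.
actual = ["fraud", "ok", "ok", "fraud", "ok", "ok", "fraud", "ok"]
predicted = ["fraud", "ok", "fraud", "ok", "ok", "ok", "fraud", "ok"]
report = evaluate(actual, predicted, "fraud", min_precision=0.8, min_recall=0.6)
print(report)
```

Here `meets_expectations` is `False` because precision (2/3) falls below the assumed 0.8 threshold. That is the signal to return to the modeling phase and adjust parameters.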
E. Deployment
The knowledge or information gained through the data mining
process needs to be presented in such a way that stakeholders can
use it when they need it. Here, deployment means applying a model
for prediction or classification to new data. After a satisfactory
model or set of models has been identified for a particular
application, we usually want to deploy those models so that
predictions or predicted classifications can be obtained quickly
for new data. For example, a credit card company may want to
deploy a trained model or set of models to quickly identify
transactions that have a high probability of being fraudulent.
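A minimal sketch of the credit card example, assuming the deployed model is a logistic scoring function. The coefficients, feature names and the 0.5 threshold are hypothetical placeholders for a model that has already passed evaluation; they are not taken from any real fraud system. Deployment then amounts to scoring each incoming transaction and flagging the high-probability ones.

```python
import math

# Hypothetical coefficients standing in for a model already trained and accepted in
# the evaluation phase; they are illustrative values, not taken from any real system.
MODEL = {
    "bias": -4.0,
    "weights": {"amount_thousands": 0.8, "is_foreign": 1.5, "is_night": 1.0},
}


def fraud_probability(txn, model=MODEL):
    """Logistic score: sigmoid of the bias plus the weighted feature sum."""
    z = model["bias"] + sum(w * txn[name] for name, w in model["weights"].items())
    return 1.0 / (1.0 + math.exp(-z))


def flag_transactions(transactions, threshold=0.5):
    """Return the ids of incoming transactions whose score reaches the threshold."""
    return [t["id"] for t in transactions if fraud_probability(t) >= threshold]


# Hypothetical new transactions arriving after deployment.
incoming = [
    {"id": "t1", "amount_thousands": 0.2, "is_foreign": 0, "is_night": 0},
    {"id": "t2", "amount_thousands": 3.0, "is_foreign": 1, "is_night": 1},
    {"id": "t3", "amount_thousands": 1.0, "is_foreign": 1, "is_night": 0},
]
print(flag_transactions(incoming))
```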
III. SOFTWARE
Data mining software is a fairly new term that refers to tools
for extracting predictive patterns from data. Such software is
used to analyze vast amounts of data in order to discover and
understand specific patterns. Data mining software originated in
the scientific community, where it was used to discern patterns in
data from scientific studies, and it quickly found a strong
foothold in the business community as large businesses began to
amass vast amounts of data. Some data mining software packages,
both recent and established, are listed below:
1) Carrot2 – Text and search results clustering
framework.
2) Chemicalize.org – A chemical structure miner and
web search engine.
3) GATE – Natural language processing and language
engineering tool.
4) KNIME – The Konstanz Information Miner, a user-friendly
and comprehensive data analytics framework.
5) Orange – A component-based data mining and
machine learning software suite written in the Python
language.
6) UIMA – The Unstructured Information Management
Architecture (UIMA) is a component framework for
analyzing unstructured content such as text, audio and
video, originally developed by IBM.
7) jHepWork – A Java cross-platform data analysis
framework developed at ANL.
8) Weka – A suite of machine learning software written in
the Java language.
IV. PRIVACY CONCERNS & ETHICS
Privacy is a sensitive issue. The privacy and legal issues that
data mining may raise are at the center of the debate surrounding
it. In recent years privacy concerns have taken on a more
significant role in American society as merchants, insurance
companies and government agencies amass warehouses containing
personal data. Some people believe that data mining itself is
ethically neutral, and the term data mining carries no ethical
implications in itself. In many cases, however, the results of
data mining applications such as association rule or
classification rule mining can compromise the privacy of the
data. The problem of privacy-preserving data mining has become
more important in recent years because of the increasing ability
to store personal data about users and the increasing
sophistication of data mining algorithms that leverage this
information. A number of techniques, such as randomization and
k-anonymity, have been suggested in recent years to perform
privacy-preserving data mining.
A. Randomization
The randomization technique uses data distortion methods
in order to create private representations of the records. In
most cases, the individual records cannot be recovered; only
aggregate distributions can be recovered, and these aggregate
distributions can be used for data mining purposes. Two kinds of
perturbation are possible with the randomization method:
additive perturbation and multiplicative perturbation.
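Both kinds of perturbation can be sketched in a few lines. The example assumes synthetic "private" values and uniform noise: zero-mean noise for the additive case, and mean-one noise for the multiplicative case. The noise distributions and spreads are illustrative choices, not the only options. It shows that individual released values are distorted while aggregates such as the mean, and, for additive noise, the variance, can still be estimated.

```python
import random
from statistics import fmean, pvariance


def additive_perturb(values, spread, rng):
    """Release x + r, with r drawn uniformly from [-spread, spread] (mean 0)."""
    return [x + rng.uniform(-spread, spread) for x in values]


def multiplicative_perturb(values, spread, rng):
    """Release x * r, with r drawn uniformly from [1 - spread, 1 + spread] (mean 1)."""
    return [x * rng.uniform(1 - spread, 1 + spread) for x in values]


rng = random.Random(42)
ages = [rng.uniform(20, 60) for _ in range(10_000)]  # synthetic private values

released_add = additive_perturb(ages, 20, rng)
released_mul = multiplicative_perturb(ages, 0.5, rng)

# Individual released values are distorted, but aggregates remain estimable:
# both noise types preserve the mean in expectation, and for additive noise
# Var(Y) = Var(X) + Var(R), where Var(R) = spread**2 / 3 for uniform noise.
estimated_mean_add = fmean(released_add)
estimated_mean_mul = fmean(released_mul)
estimated_var = pvariance(released_add) - 20 ** 2 / 3

print(round(fmean(ages), 2), round(estimated_mean_add, 2), round(estimated_mean_mul, 2))
print(round(pvariance(ages), 1), round(estimated_var, 1))
```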
B. The K-anonymity method
An important method for privacy de-identification is
k-anonymity. The motivating factor behind the k-anonymity
technique is that many attributes in the data can often be
considered pseudo-identifiers, which can be used in conjunction
with public records to uniquely identify the records. For
example, even if explicit identifiers are removed from the
records, attributes such as birth date and zip code can be used
to uniquely identify the individuals behind the records. The idea
in k-anonymity is to reduce the granularity of representation of
the data in such a way that a given record cannot be
distinguished from at least (k − 1) other records. A number of
algorithms for achieving k-anonymity have been proposed in the
literature.
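The granularity-reduction idea can be sketched as follows. The example assumes a small hypothetical table in which zip code and birth date are the pseudo-identifiers. It coarsens them step by step (zip-code prefixes, birth-year ranges) and checks whether each resulting combination is shared by at least k records. It uses generalization only. Practical k-anonymity algorithms also use suppression and search the generalization space more cleverly, so this is not one of the published algorithms.

```python
from collections import Counter


def generalize(record, zip_digits, year_bucket):
    """Coarsen the pseudo-identifiers: keep a zip-code prefix and bucket the birth year."""
    zip_code = record["zip"]
    zip_prefix = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    year = int(record["birth_date"][:4])
    start = year - year % year_bucket
    return (zip_prefix, f"{start}-{start + year_bucket - 1}")


def is_k_anonymous(records, k, zip_digits, year_bucket):
    """True if every generalized (zip, birth-year) combination occurs at least k times."""
    groups = Counter(generalize(r, zip_digits, year_bucket) for r in records)
    return all(count >= k for count in groups.values())


def find_generalization(records, k, levels):
    """Return the first (finest) level in `levels` that makes the table k-anonymous."""
    for zip_digits, year_bucket in levels:
        if is_k_anonymous(records, k, zip_digits, year_bucket):
            return zip_digits, year_bucket
    return None


# Hypothetical table: names already removed, but zip code and birth date remain.
patients = [
    {"zip": "76101", "birth_date": "1981-03-02", "diagnosis": "flu"},
    {"zip": "76104", "birth_date": "1983-07-19", "diagnosis": "asthma"},
    {"zip": "76109", "birth_date": "1986-01-30", "diagnosis": "flu"},
    {"zip": "76302", "birth_date": "1972-05-11", "diagnosis": "diabetes"},
    {"zip": "76305", "birth_date": "1975-09-23", "diagnosis": "flu"},
    {"zip": "76308", "birth_date": "1979-12-01", "diagnosis": "asthma"},
]
levels = [(5, 1), (4, 1), (4, 5), (3, 10)]  # finest to coarsest
print(find_generalization(patients, k=3, levels=levels))
```

At full granularity every record is unique. Only after coarsening to three-digit zip prefixes and decade-wide birth-year ranges does each record share its pseudo-identifiers with at least two others.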
V. APPLICATIONS
Data mining is now used in industry, visualization, music,
science and engineering, and many other fields. This section
discusses some of its most effective applications in these
fields.
Fig.3 Purpose Of Data Mining
A. Industry
Data mining has been used extensively in banking and financial
markets. In the banking industry, data mining is heavily used to
model and predict credit fraud, to evaluate risk, to perform trend
analysis and to analyze profitability, as well as to help with
direct marketing campaigns. In the financial markets, neural
networks have been used in stock-price forecasting, option
trading, bond rating, portfolio management, commodity price
prediction, mergers and acquisitions, and forecasting financial
disasters.
Slim margins pushed retailers into embracing data mining earlier
than other industries. Retailers have seen improved
decision-support processes lead directly to improved efficiency
in inventory management and financial forecasting. Large retail
chains and grocery stores collect vast amounts of
information-rich point-of-sale data. One application of data
mining in real estate is the AREAS Property Valuation product
from HNC Software, which performs property valuation. Data mining
has also been used extensively in the medical industry. For
example, NeuroMedical Systems used neural networks to provide a
Pap smear diagnostic aid, and Vysis uses neural networks to
perform protein analysis for drug development.
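Point-of-sale data of this kind is a classic input for association rule mining, which the privacy discussion in Section IV also mentions. The sketch below is a deliberately simplified, pair-only version, not the full Apriori algorithm. It counts how often items co-occur in hypothetical shopping baskets and keeps rules whose support and confidence reach assumed thresholds.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Find rules lhs -> rhs between item pairs that meet support and confidence thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # share of lhs baskets that also hold rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda rule: (-rule[3], rule[0], rule[1]))


# Hypothetical point-of-sale baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
    ["beer", "chips", "bread"],
]
for rule in pair_rules(baskets, min_support=0.4, min_confidence=0.8):
    print(rule)
```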
Fig.4 Customers Details In Bank
B. Science And Engineering
In recent years, data mining has been used widely in the
areas of science and engineering, such as bioinformatics,
genetics, medicine, education and electrical power
engineering. In the study of human genetics, sequence mining
helps address the important goal of understanding the mapping
relationship between the inter-individual variation in human
DNA sequences and variability in disease susceptibility. In the
area of electrical power engineering, data mining methods
have been widely used for condition monitoring of high
voltage electrical equipment. The purpose of condition
monitoring is to obtain valuable information on the
insulation's health status of the equipment. Data clustering
methods such as the self-organizing map (SOM) have been applied
to the vibration monitoring and analysis of transformer on-load
tap-changers (OLTCs). Using vibration monitoring, it can be
observed that each tap change operation generates a signal that
contains information about the condition of the tap changer
contacts and the drive mechanisms. Another application of data
mining in science and engineering is educational research, where
data mining has been used to study the factors leading students
to engage in behaviors that reduce their learning and to
understand the factors influencing university student retention.
A similar social application of data mining is its use in
expertise finding systems, whereby descriptors of human expertise
are extracted, normalized and classified so as to facilitate the
finding of experts, particularly in scientific and technical
fields. In this way, data mining can facilitate institutional
memory.
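A minimal one-dimensional self-organizing map, written in plain Python, shows how SOM clustering can separate condition classes. The two-feature vectors below are hypothetical stand-ins for features extracted from tap-change vibration signals. The feature choice, map size, learning-rate schedule and neighborhood schedule are all assumptions. The point is only that operations from different conditions end up mapped to different map nodes.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))


def train_som(data, n_nodes=4, epochs=50, seed=0):
    """Train a one-dimensional self-organizing map on equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            progress = step / total_steps
            rate = 0.5 * (1 - progress)                      # learning rate decays linearly
            radius = max(n_nodes / 2 * (1 - progress), 0.5)  # neighborhood shrinks over time
            bmu = best_matching_unit(nodes, x)
            for i, node in enumerate(nodes):
                # Gaussian neighborhood on the 1-D grid, centered on the best-matching unit.
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    node[d] += rate * influence * (x[d] - node[d])
            step += 1
    return nodes


# Hypothetical normalized features per tap-change operation, e.g. (vibration energy, burst duration).
healthy = [(0.20, 0.25), (0.22, 0.18), (0.18, 0.21), (0.25, 0.20)]
worn = [(0.80, 0.78), (0.75, 0.82), (0.82, 0.80), (0.78, 0.76)]

nodes = train_som(healthy + worn)
healthy_units = {best_matching_unit(nodes, x) for x in healthy}
worn_units = {best_matching_unit(nodes, x) for x in worn}
print(healthy_units, worn_units)
```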
C. Visual
There is a large number of visualization techniques that can be
used for visualizing data. In addition to standard 2D/3D
techniques such as x-y (x-y-z) plots, bar charts and line graphs,
there are a number of more sophisticated visualization
techniques. The standard techniques are useful for data
exploration but are limited to relatively small and
low-dimensional data sets. In the last decade, a large number of
novel information visualization techniques have been developed,
allowing visualizations of multidimensional data sets without
inherent two- or three-dimensional semantics.
D. Music
Data mining techniques, in particular co-occurrence analysis,
have been used to discover relevant similarities among music
corpora (radio lists, CD databases) for the purpose of
classifying music into genres in an objective manner.
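Co-occurrence analysis of this kind can be sketched on hypothetical playlists. Artists are labeled A, B, C, X, Y, Z as placeholders. Two artists count as similar when they often appear in the same list, scored here with a Jaccard-style ratio, and strongly linked artists are merged into groups that a person could then name as genres. The similarity measure and threshold are assumptions; other co-occurrence scores would work similarly.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(playlists):
    """Count how many lists each artist appears in, and how many lists each pair shares."""
    appearances = Counter()
    pairs = Counter()
    for playlist in playlists:
        items = sorted(set(playlist))
        appearances.update(items)
        pairs.update(combinations(items, 2))
    return appearances, pairs


def similarity(a, b, appearances, pairs):
    """Jaccard-style score: lists containing both / lists containing either."""
    both = pairs[tuple(sorted((a, b)))]
    either = appearances[a] + appearances[b] - both
    return both / either if either else 0.0


def group_items(playlists, threshold):
    """Merge artists linked by similarity >= threshold into groups (union-find)."""
    appearances, pairs = cooccurrence(playlists)
    parent = {item: item for item in appearances}

    def find(item):
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for a, b in pairs:
        if similarity(a, b, appearances, pairs) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for item in appearances:
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical radio playlists with placeholder artist names.
playlists = [["A", "B", "C"], ["A", "B"], ["X", "Y"], ["X", "Y", "Z"], ["C", "A"]]
print(group_items(playlists, threshold=0.3))
```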
VI. CONCLUSION
In this paper we have given an overview of data mining and its
applications. We have not described all aspects of data mining,
only its process, its uses and some of its applications in
different fields. Data mining is a tool used by governments and
corporations to predict and establish trends with specific
purposes in mind; corporations, for example, use data mining to
examine buying patterns and predict future trends.