International Journal of Conceptions on Computing & Information Technology
Vol. 1, Issue. 1, November 2013; ISSN: 2345 - 9808
Fundamentals of data mining and its applications
Sourav Sarangi and Subrat Swain
Dept. of Biotechnology,
MITS Engineering College,
Rayagada, Odisha
sourav@sierraairtraffic.com and inbredstar@gmail.com
Abstract— This paper focuses on data mining and its uses in
everyday life. Put simply, data mining is the process in which
intelligent methods are applied to extract patterns from data. It
is the process of discovering new patterns in large data sets
using methods at the intersection of artificial intelligence,
machine learning, statistics and database systems. The overall
goal of the data mining process is to extract knowledge from a
data set in a human-understandable structure; besides the raw
analysis step, it also involves database aspects. The actual data
mining task is the automatic or semi-automatic analysis of large
quantities of data to extract previously unknown, interesting
patterns such as groups of data records, unusual records and
dependencies. The data mining step might identify multiple groups
in the data, which a decision support system can then use to
obtain more accurate predictions. Data mining has many
applications: it is now used in business, science, visualization,
music, telecommunications and many other fields. This paper
discusses data mining and its applications.
Keywords- Process, Software, Privacy Concerns & Ethics,
Applications
I. INTRODUCTION
Data mining techniques are the result of a long process of
research and product development. Data mining takes this
evolutionary process beyond retrospective data access and
navigation to prospective and proactive information delivery.
Data mining is the extraction of hidden predictive information
from large databases. It derives its name from the
similarities between searching for valuable business
information in a large database and mining a mountain for a
vein of valuable ore. It is a powerful new
technology with great potential to help companies focus on the
most important information in their data warehouses. Data
mining techniques can be implemented rapidly on existing
software and hardware platforms to enhance the value of
existing information resources, and can be integrated with new
products and systems.
Data mining is a promising and relatively new technology,
defined as the process of discovering hidden, valuable and
useful knowledge by analyzing large amounts of data stored in
databases or data warehouses, using techniques from machine
learning, artificial intelligence (AI) and statistics. Data
mining is an iterative process that typically involves the
following phases:
1) Problem Definition
2) Data Exploration
3) Data Preparation
4) Modeling
5) Evaluation
6) Deployment
A. Problem Definition
A data mining project starts with the understanding of the
business problem. Data mining experts, business experts, and
domain experts work closely together to define the project
objectives and the requirements from a business perspective.
The project objective is then translated into a data mining
problem definition.
II. PROCESS
Fig.2 Data mining Process
A. Data Exploration
Domain experts understand the meaning of the metadata.
They collect, describe, and explore the data. Data exploration
is a common process in data warehouses which are
characterized by large bulks of data coming from disparate
systems. Data exploration helps a data consumer focus an
information search on the pertinent aspect of relevant data
before true analysis can be achieved.
B. Data Preparation
Data preparation normally consumes most of the project
time: one estimate puts it at 60%-80% of the time spent on a
data mining project, and figures as high as about 90% are also
quoted. The outcome of the data preparation phase is the final
data set. Once the available data sources have been identified,
the data need to be selected, cleaned, constructed and formatted
into the desired form. Data preparation means manipulating data
into a form suitable for further analysis and processing. It
involves many different tasks and cannot be fully automated.
Many data preparation activities are routine, tedious and time
consuming.
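The select, clean, construct and format steps can be illustrated with a minimal Python sketch. The records, field names, reference date and the prepare function below are hypothetical and chosen only for illustration; they are not taken from any particular data mining tool.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from disparate source systems.
RAW_RECORDS = [
    {"id": "1", "age": "34", "income": "52000", "joined": "2012-03-01"},
    {"id": "2", "age": "", "income": "61000", "joined": "2011-11-15"},    # missing age
    {"id": "3", "age": "29", "income": "n/a", "joined": "2013-01-20"},    # unparseable income
    {"id": "1", "age": "34", "income": "52000", "joined": "2012-03-01"},  # duplicate of id 1
]


def prepare(records, reference_date=datetime(2013, 11, 1)):
    """Select, clean, construct and format records into a final data set."""
    seen_ids = set()
    prepared = []
    for rec in records:
        # Clean: skip duplicates and rows whose numeric fields cannot be parsed.
        if rec["id"] in seen_ids:
            continue
        try:
            age = int(rec["age"])
            income = float(rec["income"])
        except ValueError:
            continue
        seen_ids.add(rec["id"])
        # Construct: derive a new attribute (days since joining) from a raw one.
        joined = datetime.strptime(rec["joined"], "%Y-%m-%d")
        tenure_days = (reference_date - joined).days
        # Format: emit a uniform, typed row for the modeling phase.
        prepared.append({"id": int(rec["id"]), "age": age,
                         "income": income, "tenure_days": tenure_days})
    return prepared


final_data_set = prepare(RAW_RECORDS)
print(final_data_set)
```

In this sketch, rows that fail cleaning are simply dropped; a real project might instead impute missing values, which is one reason the phase cannot be fully automated.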
C. Modeling
Modeling techniques are selected and applied to the
prepared data set. The modeling phase requires frequent
exchange with the domain experts from the data preparation
phase. The modeling phase and the evaluation phase are
coupled: they can be repeated several times, changing
parameters until optimal values are achieved.
D. Evaluation
The modeling and evaluation phases are interrelated.
Data mining experts evaluate the model. If the model does not
satisfy their expectations, they go back to the modeling phase
and rebuild the model by changing its parameters until optimal
values are achieved. In this phase, new business requirements
may arise from new patterns discovered in the model results or
from other factors.
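The coupling between modeling and evaluation can be sketched as a simple loop that rebuilds a model for each candidate parameter and keeps the one that evaluates best. The one-feature threshold classifier, the held-out data and the candidate values below are assumptions made for illustration only.

```python
# Hypothetical held-out evaluation data: (feature value, class label).
EVALUATION_DATA = [(0.2, 0), (0.45, 0), (0.55, 1), (0.8, 1)]


def accuracy(threshold, data):
    """Fraction of examples classified correctly by the rule x >= threshold."""
    correct = sum(1 for x, label in data if int(x >= threshold) == label)
    return correct / len(data)


def model_and_evaluate(candidate_thresholds, data):
    """Rebuild the model for each parameter value and keep the best one."""
    best_threshold, best_score = None, -1.0
    for threshold in candidate_thresholds:
        score = accuracy(threshold, data)   # evaluation phase
        if score > best_score:              # keep the parameter only if it improves
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


best = model_and_evaluate([0.2, 0.35, 0.5, 0.65, 0.8], EVALUATION_DATA)
print(best)
```

Evaluating on data that was not used to build the model is a design choice of this sketch; it keeps the selected parameter from merely reflecting the training data.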
E. Deployment
The knowledge gained through the data mining process
needs to be presented in such a way that stakeholders can use
it when they want it. Deployment is the application of a model
for prediction or classification to new data. After a
satisfactory model or set of models has been identified for a
particular application, we usually want to deploy those models
so that predictions or predicted classifications can quickly be
obtained for new data. For example, a credit card company may
want to deploy a trained model or set of models to quickly
identify transactions which have a high probability of being
fraudulent.
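As a rough illustration of deployment, the sketch below applies an already trained model to new transactions and flags those with a high predicted probability of fraud. The logistic form, the weights, the feature names and the 0.5 threshold are all hypothetical; a real deployment would load parameters produced by the modeling phase.

```python
import math

# Hypothetical parameters of an already trained logistic model.
WEIGHTS = {"amount_thousands": 0.8, "foreign": 1.5, "night": 0.7}
BIAS = -4.0


def fraud_probability(transaction):
    """Score one transaction with the (assumed) trained logistic model."""
    z = BIAS + sum(WEIGHTS[name] * transaction[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def flag_transactions(transactions, threshold=0.5):
    """Return the ids of new transactions whose fraud probability is high."""
    return [t["id"] for t in transactions if fraud_probability(t) >= threshold]


# New, previously unseen data arriving after deployment.
NEW_TRANSACTIONS = [
    {"id": "t1", "amount_thousands": 0.05, "foreign": 0, "night": 0},
    {"id": "t2", "amount_thousands": 3.0, "foreign": 1, "night": 1},
]

flagged = flag_transactions(NEW_TRANSACTIONS)
print(flagged)
```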
III. SOFTWARE
Data mining software is a fairly new phrase that refers to the
procedure by which predictive patterns are extracted from data.
It describes a set of tools used to analyze vast amounts of
data in order to discover and understand specific patterns.
Data mining software originated in the scientific community,
where it was used to discern patterns in data from scientific
studies. It quickly found a strong foothold in the business
community as large businesses began to amass vast amounts of
data. Some data mining software packages, both recent and
older, are listed below:
1) Carrot2 – Text and search results clustering
framework.
2) Chemicalize.org – A chemical structure miner and
web search engine.
3) GATE – Natural language processing and language
engineering tool.
4) KNIME – The Konstanz Information Miner, a user
friendly and comprehensive data analytics
framework.
5) Orange – A component-based data mining and
machine learning software suite written in the Python
language.
6) UIMA – The Unstructured Information Management
Architecture (UIMA) is a component
framework for analyzing unstructured content such as
text, audio and video, originally developed by IBM.
7) jHepWork – Java cross-platform data analysis
framework developed at ANL.
8) Weka – A suite of machine learning software written in
the Java language.
IV. PRIVACY CONCERNS & ETHICS
Privacy is a loaded issue. In data mining, the privacy and
legal issues that may ensue are key to the conflict. In recent
years privacy concerns have taken on a more significant role
in American society as merchants, insurance companies, and
government agencies amass warehouses containing personal
data. Some people believe that data mining itself is ethically
neutral, and the term data mining itself carries no ethical
implications. In many cases, however, the results of data mining
applications such as association rule or classification rule
mining can compromise the privacy of the data. The problem
of privacy-preserving data mining has become more important
in recent years because of the increasing ability to store
personal data about users, and the increasing sophistication of
data mining algorithms to leverage this information. A number
of techniques such as randomization and k-anonymity have
been suggested in recent years in order to perform privacy-
preserving data mining.
A. Randomization
The randomization technique uses data distortion methods
in order to create private representations of the records. In
most cases, the individual records cannot be recovered; only
aggregate distributions can be recovered. These
aggregate distributions can be used for data mining purposes.
Two kinds of perturbation are possible with the randomization
method: additive perturbation and multiplicative
perturbation.
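Additive perturbation can be sketched in a few lines of Python: independent random noise is added to each value, so individual values are distorted while an aggregate such as the mean remains approximately recoverable. The synthetic ages, the Gaussian noise and its standard deviation are assumptions made for illustration.

```python
import random
import statistics


def additive_perturbation(values, noise_std, seed=0):
    """Return a distorted copy of values with zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_std) for v in values]


# Hypothetical sensitive attribute: synthetic ages of 10,000 individuals.
data_rng = random.Random(42)
true_ages = [data_rng.randint(18, 80) for _ in range(10000)]

released_ages = additive_perturbation(true_ages, noise_std=10.0)

# Individual values are distorted, but the aggregate mean is nearly preserved
# because the added noise averages out over many records.
print(true_ages[0], released_ages[0])
print(statistics.mean(true_ages), statistics.mean(released_ages))
```

Multiplicative perturbation would follow the same pattern, multiplying each value by random noise instead of adding it.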
B. The K-anonymity method
An important method for privacy de-identification is the
method of k-anonymity. The motivation behind the
k-anonymity technique is that many attributes in the data can
often be considered pseudo-identifiers, which can be used in
conjunction with public records to uniquely identify the
records. For example, even if the explicit identifiers are
removed from the records, attributes such as birth date and
zip code can be used to uniquely identify the identities of the underlying
records. The idea in k-anonymity is to reduce the granularity of
representation of the data in such a way that a given record
cannot be distinguished from at least (k − 1) other records.
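The idea of reducing granularity can be illustrated with a short Python sketch. The records, the choice of birth date and zip code as pseudo-identifiers, and the generalization rule (keep only the birth year and the first three zip digits) are hypothetical examples, not a specific published algorithm.

```python
from collections import Counter

# Hypothetical de-identified records: names removed, pseudo-identifiers kept.
RECORDS = [
    {"birth_date": "1985-02-14", "zip": "76501", "diagnosis": "A"},
    {"birth_date": "1985-07-30", "zip": "76502", "diagnosis": "B"},
    {"birth_date": "1990-01-05", "zip": "76411", "diagnosis": "A"},
    {"birth_date": "1990-11-23", "zip": "76419", "diagnosis": "C"},
]


def raw_key(record):
    """Pseudo-identifier at full granularity."""
    return (record["birth_date"], record["zip"])


def generalized_key(record):
    """Reduced granularity: birth year only, zip code truncated to 3 digits."""
    return (record["birth_date"][:4], record["zip"][:3] + "**")


def is_k_anonymous(records, k, key):
    """True if every pseudo-identifier value is shared by at least k records."""
    group_sizes = Counter(key(r) for r in records)
    return all(size >= k for size in group_sizes.values())


print(is_k_anonymous(RECORDS, 2, raw_key))          # every record is unique
print(is_k_anonymous(RECORDS, 2, generalized_key))  # groups of two after generalization
```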
V. APPLICATIONS
Data mining is now used in industry, visualization, music,
science and engineering, and many other fields. This section
discusses the most effective uses of data mining in these
fields.
Fig.3 Purpose Of Data Mining
A. Industry
Data mining has been used extensively in banking and the
financial markets. In the banking industry, data mining is
heavily used to model and predict credit fraud, to evaluate
risk, to perform trend analysis, and to analyze profitability, as
well as to help with direct marketing campaigns. In the
financial markets, neural networks have been used in stock-price
forecasting, option trading, bond rating, portfolio management,
commodity price prediction, and mergers and acquisitions, as
well as in forecasting financial disasters.
Slim margins have pushed retailers into embracing data
mining earlier than other industries. Retailers have seen
improved decision-support processes lead directly to improved
efficiency in inventory management and financial forecasting.
Large retail chains and grocery stores store vast amounts of
information-rich point-of-sale data. One application of
data mining in real estate is the AREAS Property Valuation
product from HNC Software, which performs property
valuation. Data mining has also been used extensively in the
medical industry. For example,
NeuroMedical Systems used neural networks to provide a pap
smear diagnostic aid. Vysis uses neural networks to perform
protein analysis for drug development.
Fig.4 Customers Details In Bank
B. Science And Engineering
In recent years, data mining has been used widely in the
areas of science and engineering, such as bioinformatics,
genetics, medicine, education and electrical power
engineering. In the study of human genetics, sequence mining
helps address the important goal of understanding the mapping
relationship between the inter-individual variation in human
DNA sequences and variability in disease susceptibility. In the
area of electrical power engineering, data mining methods
have been widely used for condition monitoring of high
voltage electrical equipment. The purpose of condition
monitoring is to obtain valuable information on the health
status of the equipment's insulation. Data clustering
techniques such as the self-organizing map (SOM) have been
applied to the vibration monitoring and analysis of transformer
on-load tap changers (OLTCs). Using vibration monitoring, it can be
observed that each tap change operation generates a signal that
contains information about the condition of the tap changer
contacts and the drive mechanisms. Another area of application
is educational research, where data
mining has been used to study the factors leading students to
choose to engage in behaviors which reduce their learning and
to understand the factors influencing university student
retention. A similar example of the social application of data
mining is its use in expertise finding systems, whereby
descriptors of human expertise are extracted, normalized and
classified so as to facilitate the finding of experts, particularly
in scientific and technical fields. In this way, data mining can
facilitate institutional memory.
C. Visual
There is a large number of visualization techniques which
can be used for visualizing the data. In addition to standard
2D/3D-techniques such as x-y (x-y-z) plots, bar charts, line
graphs, etc., there are a number of more sophisticated
visualization techniques. These techniques are useful for data
exploration but are limited to relatively small and low-
dimensional data sets. In the last decade, a large number of
novel information visualization techniques have been
developed, allowing visualizations of multidimensional data
sets without inherent two- or three-dimensional semantics.
D. Music
Data mining techniques, and in particular co-occurrence
analysis, have been used to discover relevant similarities
among music corpora (radio lists, CD databases) for the purpose
of classifying music into genres in an objective manner.
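Co-occurrence analysis can be sketched as counting how often two items appear together in the same list. The playlists and artist names below are hypothetical, and the raw pair count is used as a very simple similarity measure; it is an illustrative assumption, not the method of any specific study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical radio playlists, each a list of artists played together.
PLAYLISTS = [
    ["Artist A", "Artist B", "Artist C"],
    ["Artist A", "Artist B"],
    ["Artist C", "Artist D"],
    ["Artist A", "Artist B", "Artist D"],
]


def co_occurrence_counts(playlists):
    """Count, for every pair of artists, the playlists containing both."""
    counts = Counter()
    for playlist in playlists:
        # sorted(set(...)) gives each unordered pair one canonical form.
        for pair in combinations(sorted(set(playlist)), 2):
            counts[pair] += 1
    return counts


counts = co_occurrence_counts(PLAYLISTS)
most_similar_pair, times_together = counts.most_common(1)[0]
print(most_similar_pair, times_together)
```

Artists that co-occur frequently could then be grouped into the same genre cluster.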
VI. CONCLUSION
In this paper we have given an overview of data mining
and its applications. We have not described all aspects of
data mining, only the process, its uses, and some of its
applications in different fields. Data mining is a tool used by
governments and corporations to predict and establish trends
with specific purposes in mind. Corporations use data mining
to examine buying patterns and predict future trends.

More Related Content

What's hot

Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Harish Chand
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining Sushil Kulkarni
 
Association rule mining
Association rule miningAssociation rule mining
Association rule miningAcad
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and predictionDataminingTools Inc
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Data Mining Techniques
Data Mining TechniquesData Mining Techniques
Data Mining TechniquesSanzid Kawsar
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouseJ M
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data miningPradnya Saval
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 

What's hot (20)

Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 
Data mining primitives
Data mining primitivesData mining primitives
Data mining primitives
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Text mining
Text miningText mining
Text mining
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Data Mining Techniques
Data Mining TechniquesData Mining Techniques
Data Mining Techniques
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data Mining
Data MiningData Mining
Data Mining
 
Fraud and Risk in Big Data
Fraud and Risk in Big DataFraud and Risk in Big Data
Fraud and Risk in Big Data
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouse
 
Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
Data mining
Data miningData mining
Data mining
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 

Similar to Fundamentals of data mining and its applications

A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data MiningIOSR Journals
 
Data Mining – A Perspective Approach
Data Mining – A Perspective ApproachData Mining – A Perspective Approach
Data Mining – A Perspective ApproachIRJET Journal
 
Real World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining ToolsReal World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining Toolsijsrd.com
 
The Survey of Data Mining Applications And Feature Scope
The Survey of Data Mining Applications  And Feature Scope The Survey of Data Mining Applications  And Feature Scope
The Survey of Data Mining Applications And Feature Scope IJCSEIT Journal
 
IRJET- A Study on Data Mining in Software
IRJET- A Study on Data Mining in SoftwareIRJET- A Study on Data Mining in Software
IRJET- A Study on Data Mining in SoftwareIRJET Journal
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)Shahbaz Anjam
 
A LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMININGA LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMININGCarrie Romero
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfDr. Radhey Shyam
 
IRJET- Comparative Analysis of Various Tools for Data Mining and Big Data...
IRJET-  	  Comparative Analysis of Various Tools for Data Mining and Big Data...IRJET-  	  Comparative Analysis of Various Tools for Data Mining and Big Data...
IRJET- Comparative Analysis of Various Tools for Data Mining and Big Data...IRJET Journal
 
Overview of Data Mining
Overview of Data MiningOverview of Data Mining
Overview of Data Miningijtsrd
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleDr. Radhey Shyam
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfDr. Radhey Shyam
 
A SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSA SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSijistjournal
 
TTG Int.LTD Data Mining Technique
TTG Int.LTD Data Mining TechniqueTTG Int.LTD Data Mining Technique
TTG Int.LTD Data Mining TechniqueMehmet Beyaz
 
Mining Big Data using Genetic Algorithm
Mining Big Data using Genetic AlgorithmMining Big Data using Genetic Algorithm
Mining Big Data using Genetic AlgorithmIRJET Journal
 
1. Web Mining – Web mining is an application of data mining for di.docx
1. Web Mining – Web mining is an application of data mining for di.docx1. Web Mining – Web mining is an application of data mining for di.docx
1. Web Mining – Web mining is an application of data mining for di.docxbraycarissa250
 

Similar to Fundamentals of data mining and its applications (20)

Data Mining Applications And Feature Scope Survey
Data Mining Applications And Feature Scope SurveyData Mining Applications And Feature Scope Survey
Data Mining Applications And Feature Scope Survey
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
 
Data Mining – A Perspective Approach
Data Mining – A Perspective ApproachData Mining – A Perspective Approach
Data Mining – A Perspective Approach
 
Real World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining ToolsReal World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining Tools
 
The Survey of Data Mining Applications And Feature Scope
The Survey of Data Mining Applications  And Feature Scope The Survey of Data Mining Applications  And Feature Scope
The Survey of Data Mining Applications And Feature Scope
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
IRJET- A Study on Data Mining in Software
IRJET- A Study on Data Mining in SoftwareIRJET- A Study on Data Mining in Software
IRJET- A Study on Data Mining in Software
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
A LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMININGA LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMINING
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdf
 
IRJET- Comparative Analysis of Various Tools for Data Mining and Big Data...
IRJET-  	  Comparative Analysis of Various Tools for Data Mining and Big Data...IRJET-  	  Comparative Analysis of Various Tools for Data Mining and Big Data...
IRJET- Comparative Analysis of Various Tools for Data Mining and Big Data...
 
Overview of Data Mining
Overview of Data MiningOverview of Data Mining
Overview of Data Mining
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycle
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
 
Big data upload
Big data uploadBig data upload
Big data upload
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
 
A SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSA SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICS
 
TTG Int.LTD Data Mining Technique
TTG Int.LTD Data Mining TechniqueTTG Int.LTD Data Mining Technique
TTG Int.LTD Data Mining Technique
 
Mining Big Data using Genetic Algorithm
Mining Big Data using Genetic AlgorithmMining Big Data using Genetic Algorithm
Mining Big Data using Genetic Algorithm
 
1. Web Mining – Web mining is an application of data mining for di.docx
1. Web Mining – Web mining is an application of data mining for di.docx1. Web Mining – Web mining is an application of data mining for di.docx
1. Web Mining – Web mining is an application of data mining for di.docx
 

Fundamentals of data mining and its applications

  • 1. International Journal of Conceptions on Computing & Information Technology Vol. 1, Issue. 1, November 2013; ISSN: 2345 - 9808 5 | 7 1 Fundamentals of data mining and its applications Sourav Sarangi and Subrat Swain Dept. of Biotechnology, MITS Engineering College, Rayagada, Odisha sourav@sierraairtraffic.com and inbredstar@gmail.com Abstract— This paper focuses on Data mining and its uses in our life or environment. Simply we can say Data mining is the essential process where intelligent methods are applied to extract data. It is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The overall goal of the data mining process is to extract knowledge from a data set in a human-understandable structure and besides the raw analysis step involves database. The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records, unusual records and dependencies. The data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Data mining have many application or uses. Now these days it is used in the field of Business, Science, Visual, Music, Telecommunication and many more. So here we are going to discuss about Data Mining and its gift or application for human being. Keywords- Process, Software, Privacy Concerns & Ethics, Applications I. INTRODUCTION Data mining techniques are the result of a long process of research and product development. Data mining takes evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining, the extraction of hidden predictive information from large databases. 
Data mining derives its name from the similarities between searching for valuable business information in a large database. It is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems. Data mining is a promising and relatively new technology that is defined as a process of discovering hidden valuable and useful knowledge or information by analyzing large amounts of data storing in databases or data warehouse using different techniques such as machine learning, artificial intelligence(AI) and statistical. Data mining is an iterative process that typically involves the following phases: 1) Problem Definition 2) Data Exploration 3) Data Preparation 4) Modeling 5) Evaluation 6) Deployment A. Problem Definition A data mining project starts with the understanding of the business problem. Data mining experts, business experts, and domain experts work closely together to define the project objectives and the requirements from a business perspective. The project objective is then translated into a data mining problem definition. II. PROCESS Fig.2 Data mining Process A. Data Exploration Domain experts understand the meaning of the metadata. They collect, describe, and explore the data. Data exploration is a common process in data warehouses which are characterized by large bulks of data coming from disparate systems. Data exploration helps a data consumer focus an information search on the pertinent aspect of relevant data before true analysis can be achieved.
  • 2. International Journal of Conceptions on Computing & Information Technology Vol. 1, Issue. 1, November 2013; ISSN: 2345 - 9808 6 | 7 1 B. Data Prepararion The data preparation normally consumes about 90% of the time. The outcome of the data preparation phase is the final data set. Once data sources available are identified, they need to be selected, cleaned, constructed and formatted into the desired form. Data preparation means manipulation of data into a form suitable for further analysis and processing. It is a process that involves many different tasks and which cannot be fully automated. Many of the data preparation activities are routine, tedious, and time consuming. It has been estimated that data preparation accounts for 60%-80% of the time spent on a data mining project. C. Modeling Modeling techniques have to be selected to be used for the prepared dataset. In the modeling phase, a frequent exchange with the domain experts from the data preparation phase is required. The modeling phase and the evaluation phase are coupled. They can be repeated several times to change parameters until optimal values are achieved. D. Evaluation The Modeling and Evaluation process are inter related. Data mining experts evaluate the model. If the model does not satisfy their expectations, they go back to the modeling phase and rebuild the model by changing its parameters until optimal values are achieved. In this phase, new business requirements may be raised due to new patterns has been discovered in the model results or from other factors. E. Deployment The knowledge or information that gain through data mining process needs to be presented in such a way that stakeholders can use it when they want it. The deployment term says that it is the application of a model for prediction or classification to new data. 
After a satisfactory model or set of models has been identified for a particular application, we usually want to deploy those models so that predictions or predicted classifications can quickly be obtained for new data. For example, a credit card company may want to deploy a trained model or set of models to quickly identify transactions which have a high probability of being fraudulent. III. SOFTWARE Data mining software is a fairly new phrase that refers for the procedure by which predictive styles are taken out from info. Data mining software describes a set of tools used for the purpose of analyzing vast amounts of data in order to discover and understand specific patterns. Data mining software originated in the scientific community where it was used to discern patterns from data related to scientific studies. Data mining software quickly found a strong foothold in the business community as large businesses began to amass vast amounts of data. Now here some Data mining softwares which are recently invented or before: 1) Carrot2 – Text and search results clustering framework. 2) Chemicalize.org – A chemical structure miner and web search engine. 3) GATE – Natural language processing and language engineering tool. 4) KNIME – The Konstanz Information Miner, a user friendly and comprehensive data analytics framework. 5) Orange – A component-based data mining and machine learning software suite written in the Python language. 6) UIMA – The UIMAMENT(UNSTRUCTURED INFORMATION MANAGE) is a component framework for analyzing unstructured content such as text, audio and video, originally developed by IBM. 7) JHep Work– Java cross-platform data analysis framework developed at ANL. Weka– A suite of machine learning software written in the Java language. IV. PRIVACY CONCERNS & ETHICS Privacy is a loaded issue. In data mining, the privacy and legal issues that may ensue are key to the conflict. 
In recent years privacy concerns have taken on a more significant role in American society as merchants, insurance companies and government agencies amass warehouses containing personal data. Some people believe that data mining itself is ethically neutral, and the term as such carries no ethical implications. In many cases, however, the results of data mining applications such as association rule or classification rule mining can compromise the privacy of the data. The problem of privacy-preserving data mining has become more important in recent years because of the increasing ability to store personal data about users and the increasing sophistication of data mining algorithms that leverage this information. A number of techniques, such as randomization and k-anonymity, have been suggested in recent years in order to perform privacy-preserving data mining.
A. Randomization
The randomization technique uses data distortion methods to create private representations of the records. In most cases the individual records cannot be recovered; only aggregate distributions can be recovered. These aggregate distributions can be used for data mining purposes. Two kinds of perturbation are possible with the randomization method: additive perturbation and multiplicative perturbation.
B. The k-anonymity method
An important method for privacy de-identification is the method of k-anonymity. The motivating factor behind the k-anonymity technique is that many attributes in the data can often be considered pseudo-identifiers which can be used in conjunction with public records in order to uniquely identify the records. For example, even if the identifiers are removed from the records, attributes such as the birth date and zip code can be used to uniquely identify the identities of the underlying
records. The idea in k-anonymity is to reduce the granularity of representation of the data in such a way that a given record cannot be distinguished from at least (k − 1) other records.
V. APPLICATIONS
Data mining is now used in industry, visualization, music, science and engineering, and many other fields. Here we discuss its most effective uses in the fields just named.
Fig. 3. Purpose of data mining
A. Industry
Data mining has been used extensively in banking and the financial markets. In the banking industry, data mining is heavily used to model and predict credit fraud, to evaluate risk, to perform trend analysis and to analyze profitability, as well as to help with direct marketing campaigns. In the financial markets, neural networks have been used in stock-price forecasting, in option trading, in bond rating, in portfolio management, in commodity price prediction, in mergers and acquisitions, and in forecasting financial disasters.
Slim margins pushed retailers into embracing data mining earlier than other industries. Retailers have seen improved decision-support processes lead directly to improved efficiency in inventory management and financial forecasting. Large retail chains and grocery stores store vast amounts of point-of-sale data that is rich in information.
One application of data mining in real estate is the AREAS Property Valuation product from HNC Software, which performs property valuation.
Data mining has also been used extensively in the medical industry. For example, NeuroMedical Systems used neural networks to perform a pap smear diagnostic aid.
Vysis uses neural networks to perform protein analysis for drug development.
Fig. 4. Customer details in a bank
B. Science and Engineering
In recent years, data mining has been used widely in areas of science and engineering such as bioinformatics, genetics, medicine, education and electrical power engineering.
In the study of human genetics, sequence mining helps address the important goal of understanding the mapping between inter-individual variation in human DNA sequences and variability in disease susceptibility.
In electrical power engineering, data mining methods have been widely used for condition monitoring of high-voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's insulation. Data clustering methods such as the self-organizing map (SOM) have been applied to the vibration monitoring and analysis of transformer on-load tap-changers (OLTCs). With vibration monitoring, it can be observed that each tap change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms.
Another application area is educational research, where data mining has been used to study the factors that lead students to engage in behaviors which reduce their learning, and to understand the factors influencing university student retention. A similar example of the social application of data mining is its use in expertise-finding systems, in which descriptors of human expertise are extracted, normalized and classified so as to facilitate the finding of experts, particularly in scientific and technical fields. In this way, data mining can support institutional memory.
C. Visual
A large number of visualization techniques can be used for visualizing data.
In addition to standard 2D/3D techniques such as x-y (x-y-z) plots, bar charts and line graphs, there are a number of more sophisticated visualization techniques. The standard techniques are useful for data exploration but are limited to relatively small and low-dimensional data sets. In the last decade, a large number of novel information visualization techniques have been developed that allow the visualization of multidimensional data sets without inherent two- or three-dimensional semantics.
D. Music
Data mining techniques, and in particular co-occurrence analysis, have been used to discover relevant similarities among music corpora (radio lists, CD databases) for the purpose of classifying music into genres in an objective manner.
VI. CONCLUSION
In this paper we have given an overview of data mining and its applications. We have not described all aspects of data mining, only its process, its uses and some of its applications in different fields. Data mining is a tool used by governments and corporations to predict and establish trends with specific purposes in mind. Corporations use data mining to examine buying patterns and predict future trends.