A REVIEW ON DATA MINING

Er. Nancy, pursuing M.Tech in CSE (2012-14), GNDEC,
                      Ludhiana, India.
ABSTRACT

   Data mining is an approach to discovering or extracting
    knowledge from databases.
   Enterprises generally build data warehouses and data
    marts to store their databases.
   Data warehouses and data marts contain large
    mountains of data.
   These mountains represent a valuable resource to the
    enterprise.
   Because it extracts knowledge from large data
    warehouses or repositories, data mining plays a major
    role in various fields of machine learning.
   Data mining is used in education, medicine, science,
    and business. Various algorithms and programs are
    used in the data mining approach.[1]
INTRODUCTION
   In today's world of information technology, almost
    everything has become automated.
   Information technology is used in every field of human
    life, such as business, engineering, medicine,
    mathematics, and science.
   All these fields have led to large volumes of data
    being stored in various formats such as
    records, files, documents, images, sounds, recordings,
    videos, and many newer kinds of data.
   A collection of related data is known as a database. To
    extract correct data from large databases, a proper
    mechanism is needed; that mechanism is known as
    DATA MINING. It is also referred to as knowledge
    discovery in databases (KDD).[2]
PROCESS
 (1) Selection
 (2) Pre-processing

 (3) Transformation

 (4) Data Mining

 (5) Interpretation/Evaluation.[2,3]




PROCEDURE OF DATA MINING
   1. Data cleaning: It is also known as data cleansing; in
    this phase noisy data and irrelevant data are removed
    from the collection.

   2. Data integration: In this stage, multiple data sources,
    often heterogeneous, are combined in a common
    source.

   3. Data selection: The data relevant to the analysis is
    decided on and retrieved from the data collection.

   4. Data transformation: It is also known as data
    consolidation; in this phase the selected data is
    transformed into forms appropriate for the mining
    procedure.
CONTD…….
   5. Data mining: It is the crucial step in which clever
    techniques are applied to extract potentially useful patterns.


   6. Pattern evaluation: In this step, interesting patterns
    representing knowledge are identified based on given
    measures.

   7. Knowledge representation: It is the final phase in
    which the discovered knowledge is visually presented to
    the user. This essential step uses visualization
    techniques to help users understand and interpret the
    data mining results.[4,2]
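
The seven steps above can be sketched as a small pipeline. This is only a minimal illustration, not a method taken from the cited sources: the two data sources, the field names, the helper functions, and the 0.5 support threshold are all hypothetical, and counting co-occurring item pairs stands in for whatever mining technique a real project would choose.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources (illustrative only).
store_a = [
    {"id": 1, "items": "Bread, Milk "},
    {"id": 2, "items": "bread,butter"},
    {"id": 3, "items": ""},  # noisy record: no items recorded
]
store_b = [
    {"id": 4, "items": "milk,bread,butter"},
    {"id": 5, "items": "Milk,Bread"},
]

def clean(records):
    """Step 1: drop records that carry no usable item data."""
    return [r for r in records if r["items"].strip()]

def integrate(*sources):
    """Step 2: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records):
    """Step 3: keep only the field relevant to the analysis."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Step 4: turn each string into a normalized set of item names."""
    return [
        {item.strip().lower() for item in s.split(",") if item.strip()}
        for s in item_strings
    ]

def mine(transactions):
    """Step 5: count how often each pair of items occurs together."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts

def evaluate(pair_counts, n_transactions, min_support=0.5):
    """Step 6: keep only pairs whose support meets the threshold."""
    return {
        pair: count / n_transactions
        for pair, count in pair_counts.items()
        if count / n_transactions >= min_support
    }

def present(patterns):
    """Step 7: format the surviving patterns for the user."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
transactions = transform(select(records))
patterns = evaluate(mine(transactions), len(transactions))
report = present(patterns)
print("\n".join(report))
```

Here the final step is plain text output; in practice the knowledge representation step would more likely use charts or other visualization.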
DATA MINING LIFE CYCLE

   The Cross Industry Standard Process for Data
    Mining (CRISP-DM) defines six phases:

 (1) Business Understanding
 (2) Data Understanding

 (3) Data Preparation

 (4) Modeling

 (5) Evaluation

 (6) Deployment.[6]


CONTD……
   1. Business Understanding: This phase focuses on
    understanding the project objectives and requirements
    from a business perspective, then converting this
    knowledge into a data mining problem definition and a
    preliminary plan designed to achieve the objectives.

   2. Data Understanding: It starts with initial data
    collection and proceeds with activities to get familiar
    with the data, identify data quality problems, discover
    first insights into the data, or detect interesting
    subsets to form hypotheses about hidden information.

   3. Data Preparation: It covers all activities to construct
    the final dataset from the initial raw data.
CONTD……
   4. Modeling: In this phase, various modeling techniques
    are selected and applied and their parameters are
    calibrated to optimal values.

   5. Evaluation: In this stage the model is thoroughly
    evaluated, and the steps executed to construct the
    model are reviewed, to be certain it properly achieves
    the business objectives.

   6. Deployment: The purpose of the model is to increase
    knowledge of the data, but the knowledge gained still needs
    to be organized and presented in a way that the
    customer can use it.[6,8]
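
The Modeling and Evaluation phases can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the one-parameter threshold model, and the candidate values are hypothetical, and a real project would use a proper modeling technique and a business-driven evaluation measure.

```python
# Hypothetical labeled data: (feature value, class label). Illustrative only.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
holdout = [(2.5, 0), (4.0, 0), (6.5, 1), (9.0, 1)]

def predict(x, threshold):
    """Threshold model: predict class 1 when x exceeds the threshold."""
    return 1 if x > threshold else 0

def accuracy(data, threshold):
    """Fraction of records the model labels correctly."""
    return sum(predict(x, threshold) == y for x, y in data) / len(data)

def calibrate(data, candidates):
    """Modeling: pick the candidate threshold with the best training accuracy."""
    return max(candidates, key=lambda t: accuracy(data, t))

# Modeling phase: calibrate the parameter on the training data.
candidates = [2.0, 4.5, 7.0]
best_threshold = calibrate(train, candidates)

# Evaluation phase: check the calibrated model on data it has not seen.
holdout_accuracy = accuracy(holdout, best_threshold)
print(best_threshold, holdout_accuracy)
```

Keeping the held-out records out of calibration is what lets the Evaluation phase say something about how the model would behave after Deployment.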
TYPES OF DATA MINING SYSTEM
   Classification of data mining systems according to the
    type of data source mined: This classification is based on
    the type of data handled, such as spatial data, multimedia
    data, time-series data, text data, World Wide Web, etc.

   Classification of data mining systems according to the
    data model: This classification is based on the data model
    involved, such as relational database, object-oriented
    database, data warehouse, transactional database, etc.

   Classification of data mining systems according to the
    kind of knowledge discovered: This classification is based on
    the kind of knowledge discovered or data mining
    functionalities, such as characterization, discrimination,
    association, classification, clustering, etc.
CONTD……
   Classification of data mining systems according to
    mining techniques used: This classification is
    based on the data analysis approach used, such as
    machine learning, neural networks, genetic algorithms,
    statistics, visualization, database-oriented or data
    warehouse-oriented, etc.[5,2]




DATA MINING METHODS

   On-Line Analytical Processing (OLAP)    Factor analysis
   Classification                          Neural Networks
   Association Rule Mining                 Regression analysis
   Temporal Data Mining                    Structured data analysis
   Time Series Analysis                    Sequence mining
   Spatial Mining                          Text mining
   Anomaly/outlier/change detection        Drug Discovery
   Association rule learning               Exploratory data analysis
   Cluster analysis                        Predictive analytics
   Decision trees                          Web Mining
                                           Data analysis, etc.[1,8]
DATA MINING APPLICATIONS

   Games
   Business
   Science and Engineering
   Human rights
   Spatial data mining.[5]




CHALLENGES

 Sensor data mining
 Pattern mining

 Visual data mining

 Subject-based data mining

 Music data mining

 Digital library retrieval.[1,2,5]




CONCLUSION

   In this paper I briefly reviewed data mining: its
    background, methods, life cycle model, types, and
    applications.
   This review should help the reader easily understand
    what data mining is and which data mining techniques
    were used previously.
   It also covers the data mining techniques in use
    nowadays and the process of data mining.
   The applications and types of data mining were also
    explained. [1,2,4,5]


FUTURE SCOPE

   Data mining is a very wide field. It covers many
    concepts, such as the various algorithms used to extract
    knowledge from large databases.
   Future work includes soft computing techniques such as
    fuzzy logic, neural networks, and genetic programming,
    and the mining of complex data objects, including
    high-dimensional data, high-speed data streams,
    sequences, noisy time series, graphs, multi-instance
    objects, multi-represented objects, and temporal data.
   These are used in business, the Web, medical diagnosis,
    scientific and research analysis fields (bio, remote
    sensing, etc.), social networking, etc.[1,7]
REFERENCES
   1. WIKIPEDIA.Com

   2. Mr. S. P. Deshpande and Dr. V. M. Thakar.
    International Journal of Distributed and Parallel systems
    (IJDPS), Vol.1, No.1, September 2010 DOI.

   3. A Review on Data mining from Past to the Future.
    Venkatadri M. (Research Scholar, Dept. of Computer
    Science, Dravidian University, India) and Lokanatha C.
    Reddy (Professor, Dept. of Computer Science).
    International Journal of Computer Applications (0975 –
    8887), Volume 15, No. 7, February 2011.
CONTD…..
   4. Data Mining for High Performance Data Cloud using
    Association Rule Mining. T. V. Mahendra (Professor &
    HOD, IT, Narayana Engg. College, Nellore, AP, India),
    N. Deepika (Sr. Assistant Professor, Dept. of ISE, New
    Horizon College of Engineering, Bangalore, India), and
    N. Keasava Rao (Associate Professor, IT, Narayana
    Engg. College, Gudur, AP, India).


   5. Data Mining and KDD: A Shifting Mosaic By Joseph
    M. Firestone, Ph.D. White Paper No. Two March 12,
    1997 the Idea.

CONTD….
   6. Data Mining and Statistics: What's the Connection?
    Jerome H. Friedman.

   7. www.Google.Com.

   8. www.IEEEXPLORE.Com.




THANKS