KDD Process
By
Nithin N
KDD PROCESS
 KDD (Knowledge Discovery in Databases) is a process that
involves the extraction of useful, previously unknown, and
potentially valuable information from large datasets. The
KDD process is an iterative process and it requires multiple
iterations of the above steps to extract accurate knowledge
from the data.
November 12, 2024
Knowledge Discovery (KDD) Process
 Data mining—core of knowledge
discovery process
Data Cleaning
Data Integration
Databases
Data Warehouse
Task-relevant Data
Selection
Data Mining
Pattern Evaluation
DATA CLEANING
 Remove Noise and Inconsistent Data
DATA INTEGRATION
 Where multiple data sources may be combined
DATA SELECTION
 Where data relevant to the analysis task are retrieved from the data
base
DATA TRANSFORMATION
 Where data are transformed and consolidated into forms
appropriate for mining by performing summary or aggregation
operation
Data Mining
 An essential Process where intelligent methods are applied to
extract data patterns
PATTERN EVALUATION
 To identify the truly interesting patterns representing knowledge
based on interestingness measures
KNOWLEDGE REPRESENTATION
 Where visualization and knowledge representation techniques are
used to present mined knowledge to users
 Note: KDD is an iterative process where evaluation
measures can be enhanced, mining can be refined, new
data can be integrated and transformed in order to get
different and more appropriate results.Preprocessing of
databases consists of Data cleaning and Data
Integration.
WHAT KINDS OF DATA CAN BE MINED
 DATABASE DATA
 Data warehouses
 Transactional data
 Other Kinds of Data
DATABASE DATA
 DBMS
 Relational Database Examples
 Attributes
 Tuples
Data warehouses
Data Cube
Transactional Data
 A Transactional data must be unique
Other Kinds of Data
 Time Related Data
 Sequence Data(Historical Records,Stock Exchange)
 Data streams( Video Surveillance, Sensor Data)
 Spatial Data(maps)
 Hyper Text and Multimedia Data(Text,Video,Audio)
 Graph and Networked Data
 Engineering Design Data(auto CAD)
 Web
Advantages of KDD Process
1. Improves decision-making: KDD provides valuable insights and
knowledge that can help organizations make better decisions.
2. Increased efficiency: KDD automates repetitive and time-consuming
tasks and makes the data ready for analysis, which saves time and
money.
3. Better customer service: KDD helps organizations gain a better
understanding of their customers’ needs and preferences, which can
help them provide better customer service.
4. Fraud detection: KDD can be used to detect fraudulent activities by
identifying patterns and anomalies in the data that may indicate fraud.
5. Predictive modeling: KDD can be used to build predictive models that
can forecast future trends and patterns.
Disadvantages of KDD Process
1. Privacy concerns: KDD can raise privacy concerns as it involves collecting
and analyzing large amounts of data, which can include sensitive
information about individuals.
2. Complexity: KDD can be a complex process that requires specialized skills
and knowledge to implement and interpret the results.
3. Unintended consequences: KDD can lead to unintended consequences,
such as bias or discrimination, if the data or models are not properly
understood or used.
4. Data Quality: KDD process heavily depends on the quality of data, if data is
not accurate or consistent, the results can be misleading
5. High cost: KDD can be an expensive process, requiring significant
investments in hardware, software, and personnel.
6. Overfitting: KDD process can lead to overfitting, which is a common
problem in machine learning where a model learns the detail and noise in
the training data to the extent that it negatively impacts the performance of
the model on new unseen data.
Difference Between KDD and Data Mining
THANK YOU

kddprocess-[1].pptx DAta Mining Seminar KDD process

  • 1.
  • 2.
    KDD PROCESS  KDD(Knowledge Discovery in Databases) is a process that involves the extraction of useful, previously unknown, and potentially valuable information from large datasets. The KDD process is an iterative process and it requires multiple iterations of the above steps to extract accurate knowledge from the data.
  • 3.
    November 12, 2024 KnowledgeDiscovery (KDD) Process  Data mining—core of knowledge discovery process Data Cleaning Data Integration Databases Data Warehouse Task-relevant Data Selection Data Mining Pattern Evaluation
  • 4.
    DATA CLEANING  RemoveNoise and Inconsistent Data
  • 5.
    DATA INTEGRATION  Wheremultiple data sources may be combined
  • 6.
    DATA SELECTION  Wheredata relevant to the analysis task are retrieved from the data base
  • 7.
    DATA TRANSFORMATION  Wheredata are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operation
  • 8.
    Data Mining  Anessential Process where intelligent methods are applied to extract data patterns
  • 9.
    PATTERN EVALUATION  Toidentify the truly interesting patterns representing knowledge based on interestingness measures
  • 10.
    KNOWLEDGE REPRESENTATION  Wherevisualization and knowledge representation techniques are used to present mined knowledge to users
  • 11.
     Note: KDDis an iterative process where evaluation measures can be enhanced, mining can be refined, new data can be integrated and transformed in order to get different and more appropriate results.Preprocessing of databases consists of Data cleaning and Data Integration.
  • 12.
    WHAT KINDS OFDATA CAN BE MINED  DATABASE DATA  Data warehouses  Transactional data  Other Kinds of Data
  • 13.
    DATABASE DATA  DBMS Relational Database Examples  Attributes  Tuples
  • 14.
  • 15.
  • 16.
    Transactional Data  ATransactional data must be unique
  • 17.
    Other Kinds ofData  Time Related Data  Sequence Data(Historical Records,Stock Exchange)  Data streams( Video Surveillance, Sensor Data)  Spatial Data(maps)  Hyper Text and Multimedia Data(Text,Video,Audio)  Graph and Networked Data  Engineering Design Data(auto CAD)  Web
  • 18.
    Advantages of KDDProcess 1. Improves decision-making: KDD provides valuable insights and knowledge that can help organizations make better decisions. 2. Increased efficiency: KDD automates repetitive and time-consuming tasks and makes the data ready for analysis, which saves time and money. 3. Better customer service: KDD helps organizations gain a better understanding of their customers’ needs and preferences, which can help them provide better customer service. 4. Fraud detection: KDD can be used to detect fraudulent activities by identifying patterns and anomalies in the data that may indicate fraud. 5. Predictive modeling: KDD can be used to build predictive models that can forecast future trends and patterns.
  • 19.
    Disadvantages of KDDProcess 1. Privacy concerns: KDD can raise privacy concerns as it involves collecting and analyzing large amounts of data, which can include sensitive information about individuals. 2. Complexity: KDD can be a complex process that requires specialized skills and knowledge to implement and interpret the results. 3. Unintended consequences: KDD can lead to unintended consequences, such as bias or discrimination, if the data or models are not properly understood or used. 4. Data Quality: KDD process heavily depends on the quality of data, if data is not accurate or consistent, the results can be misleading 5. High cost: KDD can be an expensive process, requiring significant investments in hardware, software, and personnel. 6. Overfitting: KDD process can lead to overfitting, which is a common problem in machine learning where a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new unseen data.
  • 20.
    Difference Between KDDand Data Mining
  • 21.