Priyabrata Satapathy
      M.Tech 1st Year
    SIS NO.-MCS12121
• What is data mining?

• Why is data mining needed?

• Data, information, knowledge

• Data mining & KDD

• Data warehouses

• Data cleaning

• Applications of data mining
Data mining (knowledge discovery in databases):

•   Extraction of interesting information or patterns from data
in large databases.


•   Knowledge discovery in databases (KDD) is the process of
identifying valid, useful, and ultimately understandable patterns
in data from large databases.
• Data mining is needed to provide tools for
  discovering knowledge from data.

• Data mining turns a large collection of data
  into knowledge.
•Data
  Data are any facts, numbers, or text that can be
  processed by a computer.
     • Operational or transactional data: such as sales, cost,
     inventory, payroll, and accounting

     • Metadata: data about the data itself, such as logical
     database design or data dictionary definitions

     • Nonoperational data: such as industry sales, forecast
     data, and macroeconomic data
The patterns, associations, or relationships among
all this data can provide information.

•   For example, analysis of retail point-of-sale
    transaction data can yield information on which
    products are selling and when.
•Information can be converted into knowledge
about historical patterns and future trends.

•   For example, summary information on retail
    supermarket sales can be analyzed in light of
    promotional efforts to provide knowledge of
    consumer buying behavior.
Data Mining & KDD
Data cleaning
Removes noise and inconsistent data.
Data integration
Combines multiple data sources.
Data selection
Retrieves the data relevant to the analysis task from
the database.
Data transformation
Transforms or consolidates data into forms appropriate
for mining, for example by performing summary or
aggregation operations.
Data mining
The essential step in which intelligent methods are applied
to extract data patterns.
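
The steps listed for the KDD process can be sketched as a small chain of functions. This is only an illustration under stated assumptions: the two sources, the record layout, and every function name are hypothetical, and the final `mine` step is a trivial frequency count standing in for a real mining algorithm.

```python
from collections import Counter

# Two hypothetical sources of (customer, product, amount) sales records.
store_a = [("ann", "milk", 2.0), ("bob", "bread", None), ("ann", "bread", 1.5)]
store_b = [("cara", "milk", 2.0), ("ann", "milk", 2.0)]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r[2] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, product):
    """Data selection: keep only the records relevant to the task."""
    return [r for r in records if r[1] == product]

def transform(records):
    """Data transformation: summarize into purchase counts per customer."""
    return Counter(customer for customer, _, _ in records)

def mine(counts, min_count=2):
    """Data mining (stand-in): customers with at least min_count purchases."""
    return sorted(c for c, n in counts.items() if n >= min_count)

# Same order as the slide: cleaning, integration, selection,
# transformation, mining.
cleaned = [clean(source) for source in (store_a, store_b)]
combined = integrate(*cleaned)
milk_sales = select(combined, "milk")
frequent_buyers = mine(transform(milk_sales))
print(frequent_buyers)
```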
A data warehouse is a repository of information
collected from multiple sources, stored under a
unified schema, and residing at a single site.

A data warehouse is constructed through a process of
data cleaning, data integration, data transformation,
data loading, and data refreshing.
Data that is to be analyzed by data mining
techniques can be incomplete, noisy, and
inconsistent.



Data cleaning routines attempt to fill in missing
values, smooth out noise while identifying
outliers, and correct inconsistencies in the data.
Missing values in data can be handled by:
• Ignoring the tuple.

• Filling in the missing value manually.

• Using a global constant to fill in the missing values.

• Using a measure of central tendency, such as the mean
or median, to fill in the missing value.
• Using the most probable value to fill in the missing value.
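
Most of the options listed above can be sketched with the Python standard library. The `ages` column and the helper names are hypothetical. Manual filling is omitted because it has no programmatic form, and the mode is used only as a crude stand-in for "most probable value", which in practice might come from a predictive model.

```python
from statistics import mean, median, mode

# Hypothetical attribute column; None marks a missing value.
ages = [25, None, 30, 30, None, 41]

def known(values):
    """Return only the values that are present."""
    return [v for v in values if v is not None]

def drop_missing(values):
    """Ignore the tuple: discard entries with a missing value."""
    return known(values)

def fill_constant(values, constant):
    """Fill every missing value with one global constant."""
    return [constant if v is None else v for v in values]

def fill_with(values, statistic):
    """Fill missing values with a statistic of the known values."""
    fill = statistic(known(values))
    return [fill if v is None else v for v in values]

print(drop_missing(ages))
print(fill_constant(ages, -1))
print(fill_with(ages, mean))
print(fill_with(ages, median))
print(fill_with(ages, mode))  # most frequent value as a rough guess
```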
Noisy data is data that contains random errors.
To handle noisy data:
• Binning:

Binning methods smooth a sorted data value by
consulting the neighboring values around it.
• Regression:

 Data can be smoothed by regression, in which the
data values are fitted to a function.
• Outlier analysis:

 Outliers may be detected by clustering. Similar
values are grouped into clusters, and values that fall
outside the clusters are treated as outliers.
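
The three techniques above can be illustrated with a short standard-library sketch. The sample values, function names, and the `max_gap` and `min_cluster_size` parameters are assumptions made for the example. The clustering step is a deliberately simple one-dimensional gap-based grouping, not a specific published algorithm.

```python
from statistics import mean

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-size bins, and replace
    each value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

def smooth_by_regression(xs, ys):
    """Fit y = a + b*x by least squares and return the fitted y values."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x for x in xs]

def outliers_by_clustering(values, max_gap, min_cluster_size=2):
    """Group sorted values into clusters of neighbors that lie within
    max_gap of each other; values in clusters smaller than
    min_cluster_size are reported as outliers."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] <= max_gap:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return [v for c in clusters if len(c) < min_cluster_size for v in c]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
binned = smooth_by_bin_means(prices, 3)
fitted = smooth_by_regression([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
outliers = outliers_by_clustering([10, 11, 12, 50, 51, 52, 200], max_gap=5)
print(binned)
print(fitted)
print(outliers)
```

Smoothing by bin means is only one binning variant; smoothing by bin medians or bin boundaries would follow the same structure.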
Data mining for Financial data analysis.

Data mining for Retail and Telecommunication
Industries.

Data mining for Science and Engineering.

Data mining and Recommender systems.
Thank You
