Data Mining
Presented by: Pralhad Rijal
What is data mining?
• Data mining is the discovery of knowledge from data: the extraction of
knowledge from large amounts of data, such as data warehouses.
• It is often used as a synonym for KDD (knowledge discovery in databases),
although strictly it is one step of the KDD process.
• The overall goal of the data mining process is to extract information
from a data set and transform it into an understandable structure for
further use.
Why data mining?
• Market analysis and management
Target marketing, customer relationship management (CRM), market basket analysis, market segmentation
• Risk analysis and management
Forecasting, customer retention, quality control, competitive analysis
• Detection of unusual patterns
• Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover
customer preferences and behavior, analyze the effectiveness of Web marketing, improve Web site
organization, etc.
KDD (Knowledge Discovery in Databases)
Steps in KDD
• Data cleaning: noisy and irrelevant data are removed from the collection.
• Data integration: multiple data sources, often heterogeneous, are combined into a
common source.
• Data selection: the data relevant to the analysis is identified and retrieved from the data collection.
• Data transformation: also known as data consolidation; the selected data is
transformed into forms appropriate for the mining procedure, e.g. compression.
• Data mining: the crucial step in which intelligent techniques are applied to extract
potentially useful patterns. This step finds the patterns the user is looking for.
• Pattern evaluation: the truly interesting patterns representing knowledge are identified
based on given measures.
• Knowledge representation: the final phase, in which the discovered knowledge is presented
visually to the user. This step uses visualization techniques to help users understand and
interpret the data mining results.
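The steps above can be sketched as a chain of small functions. This is only a toy illustration: the two data sources, the field names, the age grouping, and the minimum-count threshold are all assumptions made for the example, and real KDD pipelines are considerably more involved.

```python
from collections import Counter

# Hypothetical toy sources; the field names are illustrative only.
store_a = [
    {"customer": "c1", "age": 23, "item": "pen"},
    {"customer": "c2", "age": None, "item": "pen"},  # missing age (noisy record)
    {"customer": "c3", "age": 41, "item": "notebook"},
]
store_b = [
    {"customer": "c4", "age": 37, "item": "pen"},
    {"customer": "c5", "age": 19, "item": "notebook"},
    {"customer": "c6", "age": 52, "item": "pen"},
]


def clean(records):
    """Data cleaning: drop records that have a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: replace the exact age with a coarse age group."""
    return [
        {"age_group": "under_30" if r["age"] < 30 else "30_plus", "item": r["item"]}
        for r in records
    ]


def mine(records):
    """Data mining: count how often each (age group, item) pair occurs."""
    return Counter((r["age_group"], r["item"]) for r in records)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that meet a minimum count."""
    return {p: n for p, n in patterns.items() if n >= min_count}


def run_kdd(min_count=2):
    data = integrate(clean(store_a), clean(store_b))
    data = transform(select(data, ["age", "item"]))
    return evaluate(mine(data), min_count)


if __name__ == "__main__":
    # Knowledge representation: present the surviving patterns to the user.
    for (age_group, item), count in run_kdd().items():
        print(f"{age_group} customers bought {item}: {count} times")
```

Each function corresponds to one step of the list, so a step can be swapped out (for example, a different cleaning rule) without touching the others.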
Association Rules
• Association rules are if/then statements that help uncover relationships
between apparently unrelated data in a relational database or other
information repository. An example of an association rule would be “If
a customer buys a dozen notebooks, he or she is 80% likely to also purchase
pens.”
• In data mining, association rules are useful for analyzing and predicting
customer behavior. They play an important part in market
basket analysis.
• Market basket analysis is a modelling technique based upon the
theory that if you buy a certain group of items, you are more (or less)
likely to buy another group of items.
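A rule's "80% likely" figure is usually called its confidence: among the baskets that contain the "if" items, the fraction that also contain the "then" items. The sketch below computes support, confidence, and lift for one rule. The baskets are made-up data chosen so that the confidence comes out at 80%, and the function names are illustrative rather than taken from any library.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def support(transactions, items):
    """Fraction of all transactions that contain `items`."""
    return count(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions with the antecedent, the fraction that also have the consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to how common the consequent is overall."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical market baskets (one set of items per purchase).
baskets = [
    {"notebook", "pen"},
    {"notebook", "pen", "eraser"},
    {"notebook", "pen"},
    {"notebook", "pen", "ruler"},
    {"notebook"},
    {"pen"},
    {"eraser"},
    {"ruler"},
]

if __name__ == "__main__":
    rule = ({"notebook"}, {"pen"})
    print(f"support:    {support(baskets, rule[0] | rule[1]):.2f}")
    print(f"confidence: {confidence(baskets, *rule):.0%}")
    print(f"lift:       {lift(baskets, *rule):.2f}")
```

A lift above 1 corresponds to "more likely to buy" and a lift below 1 to "less likely to buy" in the market basket description above.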
Classification
• Classification is a data mining function that assigns items in a
collection to target categories or classes.
• The goal of classification is to accurately predict the target class for
each item in the data.
• Example: A bank loan officer wants to analyze applicant data in order to
know which customers (loan applicants) are risky and which are safe.
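As a rough illustration of the loan example, the sketch below labels a new applicant by majority vote among the most similar past applicants (a k-nearest-neighbours classifier). The slides do not name a particular algorithm, so this choice is an assumption, as are the two features (income and debt) and every number in the training data.

```python
import math
from collections import Counter

# Hypothetical past applicants: ((annual income, existing debt) in thousands, label).
training = [
    ((60, 5), "safe"),
    ((75, 10), "safe"),
    ((52, 8), "safe"),
    ((30, 25), "risky"),
    ((22, 18), "risky"),
    ((35, 30), "risky"),
]


def classify(applicant, examples, k=3):
    """Predict a class by majority vote among the k nearest labelled examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], applicant))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for applicant in [(58, 7), (28, 22)]:
        print(applicant, "->", classify(applicant, training))
```

The target classes ("safe" and "risky") are known in advance and come from labelled historical data, which is what distinguishes classification from clustering.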
Clustering
• Clustering is the process of partitioning a set of data (or objects) into a
set of meaningful sub-classes, called clusters.
• Clustering analysis is broadly used in many applications such as market
research, pattern recognition, data analysis, and image processing.
• Clustering can also help marketers discover distinct groups in their
customer base and characterize those groups based
on their purchasing patterns.
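One common way to partition data into clusters is k-means, sketched below on made-up customer data. The slides do not prescribe an algorithm, so k-means is an assumption here, as are the two features (visits per month and average spend) and the simplistic choice of the first k points as starting centroids.

```python
import math


def kmeans(points, k, iterations=10):
    """Partition points into k clusters by repeatedly reassigning and re-averaging."""
    centroids = list(points[:k])  # simplistic deterministic start, for illustration
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend per visit).
customers = [(2, 15), (12, 90), (3, 20), (11, 85), (2, 18), (13, 95)]

if __name__ == "__main__":
    centroids, clusters = kmeans(customers, k=2)
    for centroid, cluster in zip(centroids, clusters):
        print("centroid", centroid, "members", cluster)
```

No labels are supplied: the two customer groups emerge from the purchasing data alone, and a marketer would then inspect each cluster to characterize it.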