Mr.T.Somasundaram
Assistant Professor
Department Of Management
Kristu Jayanti College (Autonomous), Bengaluru
UNIT 4
INTRODUCTION TO DATA MINING
Data Mining – Definition, Challenges, tasks, Data pre-
processing, Data Cleaning, missing data, dimensionality
reduction, data transformation, measures of similarity and
dissimilarity, Introduction to Association rules, APRIORI
algorithm, partition algorithm, FP growth algorithm,
Introduction to Classification techniques, Decision tree,
Naïve-Bayes classifier, k-nearest neighbour, classification
algorithm.
Unit 4
introduction to data mining
Data mining - introduction
Meaning:
 Data mining is the art and science of using algorithms more
powerful than traditional query tools such as SQL to
extract more useful information.
 Data mining, also termed “Knowledge Discovery in
Databases” (KDD), is the extraction of hidden predictive
information from large databases.
 It is a powerful new technology with great potential to help
companies, (e.g.) to focus on the most important
information in their data warehouses.
Data mining - introduction
Meaning:
 Data mining is the core of the KDD process; it involves
applying algorithms that explore the data, develop models
and discover unknown patterns.
 The model is used for understanding phenomena from the
data, analysis and prediction.
 Data mining is concerned with discovering knowledge.
 It is about uncovering relationship or patterns hidden in
data that can be used to predict outcomes.
Definition:
Data mining is the search for relationships and global
patterns that exist in large databases but are ‘hidden’
among the vast amount of data, such as a relationship
between patient data and their medical diagnosis.
- Marcel Holshemier & Arno Siebes
Data mining is the nontrivial extraction of implicit,
previously unknown, and potentially useful information
from data.
- William J Frawley
Characteristics:
 Relevant data is stored in large databases such as a data warehouse.
 End users and tools get quick responses to their queries.
 Data mining has a client / server architecture.
 It uses parallel processing because of the large
amount of data.
 It is easily combined with spreadsheets & other end-user
software development tools for the analysis process.
 It yields 5 types of information: association, sequence,
classification, cluster and forecasting.
 It is used to find unexpected, valuable results.
NEED OF Data mining
Data mining is needed in organizations for the following reasons
-
1. Operational:
 It ensures that the everyday operations of a business run
smoothly.
 This is done by using information to make changes if
mistakes are found.
 The information gained helps the business maintain a
level of productivity and proficiency.
2. Decisional:
 Data mining helps managers or analysts make decisions
about the future of the company based on historical and
current data.
 It is used for planning long-term goals as well as short-term
changes.
3. Informational:
 Information is made available to those who need it and is easily accessible.
4. Specific Uses:
 It can be used as a model to anticipate future customer
behaviour, based on historical data of interactions with a
particular company.
[Figure: Architecture of a data mining system – data sources (database,
data warehouse, World-Wide Web and other information repositories), after
data cleaning, integration and selection, feed a database or data warehouse
server; above it sit the data mining engine, pattern evaluation module and
graphical user interface, supported by a knowledge base.]
ARCHITECTURE OF Data mining
The architecture of a data mining system may have the
following major components -
1. Database, Data Warehouse or Other Information
Repository:
 This is one or a set of databases, data warehouses or other
repositories, on which data cleaning & integration techniques may be
performed.
2. Database or Data Warehouse Server:
 It is responsible for fetching the relevant data, based on the
user’s data mining request.
3. Knowledge Base:
 This is the domain knowledge that is used to guide the search and
evaluate the resulting patterns.
 It includes concept hierarchies, used to organize attributes into
different levels of abstraction.
4. Data Mining Engine:
 This is essential to the data mining system and ideally consists
of a set of functional modules for tasks such as
characterization, association, classification, cluster analysis
and deviation analysis.
5. Pattern Evaluation Module:
 This component typically interacts with the data mining
modules to focus the search towards interesting patterns.
 It may use interestingness thresholds to filter out
discovered patterns.
6. Graphical User Interface:
 This module communicates between users and the data mining
system, allowing the user to interact with the system by specifying
a data mining query.
 It allows the user to browse database and data warehouse
schemas or structures and visualize the patterns.
Advantages:
 Automated Prediction of Trends and Behaviours.
 Automated Discovery of Previously Unknown patterns.
 Databases can be larger in both depth and breadth.
Disadvantages:
 Privacy issues.
 Security issues.
 Misuse of Information / Inaccurate Information.
advantages & disadvantages of data mining
1. Retail / Marketing:
 Identify buying patterns from customers.
 Finding associations among customer demographic details.
 Market basket analysis.
2. Banking:
 Detect patterns of fraudulent credit card use.
 Identify ‘loyal’ customers.
 Predict customers likely to change their credit card affiliation.
 Determine credit card spending by customer groups.
 Find hidden correlations between different financial indicators.
 Identify stock trading rules from historical market data.
APPLICATIONS of data mining
3. Insurance and Health Care:
 Claims analysis (i.e.) which medical procedures are claimed
together.
 Predict which customers will buy new policies.
 Identify behaviour patterns of risky customers.
 Identify fraudulent behaviour.
4. Transportation:
 Determine the distribution schedules among outlets.
 Analyse loading patterns.
5. Medicine:
 Characterize patient behaviour to predict office visits.
 Identify successful medical therapies for different illnesses.
APPLICATIONS of data mining
The tasks in data mining are classified into –
1. Summarisation:
 This is the abstraction or generalisation of data.
 A set of task-relevant data is summarised and abstracted and
gives an overview of the data, usually with aggregate information.
(E.g.) Summarisation can go to different abstraction levels and be
viewed from different angles with different kinds of pattern, (i.e.)
calls can be summarised into in-state calls, state-to-state calls, calls
to Asia, etc., which can be further summarised into domestic calls
and international calls.
data mining TASKS
2. Classification:
 This derives a function or model, which determines the class of
an object, based on its attributes.
 A set of objects is given as the training set.
 The model is constructed by analysing the relationship between attributes
and the classes of the objects in the training set.
 This helps us to develop a better understanding of the classes of
the objects in the database.
(E.g.) A classification model can diagnose a new patient’s disease
based on data like age, gender, weight, BP, etc., and concludes a
patient’s disease from his / her diagnostic data.
data mining TASKS
3. Association:
 Association rules can be useful for marketing, commodity
management, advertising, etc.
 (E.g.) A retail store may discover that people tend to buy soft drinks and
potato chips together.
4. Clustering:
 This identifies classes – also called clusters or groups – for a set of
objects whose classes are unknown.
 The objects are clustered so that the intraclass similarities are
maximised and the interclass similarities are minimised.
(E.g.) A bank may cluster its customers into several groups based on the
similarities of age, income and residence, and use them to offer the
customers suitable products and services.
data mining TASKS
5. Trend Analysis:
 Time series data are records accumulated over time.
(E.g.) A company’s sales, a customer’s credit card transactions and stock
prices are all time series data.
(E.g.) One can estimate this year’s profit for a company based on its
profit last year and an estimated annual increase rate.
 By comparing the historical changing curves or tracks of two or more
objects, similar and dissimilar trends can be discovered.
(E.g.) A company’s sales and profit figures can be analysed to find
disagreeing trends.
data mining TASKS
Data goes through a series of steps during pre-processing –
data PREPROCESSING
1. Data Cleaning
2. Data Integration
3. Data Transformation
4. Data Reduction
5. Discretisation and Concept Hierarchy Generation
1. Data Cleaning:
 It is used to detect inaccurate, incomplete or unreasonable data
and to improve quality through the correction of detected errors and
omissions.
 It includes format checks, completeness checks, reasonableness checks,
limit checks, checks for other errors, etc.
 This process results in flagging, documenting and subsequent checking
and correction of suspect records.
 Validation checks may also involve checking for compliance against
applicable standards, rules and conventions.
 Data cleaning also identifies outliers and corrects inconsistencies in the data.
data PREPROCESSING
Missing Values (Data):
 Missing data items present a problem that can be dealt with in several
ways.
 In most cases, a missing attribute value indicates lost information.
(E.g.) A missing value for salary may be taken as an unentered data item, but
it could also indicate an individual who is unemployed.
The following are the possible options for dealing with missing data before
data is presented to a data mining algorithm –
a) Discard records with missing values. (appropriate for small percent)
b) Replace missing values with class mean for real-valued data. (appropriate
for numerical attributes)
c) Replace missing values with values found within other highly similar
instances. (appropriate for both numerical or categorical)
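A minimal Python sketch of options (a) and (b), assuming the pandas library and an invented salary column, might look like this:

import pandas as pd

# Hypothetical data set with missing salary values (invented numbers)
df = pd.DataFrame({
    "age":    [25, 32, 47, 51, 38],
    "salary": [30000, None, 52000, None, 41000],
    "class":  ["low", "low", "high", "high", "low"],
})

# (a) Discard records with missing values
dropped = df.dropna()

# (b) Replace each missing salary with the mean salary of the same class
df["salary"] = df.groupby("class")["salary"].transform(lambda s: s.fillna(s.mean()))

print(dropped)
print(df)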
2. Data Integration:
 Data integration is important; companies should develop an enterprise-
wide strategy for integration.
 It provides a single, unified view of the business data scattered across
various systems in the organization.
 A straightforward technique for data integration is merging data from one
database into another, (E.g.) merging customer
data from a CRM database into a manufacturing system database.
 It is a pre-processing method that involves merging data from different
sources to form a coherent data store such as a data warehouse.
Some issues in data integration are -
i) Schema Integration and Object Matching:
 This issue is also known as the entity identification problem, as one
doesn’t know how equivalent real-world entities from multiple data
sources can be matched up.
(E.g.) Two different databases maintained in an organization may use
Emp. ID and Emp. No, where the same attribute appears under different names.
ii) Redundancy:
 If an attribute can be derived from any other attribute or set of
attributes, it is said to be redundant.
 Redundancy is also caused by inconsistencies in attribute or dimension naming.
(E.g.) The attribute Age can be derived from the attribute DOB, therefore both
attributes are considered redundant.
3. Data Transformation:
 Another important pre-processing task is data transformation. The
common techniques are –
i) Smoothing – it removes noise from data.
ii) Aggregation – It summarises data and constructs data cubes.
iii) Generalisation – It is also known as concept hierarchy
climbing.
iv) Attribute / Feature Construction – It composes new attributes
from the given ones.
v) Normalisation – It scales the data within a small, specified
range.
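For instance, min-max normalisation rescales a numeric attribute into a range such as [0, 1]; a small sketch with made-up income values:

import numpy as np

# Invented income values (in thousands)
income = np.array([60, 70, 75, 85, 90, 95, 100, 120, 125, 220], dtype=float)

# Min-max normalisation: v' = (v - min) / (max - min), scaled to [0, 1]
normalised = (income - income.min()) / (income.max() - income.min())
print(normalised.round(3))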
4. Data Reduction:
 Data reduction is a pre-processing technique that helps in
obtaining a reduced representation of the data set, (i.e.) a set having a
smaller volume of data than the available data set.
 When mining is performed on the reduced data set, it produces the same
analytical results as those obtained from the original data set,
saving the time needed for computation.
 The main advantage of this technique is that even after reduction, the
integrity of the original data is still maintained.
 It uses data cube aggregation, attribute subset selection,
dimensionality reduction, etc.
Strategies for data reduction include -
i) Data Cube Aggregation – aggregation is applied to the data in the
construction of a data cube.
ii) Attribute Subset Selection – irrelevant, weakly relevant or
redundant attributes or dimensions may be detected and removed.
iii) Dimensionality Reduction – here encoding mechanisms are used
to reduce the data set size.
iv) Numerosity Reduction – here data are replaced or estimated by
alternative, smaller data representations such as parametric models
or non-parametric models such as clustering, sampling and the use of
histograms.
Dimensionality Reduction:
 Dimensionality reduction is the process of reducing the number
of random variables or attributes under consideration.
Its methods include –
1. Wavelet Transforms:
 This method works by using its variant called Discrete Wavelet
Transform (DWT).
 A hierarchical pyramid algorithm is used for applying a DWT
which halves the data at each iteration and thus results in a fast
computational speed.
 This method can be applied to multidimensional data, such as
data cube.
2. Principal Components Analysis (PCA):
 It is a technique used to reduce multidimensional data sets to lower
dimensions for analysis.
 It is mostly used as a tool in exploratory data analysis and for making
predictive models.
 It involves the calculation of the eigenvalues or the singular value
decomposition of the data set, usually after mean-centering the data for
each attribute.
 It is used abundantly in all forms of analysis, because it is simple,
non-parametric method of extracting relevant information from
confusing data sets.
 It is used for dimensionality reduction in a data set by retaining those
characteristics of data set that contribute to variances.
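A minimal PCA sketch using NumPy (mean-centering followed by eigen-decomposition of the covariance matrix); the data values are invented:

import numpy as np

# Hypothetical data set: 6 records, 3 attributes
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4],
              [2.3, 2.7, 1.0]])

# Mean-centre each attribute
Xc = X - X.mean(axis=0)

# Eigen-decomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eig_vals, eig_vecs = np.linalg.eigh(cov)

# Keep the 2 components with the largest eigenvalues (most variance)
order = np.argsort(eig_vals)[::-1][:2]
reduced = Xc @ eig_vecs[:, order]
print(reduced.shape)   # (6, 2) -- dimensionality reduced from 3 to 2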
5. Discretisation and Concept Hierarchy Generation:
 Data discretisation techniques can be used to reduce the number of
values for a given continuous attribute by dividing the range of the
attribute into intervals.
 It is a part of data reduction but of particular importance,
especially for numerical data.
a) Discretisation Techniques:
 If the process starts by first finding one or a few points to split the
entire attribute range and then repeats this recursively on the
resulting intervals, it is called top-down discretisation or splitting.
 Performed recursively on an attribute, this provides a hierarchical or
multiresolution partitioning of the attribute values, known as a
concept hierarchy.
 It can be used to reduce the data by collecting and replacing low-
level concepts with higher-level concepts.
 This contributes to a consistent representation of data mining results
among multiple mining tasks, which is a common requirement.
 Discretisation techniques and concept hierarchies are typically
applied before data mining as a pre-processing step, rather than
during mining.
 Several discretisation methods can be used to automatically generate
or dynamically refine concept hierarchies for numerical attributes
within the database schema.
b) Concept Hierarchy Generation for Numerical data:
 A concept hierarchy can be constructed automatically based on data discretisation.
 The following methods are examined –
i) Binning – a top-down splitting technique based on a specified
number of bins (a short binning sketch follows this list).
 This method is also used as a discretisation method for numerosity
reduction and concept hierarchy generation.
ii) Histogram Analysis – a popular unsupervised discretisation
technique because it does not use class information.
 It divides the data into buckets and stores the average for each bucket.
iii) Entropy-Based Discretisation – a supervised, top-down splitting
technique.
 It explores class distribution information in its calculation and
determination of split points.
iv) Interval Merging by Chi-Square Analysis – it employs a bottom-up
approach by finding the best neighbouring intervals and then merging
them to form larger intervals.
 This method is supervised in that it uses class information.
v) Cluster Analysis – clustering generates a concept hierarchy by
following either a top-down splitting strategy or a bottom-up merging
strategy, where each cluster forms a node of the concept hierarchy.
 Clusters are formed by repeatedly grouping neighbouring clusters in
order to form higher-level concepts.
vi) Discretisation by Intuitive Partitioning – many users like to see
numerical ranges partitioned into relatively uniform, easy-to-read
intervals that appear intuitive or ‘natural’.
 A simple 3-4-5 rule can be used to segment numeric data into
relatively uniform, ‘natural’ intervals.
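A small sketch of the binning method mentioned above, assuming pandas is available and using invented ages; each resulting interval can serve as one level of a concept hierarchy:

import pandas as pd

# Invented ages
ages = pd.Series([13, 15, 16, 19, 20, 21, 22, 25, 30, 33, 35, 40, 45, 46, 52, 70])

# Top-down split of the value range into 3 equal-width intervals (bins)
equal_width = pd.cut(ages, bins=3)

# Equal-frequency (equal-depth) binning: each bin holds roughly the same count
equal_depth = pd.qcut(ages, q=4)

print(equal_width.value_counts().sort_index())
print(equal_depth.value_counts().sort_index())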
Introduction:
 Association Rules are an important class of methods for finding
regularities or patterns in data, extensively studied by the
database and data mining communities.
 Association rule mining is an important component of data mining and
is used in many application domains.
 It also plays an important role in discovering knowledge from
agricultural databases, survey data from agricultural research,
data about soil and cultivation, and data containing information
linking geographical conditions and crop production.
Association rule mining
Introduction:
 For example, it is used in the business field, where the discovery of
purchase patterns or associations between products is very useful for
decision-making and effective marketing. Some recent applications
are – finding patterns in biological databases, extraction of
knowledge from software engineering metrics, web personalisation,
text mining, etc.
Definition:
Association rule mining means, ‘finding frequent patterns,
associations, correlations or causal structures among sets of items or
objects in transactional databases, relational databases and other
information repositories.’
Association rule mining
Association rules:
 The basic objective of finding association rules is to find all
co-occurrence relationships, called associations.
 First introduced in 1993 by Agrawal et al., it has attracted
a great deal of attention.
 The classic application of association rule mining is market
basket data analysis, which aims to discover how items
purchased by customers in a supermarket are associated.
 Association rules are of the form X → Y, where X and Y are
collections of items and the intersection of X and Y is empty.
(E.g.) 95% of the customers who bought bread (X) also
bought milk (Y) or Jam (Y).
Association rules:
 A rule may contain more than one item in the antecedent and
consequent of the rule. (E.g.) the rule {onion, potatoes} →
{burger} found in the sales data of a supermarket would indicate
that if a customer buys onions and potatoes together, they are
likely to also buy burger meat. Such information can be
used as a basis for decisions about marketing activities like
promotional pricing or product placements.
 Every rule must satisfy two user-specified constraints:
i) one is a measure of statistical significance called support, and
ii) the other is a measure of goodness called confidence.
Association rules:
(E.g.) Consider the following database with 4 items and 5 transactions:
The set of items is I = {milk, bread, butter, beer} and a small database
containing the items (1 codes presence and 0 absence of an item in a
transaction) is shown in the table below. An example rule for the
supermarket could be {butter, bread} → {milk}, meaning that if butter and
bread are bought, customers also buy milk.
Transaction ID | Milk | Bread | Butter | Beer
             1 |    1 |     1 |      0 |    0
             2 |    0 |     0 |      1 |    0
             3 |    0 |     0 |      0 |    1
             4 |    1 |     1 |      1 |    0
             5 |    0 |     1 |      0 |    0
Mining Association Rules:
Definitions and Terminology:
• Transaction: a set of items (itemset).
• Confidence: the measure of certainty or trustworthiness associated
with each discovered pattern.
• Support: the measure of how often the collection of items in an
association occur together, as a percentage of all transactions.
• Frequent itemset: if an itemset satisfies minimum support, it is a
frequent itemset.
• Strong association rules: rules that satisfy both a minimum support
threshold and a minimum confidence threshold.
• In association rule mining, we first find all frequent itemsets and then
generate strong association rules from the frequent itemsets.
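As an illustration, the support and confidence of the rule {butter, bread} → {milk} can be computed directly from the five-transaction table above; a short Python sketch:

# The five transactions from the milk / bread / butter / beer table
transactions = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]

X, Y = {"butter", "bread"}, {"milk"}
n = len(transactions)

# support(X -> Y): fraction of transactions containing X together with Y
support = sum(1 for t in transactions if (X | Y) <= t) / n

# confidence(X -> Y) = support(X and Y together) / support(X)
support_X = sum(1 for t in transactions if X <= t) / n
confidence = support / support_X

print(f"support = {support:.0%}, confidence = {confidence:.0%}")   # 20%, 100%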
Introduction:
 Market Basket Analysis (Association Analysis) is a
mathematical modeling technique based upon the theory that if
you buy a certain group of items, you are likely to buy another
group of items. The process of discovering frequent item sets in
large transactional database is called market basket analysis.
Frequent item set mining leads to the discovery of associations
and correlations among items.
 It is used to analyze customer purchasing behaviour and
helps in increasing sales and maintaining inventory by focusing
on point-of-sale transaction data.
Market basket analysis
The step-by-step computations of market basket analysis are -
 Step 1: generate all possible association rules.
 Step 2: compute the support and confidence of all possible rules.
 Step 3: apply two thresholds – minimum support and minimum confidence – to
obtain the result.
Apriori Algorithm:
 In computer science and data mining, Apriori is a classic algorithm for
learning association rules.
 Apriori is designed to operate on databases containing transactions (for
example, collections of items bought by customers, or details of
website visits).
 The algorithm attempts to find itemsets which are common to at least a
minimum number C of the transactions (the cutoff, or minimum support threshold).
Market basket analysis
Example:
Transaction ID | Items Bought
             1 | Shoes, Shirt, Jacket
             2 | Shoes, Jacket
             3 | Shoes, Jeans
             4 | Shirt, Sweatshirt

Frequent Itemset | Support
{Shoes}          | 75%
{Shirt}          | 50%
{Jacket}         | 50%
{Shoes, Jacket}  | 50%
Pseudo Code for Apriori Algorithm:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent 1-itemsets};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are
        contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
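The same generate-and-test loop can be sketched in Python (a simplified, unoptimised illustration rather than a full Apriori implementation); it reuses the shoes / shirt / jacket transactions from the example above with a minimum support of 50%:

# Simplified candidate-generate-and-test loop (illustrative only)
transactions = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]
min_support = 0.5
n = len(transactions)

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / n

# L1: frequent 1-itemsets
items = sorted({item for t in transactions for item in t})
Lk = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
frequent = list(Lk)

# Repeatedly build candidate (k+1)-itemsets from Lk and keep the frequent ones
k = 1
while Lk:
    candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
    Lk = [c for c in candidates if support(c) >= min_support]
    frequent.extend(Lk)
    k += 1

for itemset in frequent:
    print(set(itemset), f"{support(itemset):.0%}")   # matches the table above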
Limitations of the Apriori Algorithm:
i) When the size of the database is very large, the Apriori
algorithm performs poorly, because a large database will not fit in
memory (RAM), so each pass requires a large number of disk
reads.
ii) It requires up to m database scans (where m is the size of the
longest frequent itemset).
iii) Any frequent itemset with respect to a partition may or
may not be frequent with respect to the entire database D.
iv) It is costly to handle a huge number of candidate sets.
Frequent Pattern Tree (FP Tree or FP Growth Algorithm):
* An efficient and scalable method for mining the complete set of frequent
patterns.
* It allows frequent itemset discovery without candidate itemset
generation.
Two-step approach:
* Build a compact data structure called
the FP-Tree.
* Extract frequent itemsets directly
from the FP-Tree.
Steps of the Frequent Pattern Tree:
Step 1: Calculate the minimum support.
Step 2: Find the frequency of occurrence of each item.
Step 3: Prioritize the items.
Step 4: Order the items according to priority.
Step 5: Draw the FP Growth Tree.
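A short sketch of FP-growth using the third-party mlxtend library (assuming it is installed); it mines frequent itemsets from an internally built FP-Tree without candidate generation:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [
    ["Shoes", "Shirt", "Jacket"],
    ["Shoes", "Jacket"],
    ["Shoes", "Jeans"],
    ["Shirt", "Sweatshirt"],
]

# One-hot encode the transactions into a boolean DataFrame
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(transactions), columns=encoder.columns_)

# Frequent itemsets are extracted from the FP-Tree built internally
print(fpgrowth(onehot, min_support=0.5, use_colnames=True))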
Mining Various Kinds of Association Rules:
1. Generalised Association Rules – these rules make use of a
concept hierarchy and otherwise work like regular association rules.
2. Multi-Level Association Rules – strong association rules
discovered when data are mined at multiple levels of
abstraction,
i) by using a uniform minimum support for all levels,
ii) using reduced minimum support at lower levels, or
iii) using item or group-based minimum support.
3. Multidimensional Association Rules – rules involving more than one
dimension.
They include inter-dimensional & hybrid-dimensional rules.
Advantages of Association Rules:
1. Results are clearly understood.
2. Strong for undirected data mining.
3. Works on variable length data.
4. Computationally simple.
Disadvantages of Association Rules:
1. Exponential growth as problem size increases.
2. Limited support for data attributes.
3. Determining the right items.
4. Association rule analysis has trouble with rare items.
Introduction:
 Classification is a data mining (machine learning) technique used to
predict group membership for data instances. (E.g.) anyone can use
classification to predict whether the weather on a particular day will
be sunny, rainy or cloudy.
 This data analysis helps us to gain a better understanding of large
data. Classification predicts categorical (discrete, unordered) labels, while
prediction models predict continuous valued functions.
(E.g.) A bank loan officer wants to analyse the data in order to know
which customers (loan applicants) are risky and which are safe.
 A marketing manager at a company needs to analyse data to predict whether a
customer with a given profile will buy a new computer.
CLASSIFICATION TECHNIQUES
Data classification is a two-step process –
Step 1: A classifier is built describing a predetermined set of data classes
or concepts.
 Each tuple / sample is assumed to belong to a predefined class, as
determined by another database attribute called the class label
attribute.
 The class label attribute is categorical; each value serves as a category
or class.
 Since the class label of each training tuple is provided, this step is also known
as supervised learning, (i.e.) the learning of the classifier is
“supervised” in that it is told to which class each training tuple
belongs.
Working of CLASSIFICATION TECHNIQUES
Step 2:
 In this step, the predictive accuracy of the classifier is estimated. If the
training set were used to measure the accuracy of the classifier, this estimate
would likely be optimistic, because the classifier tends to overfit the data.
 Therefore a test set is used, made up of tuples and their associated class
labels, randomly selected from the general data set.
 The associated class label of each test tuple is compared with the
learned classifier’s class prediction for that tuple.
 (E.g.) Classification rules from the analysis of data from previous loan
applications can be used to approve or reject new or future loan
applicants.
Working of CLASSIFICATION TECHNIQUES
 Decision tree structures are a common way to organize classification
schemas. They visualize what steps are taken to arrive at a classification.
 Every decision tree begins with what is termed a root node, considered
to be the parent of every other node.
 A decision tree is a structure that includes a root node, branches and leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes the
outcome of a test and each leaf node holds a class label. The top-most
node in the tree is the root node.
 Decision trees can represent diverse types of data, especially numerical
data.
 A decision tree builds classification or regression models in the form of a
tree structure and breaks the dataset down into smaller and smaller subsets.
CLASSIFICATION TECHNIQUES– Decision Tree
Decision trees are based on the “divide and conquer” strategy, and the two types
of divisions or partitions are –
a) Nominal Partitions – a nominal attribute may lead to a split with as
many branches as there are values for the attribute.
b) Numerical Partitions – they allow partitions like X > a and X < a.
Partitions relating two different attributes are not permitted.
Method to build Decision Trees:
The core algorithm for building decision trees, called ID3 and developed by
J. R. Quinlan, employs a top-down, greedy search through the space of possible
branches with no backtracking. It uses Entropy and Information Gain to
construct a decision tree.
CLASSIFICATION TECHNIQUES– Decision Tree
a) Entropy: A decision tree is built top-down from a root node and
involves partitioning the data into subsets that contain instances with
similar values (homogeneous). Entropy measures the impurity of a set:
E(S) = Σi − pi log2(pi), where pi is the proportion of examples in class i.
i) Entropy using the frequency table of one attribute: illustrated by the
Play Golf table below.
ii) Entropy using the frequency table of two attributes: the weighted sum
Σc P(c) E(c) over the values c of the splitting attribute, illustrated by the
Outlook table below.
CLASSIFICATION TECHNIQUES– Decision Tree
Play Golf: Yes = 9, No = 5

Outlook vs Play Golf:
Outlook    | Yes | No | Total
Sunny      |   3 |  2 |     5
Overcast   |   4 |  0 |     4
Rainy      |   2 |  3 |     5
b) Information Gain: The information gain is based on the decrease in
entropy after a dataset is split on an attribute. Constructing a decision tree is
all about finding the attribute that returns the highest information gain, (i.e.)
the most homogeneous branches.
Step 1: Calculate the entropy of the target.
Step 2: The dataset is then split on the different attributes. The entropy of each
branch is calculated and added proportionally to get the total entropy of the split;
this entropy is subtracted from the entropy before the split, giving the
Information Gain.
Step 3: Choose the attribute with the largest information gain as the decision node.
Step 4A: A branch with entropy of 0 is a leaf node.
Step 4B: A branch with entropy more than 0 needs further splitting.
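A small sketch computing the entropy of the Play Golf target and the information gain of the Outlook attribute from the frequency tables above:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Step 1: entropy of the target (9 Yes, 5 No)
e_target = entropy([9, 5])                                  # ~0.940

# Step 2: weighted entropy after splitting on Outlook
outlook = {"Sunny": [3, 2], "Overcast": [4, 0], "Rainy": [2, 3]}
total = sum(sum(c) for c in outlook.values())               # 14
e_split = sum(sum(c) / total * entropy(c) for c in outlook.values())

# Information gain = entropy before the split - entropy after the split
gain = e_target - e_split
print(round(e_target, 3), round(e_split, 3), round(gain, 3))  # 0.94 0.694 0.247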
CLASSIFICATION TECHNIQUES– Decision Tree
Advantages:
 Simple to understand and interpret.
 Requires little data preparation.
 Able to handle both numerical and categorical data.
 Uses a white box model, (i.e.) a given situation is observable based on the
conditions, as per Boolean logic.
 Possible to validate a model using statistical tests.
 Robust; performs well with large data in a short time.
Disadvantages:
 The reliability of the information in the decision tree depends on feeding
precise internal and external information at the onset.
 A major disadvantage of decision trees is their complexity, as construction
can be time-consuming.
CLASSIFICATION TECHNIQUES– Decision Tree
Bayesian Classification:
 Bayesian classification represents a supervised learning
method as well as a statistical method for classification.
 This classification is named after Thomas Bayes (1702 –
1761) who proposed Bayes Theorem.
 It is able to predict class membership probabilities, such as the
probability that a given tuple belongs to a particular class.
 It provides practical algorithms and prior knowledge and
observed data can be combined.
CLASSIFICATION TECHNIQUES– Bayesian Classification
Using the Bayes Theorem for Classification:
 Before describing how the Bayes theorem can be used for
classification, let us formalise the classification problem from a statistical
perspective. During the training phase, one needs to learn the posterior
probabilities P(Y | X) for every combination of X and Y based on information
gathered from the training data.
 To classify a record, we need to compute the posterior probabilities
based on the information available in the training data.
 Estimating the posterior probabilities accurately for every possible
combination of class label and attribute value is a difficult problem
because it requires a very large training set, even for moderate number
of attributes.
CLASSIFICATION TECHNIQUES– Bayesian Classification
Naïve Bayes Classifier:
The Naïve Bayesian Classifier is based on Bayes’ theorem with
independence assumptions between predictors.
A Naïve Bayesian model is easy to build, with no complicated iterative
parameter estimation, which makes it particularly useful for very large
datasets.
A Naïve Bayes classifier estimates the class-conditional probability by
assuming that the attributes are conditionally independent, given the class
label y:
P(X | Y = y) = 𝝥 i=1..d P(Xi | Y = y)
where the attribute set X = {X1, X2, ……. Xd} consists of d attributes.
CLASSIFICATION TECHNIQUES– Bayesian Classification
Working of Naïve Bayes Classifier:
With the conditional independence assumption, instead of computing
the class-conditional probability for every combination of X, one only
has to estimate the conditional probability of each Xi, given Y. The
latter approach is more practical because it does not require a very large
training set to obtain a good estimate of the probability.
To classify a test record, the Naïve Bayes classifier computes the
posterior probability for each class Y:
P(Y | X) = P(Y) 𝝥 P(Xi | Y) / P(X)
Since P(X) is fixed for every Y, it is sufficient to choose the class that
maximises the numerator term P(Y) 𝝥 P(Xi | Y).
CLASSIFICATION TECHNIQUES– Bayesian Classification
(E.g.):
Consider the data set shown in the table below. One can compute the class-
conditional probability for each categorical attribute, along with the
sample mean and variance for the continuous attribute, using this
methodology.
CLASSIFICATION TECHNIQUES– Bayesian Classification
Tid | Home Owner | Marital Status | Annual Income | Defaulted Borrower
  1 | Yes        | Single         | 125K          | No
  2 | No         | Married        | 100K          | No
  3 | No         | Single         | 70K           | No
  4 | Yes        | Married        | 120K          | No
  5 | No         | Divorced       | 95K           | Yes
  6 | No         | Married        | 60K           | No
  7 | Yes        | Divorced       | 220K          | No
  8 | No         | Single         | 85K           | Yes
  9 | No         | Married        | 75K           | No
 10 | No         | Single         | 90K           | Yes
P (Home Owner=Yes | No) = 3/7
P (Home Owner=No | No) = 4/7
P (Home Owner=Yes | Yes) = 0
P (Home Owner=No | Yes) = 1
P (Marital Status=Single | No) = 2/7
P (Marital Status=Divorced | No) = 1/7
P (Marital Status=Married | No) = 4/7
P (Marital Status=Single | Yes) = 2/3
P (Marital Status=Divorced | Yes) = 1/3
P (Marital Status=Married | Yes) = 0
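Using the probabilities above, a short sketch classifies a hypothetical test record with Home Owner = No and Marital Status = Married (the continuous Annual Income attribute is omitted for simplicity); since P(X) is the same for both classes, only the numerators are compared:

# Priors from the table: 7 "No" and 3 "Yes" borrowers out of 10 records
prior = {"No": 7 / 10, "Yes": 3 / 10}

# Class-conditional probabilities taken from the list above (categorical only)
p_home_owner_no = {"No": 4 / 7, "Yes": 3 / 3}
p_married       = {"No": 4 / 7, "Yes": 0 / 3}

# Posterior numerator P(Y) * P(Home Owner=No | Y) * P(Married | Y) per class
score = {y: prior[y] * p_home_owner_no[y] * p_married[y] for y in prior}
print(score)                       # {'No': ~0.229, 'Yes': 0.0}
print(max(score, key=score.get))   # predicted class: No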
Characteristics:
 They are robust to isolated noise points, because such points are
averaged out when estimating conditional probabilities from data.
 The classifier also handles missing values by ignoring the example during
model building and classification.
 They are robust to irrelevant attributes; the class-conditional
probability of such an attribute has almost no impact on the overall
computation of the posterior probability.
 Correlated attributes can degrade the performance of naïve Bayes
classifiers, because the conditional independence assumption no longer
holds for such attributes.
CLASSIFICATION TECHNIQUES– Bayesian Classification
Uses of Naïve Bayes Classification:
1. Naïve Bayes Text Classification: It is used as a probabilistic learning
method and is one of the most successful known algorithms for learning to classify
text documents.
2. Spam Filtering: This is the best known use of Naïve Bayesian text
classification. It makes use of a naïve Bayes classifier to identify spam e-mail.
3. Hybrid Recommender System using Naïve Bayes Classifier and
Collaborative Filtering: It applies machine learning and data mining techniques
for filtering unseen information and can predict whether a user would like a
given resource.
4. Online Application: This type of online application is set up as a simple
example of supervised machine learning and affective computing. It
employs single words and word pairs as features.
CLASSIFICATION TECHNIQUES– Bayesian Classification
K-nearest neighbour:
 k-nearest neighbour classifier is one of the examples of lazy learner
which is used in the area of pattern recognition.
 It assumes that the training set contains the entire data of the set as well
as the desired classification for each item. Thus, the training data become
the model for the future data.
 k-nearest neighbour classifier learns by comparing a given test tuple
with the training tuples that are similar to it.
 The similarity or closeness between the two tuples is determined in
terms of a distance metric such as Euclidean distance.
 It basically stores all of the training tuples in an n-dimensional pattern
space, where each tuple represents a point in this space.
CLASSIFICATION TECHNIQUES – k-nearest neighbours
K-nearest neighbour:
 It searches the pattern space for k training tuples that are
closest or similar to unknown tuple.
 If the attributes are of categorical type, the distance is
computed by comparing the corresponding value of the
attribute in tuple X1 and tuple X2.
 k-nearest neighbours are considered as good predictors,
robust to outliers and with the capability of handling the
missing values.
CLASSIFICATION TECHNIQUES – k-nearest neighbours
Algorithm of k-nearest neighbour classification:
A high-level summary of the nearest neighbour classification method is
given as –
Step 1: Let k be the number of nearest neighbours and D be the set of
training examples.
Step 2: for each test example z = (x′, y′) do
Step 3: Compute d(x′, x), the distance between z and every example
(x, y) ∈ D.
Step 4: Select Dz ⊆ D, the set of k closest training examples to z.
Step 5: y′ = argmax_v Σ (xi, yi) ∈ Dz I(v = yi)
Step 6: end for
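A minimal k-nearest-neighbour sketch using Euclidean distance and majority voting over an invented two-attribute training set:

import math
from collections import Counter

# Invented training tuples: (attribute vector, class label)
train = [((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((1.1, 1.0), "A"),
         ((3.0, 3.2), "B"), ((2.9, 3.0), "B")]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(x, k=3):
    # Steps 3-4: distance to every training example, keep the k closest
    nearest = sorted(train, key=lambda example: euclidean(x, example[0]))[:k]
    # Step 5: majority vote among the k neighbours' class labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_classify((1.0, 1.2)))   # 'A'
print(knn_classify((3.1, 2.8)))   # 'B'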
CLASSIFICATION TECHNIQUES – k-nearest neighbours
Characteristics of k-nearest neighbour:
 Nearest neighbour classification is part of a more general technique
known as instance-based learning.
 Lazy learners such as nearest neighbour classifiers don’t require model
building.
 It makes their predictions based on local information, whereas decision
tree and rule-based classifiers attempt to find a global model that fits the
entire input space.
 It can produce arbitrarily shaped decision boundaries, which provide a
more flexible model representation compared to decision tree and rule-
based classifiers that are often constrained to rectilinear decision
boundaries.
CLASSIFICATION TECHNIQUES – k-nearest neighbours
Associative Classification:
 Associative Classification (AC) is a data mining approach that
combines association rules and classification to build classification
models (classifiers).
 Its training phase is about searching for hidden knowledge, primarily
using association rule algorithms, and then a classification model is
constructed after sorting the knowledge according to certain criteria.
 AC is considered a special case of the association rule where the
target (class) attribute is considered in the rule’s right-hand side.
 Associative classification depends mainly on two important thresholds
called minimum support and minimum confidence.
CLASSIFICATION TECHNIQUES– Classification algorithm
Associative Classification Algorithm:
An AC algorithm operates in three main phases –
Phase 1: AC searches for hidden correlations between the attribute
values and class attribute values in the training data set. Class
Association Rules (CARs) are generated from them in ‘if-then’
format.
Phase 2: In this phase, ranking and pruning procedures start. At
this stage, CARs are ranked according to a certain number of
parameters like confidence and support values, to ensure that rules
with high confidence are retained.
Phase 3: Lastly, the classification model is utilised to predict the class
values of a new unseen data set (test data).
CLASSIFICATION TECHNIQUES– Classification algorithm
Methods of Associative Classification Algorithm:
1. Classification based on Associations (CBA).
2. Classification based on Multiple Association Rules
(CMAR).
3. Classification based on Predictive Association Rules
(CPAR).
CLASSIFICATION TECHNIQUES– Classification algorithm
Data Mining

Data Mining

  • 1.
    Mr.T.Somasundaram Assistant Professor Department OfManagement Kristu Jayanti College (Autonomous), Bengaluru UNIT 4 INTRODUCTION TO DATA MINING
  • 2.
    Data Mining –Definition, Challenges, tasks, Data pre- processing, Data Cleaning, missing data, dimensionality reduction, data transformation, measures of similarity and dissimilarity, Introduction to Association rules, APRIORI algorithm, partition algorithm, FP growth algorithm, Introduction to Classification techniques, Decision tree, Naïve-Bayes classifier, k-nearest neighbour, classification algorithm. Unit 4 introductionto datamining
  • 3.
    Data mining -introduction Meaning:  Data mining is the art and science of using more powerful algorithms, than traditional query tools such as SQL to extract more useful information.  Data mining is also termed as “Knowledge Discovery in Databases”, is the extraction of hidden predictive information from large databases.  It is a powerful new technology with great potential to help companies. (E.g.) to focus on the most important information in their data warehouses.
  • 4.
    Data mining -introduction Meaning:  Data mining is the core of the KDD process, involves inferring of algorithm that explore the data, develop model and discover unknown patterns.  The model is used for understanding phenomena from the data, analysis and prediction.  Data mining is concerned with discovering knowledge.  It is about uncovering relationship or patterns hidden in data that can be used to predict outcomes.
  • 5.
    Definition: Data mining isthe search of relationships and global patterns that exist in large databases but are ‘hidden’ among the vast amount of data, such as a relationship between patient data and their medical diagnosis. - Marcel Holshemier & Arno Siebes Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. - William J Frawley
  • 6.
    Characteristics:  Relevant datastored in large DB like data warehouse.  It is like end user & tools to get quick response.  Data mining has client / server architecture.  It use parallel processing of data mining due to large amount of data.  It easily combined with spread sheets & other end user software development tool for analyze process.  It get 5 types of information like association, sequence, classification, cluster, forecasting.  It use to find unexpected, valuable results.
  • 7.
    NEED OF Datamining Data mining is needed in organization for following reasons - 1. Operational:  It ensure the everyday operation of a business run smoothly.  This is done by using information to make changes if mistakes are found.  The information gained is to help the business maintain a level of productivity and proficiency.
  • 8.
    2. Decisional:  Datamining help manager or analyst to make decisions about the future of company based on historical and real current data.  It is used for planning of long term goals as well as short term changes. 3. Informational:  It is available to those who need it, it is easily accessible. 4. Specific Uses:  It can be used as model to ‘anticipate future customer behaviour, based on historical data of interaction with a particular company.
  • 9.
    data cleaning, integration,and selection Database or Data Warehouse Server Data Mining Engine Pattern Evaluation Graphical User Interface Knowledge- Base Database Data Warehouse World-Wide Web Other Info Repositories ARCHITECTURE OF Data mining
  • 10.
    The architecture ofdata mining system may have the following major components - 1. Database, Data Warehouse or Other information repository:  This is one or set of databases, DW or others, in which data cleaning & integration techniques may be performed on the data. 2. Database or Data Warehouse Server:  It is responsible for fetching the relevant data, based on the user’s data mining request.
  • 11.
    3. Knowledge Base: This is domain knowledge that is used to guide the search, evaluate the resulting patterns.  It include hierarchies, used to organize attributes into different level of abstraction. 4. Data Mining Engine:  This is essential to data mining system and ideally consists of set of functional modules for tasks such as characterization, association, classification, cluster analysis and deviation analysis.
  • 12.
    5. Pattern EvaluationModule:  This component typically interact with the data mining modules to focus the search towards interesting patterns.  It may use interestingness thresholds to filter out discovered patterns. 6. Graphical User Interface:  This module communicates between users and data mining system allowing user to interact with system by specifying a data mining query.  It allows user to browse database and data warehouse schemas or structures and visualize the patterns.
  • 13.
    Advantages:  Automated Predictionof Trends and Behaviours.  Automated Discovery of Previously Unknown patterns.  Databases can be larger in both depth and breadth. Disadvantages:  Privacy issues.  Security issues.  Misuse of Information / Inaccurate Information. advantages & disadvantages of data mining
  • 14.
    1. Retail /Marketing:  Identify buying patterns from customers.  Finding associations among customer demographic details.  Market basket analysis. 2. Banking:  Detect patterns of fraudulent credit card use.  Identify ‘loyal’ customers.  Predict customers likely to change their credit card affiliation.  Determine credit card spending by customer groups.  Find hidden correlations between different financial indicators.  Identify stock trading rules from historical market data. APPLICATIONSof data mining
  • 15.
    3. Insurance andHealth Care:  Claims analysis (i.e.) which medical procedures are claimed together.  Predict which customers will buy new policies.  Identify behaviour patterns of risky customers.  Identify fraudulent behaviour. 4. Transportation:  Determine the distribution schedules among outlets.  Analyse loading patterns. 5. Medicine:  Characterize patient behaviour to predict office visits.  Identify successful medical therapies for different illnesses. APPLICATIONSof data mining
  • 16.
    The tasks indata mining are classified into – 1. Summarisation:  This is the abstraction or generalisation of data.  A set of task-relevant data is summarised and abstracted and gives overview of data, usually with aggregate information. (E.g.) summarisation can go to different abstraction levels and viewed from different angles with different kinds of pattern. (i.e.) calls can be summarised into in state calls, state-to-state calls, calls to Asia, etc. which can be further summarised into domestic calls and international calls. data miningTASKS
  • 17.
    2. Classification:  Thisderives a function or model, which determines the class of an object, based on its attributes.  A set of objects is given as the training set.  It is constructed by analysing the relationship between attributes and the classes of the objects in the training set.  This helps us to develop a better understanding of the classes of the objects in the database. (E.g.) Classification model can diagnose a new patient’s disease based on data like age, ender, weight, BP, etc. and concludes a patient’s disease from his / her diagnostic data. data miningTASKS
  • 18.
    3. Association:  Associationrules can be useful for marketing, commodity management, advertising, etc.  (E.g.) Retail store discover that people tend to buy soft drinks and potato chips together. 4. Clustering:  This identifies classes – also called clusters or groups – for a set of object who classes are unknown.  The objects are so clustered that the interclass similarities are maximised and the interclass similarities are minimised. (E.g.) Bank may clusters its customers into several groups based on the similarities of age, income and residence and used to describe the customers suitable products and service. data miningTASKS
  • 19.
    5. Trend Analysis: Time series data are records accumulated over time. (E.g.) Company’s sales, a customer’s credit card transactions and stock prices are all time series data. (E.g.) one can estimate this year’s profit for a company based on its profits last year and estimated annual increasing rate.  By comparing two or more objects historical changing curves or tracks, similar and dissimilar trends can be discovered. (E.g.) Company’s sales and profit figures can be analysed to find the disagreeing trends. data miningTASKS
  • 20.
    Data goes througha series of steps during pre-processing – data PREPROCESSING Data Cleaning Data Integration Data Transformation Data Reduction Discretisation and Concept Hierarchy Generation
  • 21.
    1. Data Cleaning: It is used to determine inaccurate, incomplete or unreasonable data and improving the quality through correction of detected errors and omissions.  It include format checks, completeness checks, reasonable checks, limit checks, other errors, etc.  This process result in flagging, documenting and subsequent cheking and correction of suspect records.  Validation checks may also involve checking for compliance against applicable standards, rules and conventions.  Data cleaning identifying outliers and correct inconsistencies in data. data PREPROCESSING
  • 22.
    Missing Values (Data): Missing data items present a problem that can be dealt with in several ways.  In most cases, missing attribute value indicate lost information. (E.g.) Missing value for salary may be taken as an unentered data item, but it could also indicate an individual who is unemployed. The following are the possible options for dealing with missing data before data is presented to a data mining algorithm – a) Discard records with missing values. (appropriate for small percent) b) Replace missing values with class mean for real-valued data. (appropriate for numerical attributes) c) Replace missing values with values found within other highly similar instances. (appropriate for both numerical or categorical)
  • 23.
    2. Data Integration: Data integration is important for companies to develop an enterprise- wide strategy for integration.  It provide a single, unified view of the business data scattered across various systems in organization.  The technique for data integration is straight forward approach for merging data from one database to another (E.g.) merging customer data from CRM database into manufacturing system database.  It is pre-processing method involves merging of data from different sources to form a coherent data store like data warehouse.
  • 24.
    Some issues indata integration are - i) Schema Integration and Object Matching:  This issue is also known as entity identification problem as one doesn’t know how equivalent real-world entities from multiple data sources can be matched up. (E.g.) In two different databases maintained in organization indicates Emp. ID and Emp. No, where same attribute used in different name. ii) Redundancy:  If an attribute can be derived from any other attribute or set of attributes, it is said to redundant.  It is caused due to inconsistencies in attributes or dimension naming. (E.g.) Attribute Age can be derived from attribute DOB, therefore both the attributes are considered to be redundant.
  • 25.
    3. Data Transformation: The important pre-processing task is data transformation. The common techniques are – i) Smoothing – it removes noise from data. ii) Aggregation – It summarises data and constructs data cubes. iii) Generalisation – It is also known as concept hierarchy climbing. iv) Attribute / Feature Construction – It composes new attributes from the given ones. v) Normalisation – It scales the data within a small, specified range.
  • 26.
    4. Data Reduction: Data reduction is a pre-processing techniques which helps in obtaining reduced representation of data set (i.e.) set having smaller volume of data from the available data set.  When it is performed on reduced data set, it produces the same analytical results as that obtained from the original data set, saving the time needed for computation.  The main advantage of this techniques is even after reduction, integrity of original data is still maintained.  It use data cube aggregation, attribute subset selection, dimensionality reduction, etc.
  • 27.
    Strategies for datareduction includes - i) Data Cube Aggregation – it is applied to the data in the construction of a data cube. ii) Attribute Subset Selection – irrelevant, weakly relevant or redundant attributes or dimension may be detected and removed. iii) Dimensionality reduction – here encoding mechanism are used to reduce data set size. iv) Numerosity Reduction – here data are replaced or estimated by alternative, smaller data representations such as parametric models or non-parametric models such as clustering, sampling and use of histograms.
  • 28.
    Dimensionality Reduction:  Dimensionalityreduction is the process of reducing the number of random variables or attributes under consideration. This method includes – 1. Wavelet Transforms:  This method works by using its variant called Discrete Wavelet Transform (DWT).  A hierarchical pyramid algorithm is used for applying a DWT which halves the data at each iteration and thus results in a fast computational speed.  This method can be applied to multidimensional data, such as data cube.
  • 29.
    2. Principal ComponentsAnalysis (PCA):  It is a technique used to reduce multidimensional data sets to lower dimensions for analysis.  It is mostly used as a tool in exploratory data analysis and for making predictive models.  It involves the calculation of the eigen values or singular value decomposition of data set, usually after mean centering the data for each attribute.  It is used abundantly in all forms of analysis, because it is simple, non-parametric method of extracting relevant information from confusing data sets.  It is used for dimensionality reduction in a data set by retaining those characteristics of data set that contribute to variances.
  • 30.
    5. Discretisation andConcept Hierarchy Generation:  Data discretisation techniques can be used to reduce the number of values for a given continuous attribute by dividing the range of the attribute into intervals.  It is a part of data reduction but with particular importance, especially numerical data. a) Discretisation Techniques:  If process starts by first finding one or a few points to split the entire attribute range and then repeats this recursively on the results intervals, is called top – down discretisation or splitting.  It performs recursively on a attribute to provide a hierarchical or multiresolution partitioning of the attribute values, known as concept hierarchy.
  • 31.
     A concepthierarchy for a given numerical attributes defines a discretisation of the attribute.  It can be used to reduce the data by collecting and replacing low- level concepts by higher-level concepts.  This contributes to a consistent representation of data mining results among multiple mining tasks, which is common requirement.  Discretisation techniques and concept hierarchies are typically applied before data mining as a pre-processing step, rather than during mining.  Several discretisation methods can be used to automatically generate or dynamically refine concept hierarchies for numerical attributes within the database schema.
  • 32.
    b) Concept HierarchyGeneration for Numerical data:  It is constructed automatically based on data discretisation.  It examines the following methods – i) Binning – it is top – down splitting technique based on a specified number of bins.  This method are also used as discretisation methods for numerosity reduction and concept hierarchy generation. ii) Histogram Analysis – it is popular unsupervised discretisation technique because it does not use class information.  It divides data into buckets and store average for each bucket. iii) Entropy-based discretisation – it is supervised, top-down splitting technique.  It explores class distribution information in its calculation and determination of split points.
  • 33.
    iv) Interval Mergingby chi square analysis – It employs a bottom-up approach by finding the best neighbouring intervals and then merging these to form larger intervals.  This method is supervised in that it uses class information. v) Cluster Analysis – clustering generate a concept hierarchy by following either a top-down splitting strategy or a bottom-up merging strategy, where cluster forms a node of concept hierarchy.  Clusters are formed by repeatedly grouping neighbouring clusters in order to form higher-level concepts. vi) Discretisation by Intuitive Partitioning – many users like to see numerical ranges partitioned into relatively uniform, easy-to-read intervals that appear.  A simply 3-4-5rule can be used to segment numeric data into relatively inform, ‘natural” intervals..
  • 34.
    Introduction:  Association Rulesare an important class of methods of finding regularities or patterns in data and extensively studied by databases and data mining community.  Association rule is an important component of data mining and used in many application domains.  It is also important role in discovering knowledge from agricultural databases, survey data from agricultural research, data about soil and cultivation, data containing information linking geographical conditions and crop production. Association rule mining
  • 35.
    Introduction:  Few examplesare, it is used in business field where discovering of purchase patterns or association between products is very useful for decision-making and effective marketing. Some recent applications are – finding patterns in biological databases, extraction of knowledge from software engineering metrics, web personalisation, text mining, etc. Definition: Association rule mining means, ‘finding frequent patterns, associations, correlations or casual structures among set of items or objects in transactional databases, relational databases and other information repositories.’ Association rule mining
  • 36.
    Association rules:  Thebasic objective of finding association rules is to find all co-occurrence relationship called associations.  It was first introduced in 1993 by Agarwal, it has attracted a great deal of attention.  The classic application of association rule mining is market basket data analysis, which aims to discover how items purchased by customers in a supermarket are associated.  Association rules are of form X – Y, where X and Y are collection of items and intersection of X and Y is null. (E.g.) 95% of the customers who bought bread (X) also bought milk (Y) or Jam (Y).
Association rules:
 A rule may contain more than one item in the antecedent and in the consequent. (E.g.) the rule {onions, potatoes} → {burger} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy burger meat. Such information can be used as a basis for decisions about marketing activities such as promotional pricing or product placement.
 Every rule must satisfy two user-specified constraints: i) a measure of statistical significance called support, and ii) a measure of goodness called confidence.
Association rules: (E.g.) Consider the following database with 4 items and 5 transactions. The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes the presence and 0 the absence of an item in a transaction) is shown below. An example rule for the supermarket could be {butter, bread} → {milk}, meaning that if butter and bread are bought, customers also buy milk.

Transaction ID  Milk  Bread  Butter  Beer
1               1     1      0       0
2               0     0      1       0
3               0     0      0       1
4               1     1      1       0
5               0     1      0       0
Mining Association Rules: Definitions and Terminology:
• Transaction: a set of items (itemset).
• Confidence: the measure of certainty or trustworthiness associated with each discovered pattern.
• Support: the measure of how often the collection of items in an association occurs together, as a percentage of all transactions.
• Frequent itemset: an itemset that satisfies minimum support.
• Strong association rules: rules that satisfy both a minimum support threshold and a minimum confidence threshold.
• In association rule mining, we first find all frequent itemsets and then generate strong association rules from the frequent itemsets.
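As a worked illustration of support and confidence, here is a minimal sketch, assuming pandas is available, that evaluates the rule {Butter, Bread} → {Milk} over the five-transaction table shown above.

import pandas as pd

# The five-transaction example database (1 = present, 0 = absent)
df = pd.DataFrame(
    {"Milk":   [1, 0, 0, 1, 0],
     "Bread":  [1, 0, 0, 1, 1],
     "Butter": [0, 1, 0, 1, 0],
     "Beer":   [0, 0, 1, 0, 0]},
    index=[1, 2, 3, 4, 5],
)

antecedent, consequent = ["Butter", "Bread"], ["Milk"]

both = df[antecedent + consequent].all(axis=1).sum()  # transactions containing X and Y
ante = df[antecedent].all(axis=1).sum()               # transactions containing X

support = both / len(df)    # 1/5 = 20% of all transactions
confidence = both / ante    # 1/1 = 100% of transactions containing X
print(support, confidence)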
Market basket analysis
Introduction:
 Market Basket Analysis (association analysis) is a mathematical modelling technique based on the idea that if you buy a certain group of items, you are likely to buy another group of items. The process of discovering frequent itemsets in a large transactional database is called market basket analysis. Frequent itemset mining leads to the discovery of associations and correlations among items.
 It is used to analyse customer purchasing behaviour and helps to increase sales and maintain inventory by focusing on point-of-sale transaction data.
Step-by-step computation of market basket analysis:
 Step 1: generate all possible association rules.
 Step 2: compute the support and confidence of all possible rules.
 Step 3: apply the two thresholds – minimum support and minimum confidence – to obtain the result.
Apriori Algorithm:
 In computer science and data mining, Apriori is a classic algorithm for learning association rules.
 Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of website visits).
 The algorithm attempts to find itemsets that are common to at least a minimum number C (the cutoff, or support threshold) of the transactions.
Example:

Transaction ID  Items Bought
1               Shoes, Shirt, Jacket
2               Shoes, Jacket
3               Shoes, Jeans
4               Shirt, Sweatshirt

Frequent Itemset   Support
{Shoes}            75%
{Shirt}            50%
{Jacket}           50%
{Shoes, Jacket}    50%
Pseudo-code for the Apriori Algorithm:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent 1-itemsets};
for (k = 1; Lk ≠ ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with support ≥ min_support;
end
return ∪k Lk;
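The following is a minimal, runnable Python sketch of the same level-wise candidate-generation-and-counting loop; the toy transactions (matching the shoes/shirt/jacket example above) and the 50% minimum support are illustrative assumptions.

from itertools import combinations

def apriori(transactions, min_support=0.5):
    # Return all frequent itemsets (as frozensets) together with their support
    n = len(transactions)
    transactions = [set(t) for t in transactions]

    # Start from the candidate 1-itemsets
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}
    frequent = {}

    while current:
        # One pass over the database to count every candidate
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets to generate candidate (k+1)-itemsets
        keys = list(level)
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1}
    return frequent

transactions = [{"Shoes", "Shirt", "Jacket"}, {"Shoes", "Jacket"},
                {"Shoes", "Jeans"}, {"Shirt", "Sweatshirt"}]
for itemset, support in sorted(apriori(transactions).items(), key=lambda kv: -kv[1]):
    print(set(itemset), f"{support:.0%}")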
Limitations of the Apriori Algorithm:
i) When the database is very large, the Apriori algorithm performs poorly, because a large database will not fit in memory (RAM), so each pass requires a large number of disk reads.
ii) It requires up to m database scans.
iii) A frequent itemset with respect to a partition may or may not be frequent with respect to the entire database D.
iv) It is costly to handle a huge number of candidate sets.
Frequent Pattern Tree (FP-Tree of the FP-Growth Algorithm):
* An efficient and scalable method for mining the complete set of frequent patterns.
* It allows frequent itemset discovery without candidate itemset generation.
Two-step approach:
* Build a compact data structure called the FP-Tree.
* Extract frequent itemsets directly from the FP-Tree.
Steps of the Frequent Pattern Tree:
Step 1: Calculate the minimum support.
Step 2: Find the frequency of occurrence of each item.
Step 3: Prioritise the items by frequency.
Step 4: Order the items in each transaction according to priority and draw the FP-Growth tree.
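A minimal sketch of these tree-building steps in Python (count item frequencies, keep frequent items, order each transaction by priority, insert into a prefix tree); the sample transactions and the support count of 2 are illustrative assumptions, and the pattern-extraction phase is omitted.

from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}

def build_fp_tree(transactions, min_support_count=2):
    # Steps 1-3: count items and keep only those meeting the minimum support count
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support_count}

    # Step 4: order each transaction by descending frequency and insert it into the tree
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.setdefault(item, FPNode(item, parent=node))
            child.count += 1
            node = child
    return root, frequent

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

# Hypothetical transactions used only for illustration
transactions = [["bread", "milk"], ["bread", "butter"],
                ["bread", "milk", "butter"], ["milk", "beer"]]
root, _ = build_fp_tree(transactions)
show(root)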
Mining Various Kinds of Association Rules:
1. Generalised association rules – these rules make use of a concept hierarchy and otherwise behave like regular association rules.
2. Multi-level association rules – strong association rules discovered when data are mined at multiple levels of abstraction: i) using a uniform minimum support for all levels, ii) using a reduced minimum support at lower levels, or iii) using item- or group-based minimum support.
3. Multidimensional association rules – rules involving more than one dimension; they may be inter-dimensional or hybrid-dimensional rules.
Advantages of Association Rules:
1. Results are clearly understood.
2. Strong for undirected data mining.
3. Works on variable-length data.
4. Computationally simple.
Disadvantages of Association Rules:
1. Exponential growth as problem size increases.
2. Limited support for data attributes.
3. Determining the right items.
4. Association rule analysis has trouble with rare items.
CLASSIFICATION TECHNIQUES
Introduction:
 Classification is a data mining (machine learning) technique used to predict group membership for data instances. (E.g.) classification can be used to predict whether the weather on a particular day will be sunny, rainy or cloudy.
 This data analysis helps to provide a better understanding of large data. Classification predicts categorical (discrete, unordered) labels, while prediction models predict continuous-valued functions.
 (E.g.) A bank loan officer wants to analyse the data in order to know which customers (loan applicants) are risky and which are safe.
 A marketing manager at a company needs to analyse data to guess whether a customer with a given profile will buy a new computer.
Working of CLASSIFICATION TECHNIQUES
Data classification is a two-step process –
Step 1: A classifier is built describing a predetermined set of data classes or concepts.
 Each tuple/sample is assumed to belong to a predefined class, as determined by a database attribute called the class label attribute.
 The class label attribute is categorical, so that each of its values serves as a category or class.
 Since the class label of each training tuple is provided, this step is also known as supervised learning (i.e. the learning of the classifier is 'supervised' in that it is told to which class each training tuple belongs).
Step 2:
 In this step, the predictive accuracy of the classifier is estimated. If the training set were used to measure the accuracy of the classifier, the estimate would likely be optimistic, because the classifier tends to overfit the data.
 Therefore a test set is used, made up of tuples and their associated class labels, randomly selected from the general data set.
 The associated class label of each test tuple is compared with the learned classifier's class prediction for that tuple.
 (E.g.) Classification rules learned from the analysis of data from previous loan applications can be used to approve or reject new or future loan applicants.
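A minimal sketch of this two-step build-then-evaluate process using scikit-learn; the dataset (the bundled iris data), the 70/30 split and the decision tree model are illustrative assumptions rather than part of the original slides.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Step 1: build the classifier from the training tuples (supervised learning)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: estimate predictive accuracy on a held-out test set rather than the
# training set, to avoid an optimistic (overfit) estimate
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))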
CLASSIFICATION TECHNIQUES – Decision Tree
 Decision tree structures are a common way to organise classification schemas; they visualise the steps taken to arrive at a classification.
 Every decision tree begins with what is termed the root node, considered to be the parent of every other node.
 A decision tree is a structure that includes a root node, branches and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes an outcome of the test, and each leaf node holds a class label. The topmost node in the tree is the root node.
 Decision trees can represent diverse types of data, especially numerical data.
 Decision tree induction builds classification or regression models in the form of a tree structure, breaking the dataset down into smaller and smaller subsets.
Decision trees are based on a 'divide and conquer' strategy, and two types of divisions or partitions are –
a) Nominal partitions – a nominal attribute may lead to a split with as many branches as there are values for the attribute.
b) Numerical partitions – these allow partitions such as X > a and X ≤ a. Partitions relating two different attributes are not permitted.
Method to build decision trees: The core algorithm for building decision trees, called ID3 and developed by J. R. Quinlan, employs a top-down, greedy search through the space of possible branches with no backtracking. It uses entropy and information gain to construct a decision tree.
a) Entropy: A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The entropy of a sample S with class proportions pᵢ is E(S) = −Σᵢ pᵢ log₂ pᵢ.
i) Entropy using the frequency table of one attribute (class counts only):

Play Golf
Yes  No
9    5

ii) Entropy using the frequency table of two attributes:

                   Play Golf
Outlook            Yes  No  Total
         Sunny      3    2     5
         Overcast   4    0     4
         Rainy      2    3     5
b) Information Gain: The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e. the most homogeneous branches).
Step 1: Calculate the entropy of the target.
Step 2: Split the dataset on the different attributes. The entropy of each branch is calculated and added proportionally to get the total entropy of the split; the resulting entropy is subtracted from the entropy before the split, and the difference is the information gain (a worked computation follows below).
Step 3: Choose the attribute with the largest information gain as the decision node.
Step 4a: A branch with entropy 0 is a leaf node.
Step 4b: A branch with entropy greater than 0 needs further splitting.
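The following small Python sketch works through Steps 1–2 for the Outlook attribute of the Play Golf table above (9 Yes and 5 No overall).

from math import log2

def entropy(counts):
    # Entropy of a class distribution given as a list of counts
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Step 1: entropy of the target before the split (9 Yes, 5 No)
e_target = entropy([9, 5])                                           # ≈ 0.940

# Step 2: weighted entropy of the Outlook branches
branches = {"Sunny": [3, 2], "Overcast": [4, 0], "Rainy": [2, 3]}
n = 14
e_split = sum(sum(c) / n * entropy(c) for c in branches.values())    # ≈ 0.693

# Information gain of splitting on Outlook
print(round(e_target - e_split, 3))                                  # ≈ 0.247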
Advantages:
 Simple to understand and interpret.
 Requires little data preparation.
 Able to handle both numerical and categorical data.
 Uses a white-box model, i.e. a given situation is observable based on conditions expressed in Boolean logic.
 Possible to validate a model using statistical tests.
 Robust; performs well with large data in a short time.
Disadvantages:
 The reliability of the information in the decision tree depends on feeding precise internal and external information at the onset.
 A major disadvantage is complexity, as building the tree can be time-consuming.
CLASSIFICATION TECHNIQUES – Bayesian Classification
Bayesian Classification:
 Bayesian classification represents a supervised learning method as well as a statistical method for classification.
 This classification is named after Thomas Bayes (1702–1761), who proposed Bayes' theorem.
 It is able to predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
 It provides practical algorithms, and prior knowledge and observed data can be combined.
Using the Bayes Theorem for Classification:
 Before describing how the Bayes theorem can be used for classification, let us formalise the classification problem from a statistical perspective. During the training phase, one needs to learn the posterior probabilities for every combination of X and Y based on information gathered from the training data.
 To classify a record, we need to compute the posterior probabilities based on the information available in the training data.
 Estimating the posterior probabilities accurately for every possible combination of class label and attribute value is a difficult problem, because it requires a very large training set, even for a moderate number of attributes.
Naïve Bayes Classifier:
 The Naïve Bayesian classifier is based on Bayes' theorem with independence assumptions between predictors. A Naïve Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
 A Naïve Bayes classifier estimates the class-conditional probability by assuming that the attributes are conditionally independent given the class label y:
P(X | Y = y) = ∏ᵢ₌₁ᵈ P(Xᵢ | Y = y)
where the attribute set X = {X₁, X₂, …, X_d} consists of d attributes.
Working of the Naïve Bayes Classifier:
 With the conditional independence assumption, instead of computing the class-conditional probability for every combination of X, one only has to estimate the conditional probability of each Xᵢ given Y. This approach is more practical because it does not require a very large training set to obtain good probability estimates.
 To classify a test record, the Naïve Bayes classifier computes the posterior probability for each class Y:
P(Y | X) = P(Y) ∏ᵢ P(Xᵢ | Y) / P(X)
 Since P(X) is fixed for every Y, it is sufficient to choose the class that maximises the numerator term P(Y) ∏ᵢ P(Xᵢ | Y).
(E.g.) Consider the dataset shown in the table below. One can compute the class-conditional probability for each categorical attribute, along with the sample mean and variance for the continuous attribute.

Tid  Home Owner  Marital Status  Annual Income  Defaulted Borrower
1    Yes         Single          125K           No
2    No          Married         100K           No
3    No          Single          70K            No
4    Yes         Married         120K           No
5    No          Divorced        95K            Yes
6    No          Married         60K            No
7    Yes         Divorced        220K           No
8    No          Single          85K            Yes
9    No          Married         75K            No
10   No          Single          90K            Yes

P(Home Owner = Yes | No) = 3/7
P(Home Owner = No | No) = 4/7
P(Home Owner = Yes | Yes) = 0
P(Home Owner = No | Yes) = 1
P(Marital Status = Single | No) = 2/7
P(Marital Status = Divorced | No) = 1/7
P(Marital Status = Married | No) = 4/7
P(Marital Status = Single | Yes) = 2/3
P(Marital Status = Divorced | Yes) = 1/3
P(Marital Status = Married | Yes) = 0
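A small sketch, assuming pandas is available, that reproduces the categorical class-conditional probabilities listed above from the same ten-record table.

import pandas as pd

df = pd.DataFrame({
    "HomeOwner":     ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "MaritalStatus": ["Single", "Married", "Single", "Married", "Divorced",
                      "Married", "Divorced", "Single", "Married", "Single"],
    "Defaulted":     ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"],
})

# P(attribute value | class): normalise each class column of the contingency table
for attr in ["HomeOwner", "MaritalStatus"]:
    print(pd.crosstab(df[attr], df["Defaulted"], normalize="columns"), "\n")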
Characteristics:
 Naïve Bayes classifiers are robust to isolated noise points, because such points are averaged out when estimating conditional probabilities from the data.
 They also handle missing values, by ignoring the example during model building and classification.
 They are robust to irrelevant attributes, since the class-conditional probability of an irrelevant attribute has little impact on the overall computation of the posterior probability.
 Correlated attributes, however, can degrade the performance of Naïve Bayes classifiers, because the conditional independence assumption no longer holds for such attributes.
Uses of Naïve Bayes Classification:
1. Naïve Bayes text classification: it is used as a probabilistic learning method and is among the most successful known algorithms for learning to classify text documents.
2. Spam filtering: the best-known use of Naïve Bayesian text classification; it uses a Naïve Bayes classifier to identify spam e-mail.
3. Hybrid recommender system using a Naïve Bayes classifier and collaborative filtering: it applies machine learning and data mining techniques for filtering unseen information and can predict whether a user would like a given resource.
4. Online applications: such applications have been set up as simple examples of supervised machine learning and affective computing; they employ single words and word pairs as features.
CLASSIFICATION TECHNIQUES – k-nearest neighbours
k-nearest neighbour:
 The k-nearest neighbour classifier is an example of a lazy learner, used in the area of pattern recognition.
 It assumes the training set contains the data items as well as the desired classification for each item; thus, the training data become the model for future data.
 A k-nearest neighbour classifier learns by comparing a given test tuple with the training tuples that are similar to it.
 The similarity or closeness between two tuples is determined in terms of a distance metric such as Euclidean distance.
 It basically stores all of the training tuples in an n-dimensional pattern space, where each tuple represents a point in this space.
k-nearest neighbour:
 It searches the pattern space for the k training tuples that are closest, or most similar, to the unknown tuple.
 If the attributes are categorical, the distance is computed by comparing the corresponding attribute values in tuple X1 and tuple X2.
 k-nearest neighbours are considered good predictors, robust to outliers, and capable of handling missing values.
Algorithm for k-nearest neighbour classification: A high-level summary of the nearest neighbour classification method is given as –
Step 1: Let k be the number of nearest neighbours and D be the set of training examples.
Step 2: for each test example z = (x', y') do
Step 3: Compute d(x', x), the distance between z and every training example (x, y) ∈ D.
Step 4: Select Dz ⊆ D, the set of the k training examples closest to z.
Step 5: y' = argmax_v Σ_{(xᵢ, yᵢ) ∈ Dz} I(v = yᵢ), i.e. the majority class among the k neighbours.
Step 6: end for
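A minimal runnable sketch of these steps in Python (Euclidean distance, select the k closest, majority vote); the toy 2-D training tuples and k = 3 are illustrative assumptions.

from collections import Counter
import math

def knn_predict(train, test_point, k=3):
    # train: list of (features, label) pairs; returns the majority label of the k nearest
    # Steps 3-4: distance to every training example, keep the k closest
    nearest = sorted(train, key=lambda xy: math.dist(xy[0], test_point))[:k]
    # Step 5: majority vote over the labels of the k nearest neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training tuples used only for illustration
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.0), "B"), ((4.1, 3.9), "B")]
print(knn_predict(train, (1.1, 1.0)))   # -> "A"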
Characteristics of k-nearest neighbour:
 Nearest neighbour classification is part of a more general technique known as instance-based learning.
 Lazy learners such as nearest neighbour classifiers do not require model building.
 They make their predictions based on local information, whereas decision tree and rule-based classifiers attempt to find a global model that fits the entire input space.
 They can produce arbitrarily shaped decision boundaries, which provide a more flexible model representation compared with decision tree and rule-based classifiers, which are often constrained to rectilinear decision boundaries.
CLASSIFICATION TECHNIQUES – Classification algorithm
Associative Classification:
 Associative Classification (AC) is a data mining approach that combines association rules and classification to build classification models (classifiers).
 The training phase searches for hidden knowledge, primarily using association rule algorithms, and a classification model is then constructed after sorting that knowledge according to certain criteria.
 AC is considered a special case of association rule mining in which the target (class) attribute is restricted to the rule's right-hand side.
 Associative classification depends mainly on two important thresholds, called minimum support and minimum confidence.
Associative Classification Algorithm: An AC algorithm operates in three main phases –
Phase 1: AC searches for hidden correlations between attribute values and class attribute values in the training data set. Class Association Rules (CARs) are generated from them in 'if-then' format.
Phase 2: Ranking and pruning procedures are applied; at this stage, CARs are ranked according to a number of parameters, such as confidence and support values, to ensure that rules with high confidence are retained.
Phase 3: Lastly, the classification model is used to predict the class values of a new, unseen data set (test data).
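A tiny sketch of Phase 3, predicting with an already-ranked list of class association rules; the rules, the default class and the test record are hypothetical values chosen only for illustration.

# Each CAR: (antecedent attribute-value pairs, predicted class), assumed already
# ranked by confidence and support as in Phase 2 (all values are hypothetical)
cars = [
    ({"HomeOwner": "No", "MaritalStatus": "Single"}, "Yes"),
    ({"HomeOwner": "Yes"}, "No"),
]
default_class = "No"   # fallback when no rule fires

def predict(record, rules, default):
    # Phase 3: apply the first (highest-ranked) rule whose antecedent matches the record
    for antecedent, label in rules:
        if all(record.get(attr) == val for attr, val in antecedent.items()):
            return label
    return default

print(predict({"HomeOwner": "No", "MaritalStatus": "Single"}, cars, default_class))   # -> "Yes"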
Methods of Associative Classification:
1. Classification Based on Associations (CBA).
2. Classification based on Multiple Association Rules (CMAR).
3. Classification based on Predictive Association Rules (CPAR).