This document discusses association rule mining and clustering. Association rule mining aims to identify relationships between different items in transactional data using measures like support, confidence and lift. The Apriori algorithm is described as a popular method for mining association rules. Clustering techniques discussed include k-means, k-medoids, and hierarchical agglomerative clustering. K-means groups objects by assigning them to centroids while k-medoids uses actual objects as cluster centers. Hierarchical clustering creates nested clusters by successively merging or dividing clusters based on distance measures.
2. Association Rule Mining
Association rule mining asks "what goes with what".
Association rule mining is a technique to identify underlying relations between different items.
Given a set of transactions, find rules that predict the occurrence of an item based on the occurrences of other items in the transactions.
The process of identifying associations between products is known as market basket analysis.
3. More profit can be generated if the relationships between the items purchased in different transactions can be identified.
For instance, if items A and B are frequently bought together, several steps can be taken to increase profit. For example:
A and B can be placed together, so that a customer who buys one of the products does not have to go far to buy the other.
People who buy one of the products can be targeted with an advertising campaign for the other.
A collective discount can be offered on these products if the customer buys both of them.
A and B can be packaged together.
5. Association Rules
An association rule is interpreted as an "if-then" statement.
Association rules are probabilistic in nature.
Some possible association rules are:
{Bread} -> {Eggs}
{Bread, Cereal} -> {Eggs}
A collection of one or more items is called an itemset.
7. The possible associations can be many.
We may be interested in finding only the strong associations.
But how do we find strong associations?
Answer: Support, Confidence & Lift.
Support and confidence are the measures used to confirm a rule as a strong association rule.
These two measures express the degree of uncertainty about the rule.
The antecedent and consequent must be disjoint sets.
8. Theory of the Apriori Algorithm
There are three major components of the Apriori algorithm:
Support (prevalence/popularity)
Confidence (predictability) – how likely the consequent is purchased when the antecedent is purchased
Lift (interest) – how much more often the items occur together than expected by chance
9. Three key terms to determine rules
Lift = 1 means there is no association between products A and B.
Lift > 1 means products A and B are more likely to be bought together.
Lift < 1 means products A and B are unlikely to be bought together.
Support(X) = freq(X)/N
Support(Y) = freq(Y)/N
Some algorithms take the support (a fraction) and some take the support count (a raw frequency).
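As an illustration of these measures, here is a minimal Python sketch that computes support, confidence and lift for a rule X -> Y over a small, made-up list of transactions (the transaction data is purely illustrative).

```python
# Minimal sketch: support, confidence and lift for a rule X -> Y
# over a small, hypothetical transaction list.
transactions = [
    {"Bread", "Eggs", "Milk"},
    {"Bread", "Cereal"},
    {"Bread", "Eggs"},
    {"Cereal", "Eggs"},
    {"Bread", "Cereal", "Eggs"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item of the itemset: freq(X)/N
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

X, Y = {"Bread"}, {"Eggs"}
sup_rule = support(X | Y, transactions)            # support of X -> Y
confidence = sup_rule / support(X, transactions)   # P(Y | X)
lift = confidence / support(Y, transactions)       # > 1: positive association

print(f"support={sup_rule:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```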
11. Steps in the Apriori algorithm
Step 1: Generate all frequent itemsets.
Candidate generation forms the possible combinations of itemsets via a join operation.
A frequent itemset is an itemset that appears in at least the minimum-support number of transactions in the transaction database.
Eg: {A}, {B}, {C} – frequent itemsets at k=1
{A,B} {A,C} {B,C} – candidate generation at k=2
Step 2: Generate strong association rules.
12. Step 1: Finding the frequent itemsets
Let k=1.
Generate frequent itemsets of length 1.
Repeat until no new frequent itemsets are identified:
Increment k and create a candidate list of k-itemsets by performing a join operation on pairs of frequent (k-1)-itemsets in the list.
Prune candidate itemsets containing any subset of length (k-1) that is infrequent.
Count the support of each candidate by scanning the DB.
Eliminate candidates that are infrequent, leaving the list with only those that are frequent.
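A compact Python sketch of Step 1, following the join-and-prune loop above. The function name and the minimum support count parameter are illustrative, not from the original slides.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support_count):
    """Return {itemset: support_count} for all frequent itemsets (sketch)."""
    # k = 1: start from frequent single items
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    current = {s for s in items
               if sum(1 for t in transactions if s <= t) >= min_support_count}
    k = 1
    while current:
        for s in current:
            frequent[s] = sum(1 for t in transactions if s <= t)
        k += 1
        # Join step: combine pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count support by scanning the database; keep only frequent candidates
        current = {c for c in candidates
                   if sum(1 for t in transactions if c <= t) >= min_support_count}
    return frequent
```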
14. Step 2: Generate strong rules
Formulate all the possible antecedent -> consequent combinations of each frequent itemset.
Calculate the confidence of each rule.
Choose the rules with the highest confidence.
Also calculate the lift and check whether the rule has lift > 1.
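Continuing the sketch, Step 2 can be written as follows, reusing the support counts produced by the frequent-itemset function above (N is the total number of transactions; min_confidence is a hypothetical threshold).

```python
from itertools import combinations

def generate_rules(frequent, N, min_confidence):
    """Yield (antecedent, consequent, confidence, lift) for strong rules (sketch)."""
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                # Both subsets are in `frequent` by the Apriori property
                confidence = count / frequent[antecedent]
                lift = confidence / (frequent[consequent] / N)
                if confidence >= min_confidence and lift > 1:
                    yield antecedent, consequent, confidence, lift
```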
17. To speed up the process,
Set a minimum value for support and confidence.
This means that we are only interested in finding rules for items that have a certain minimum presence (support) and a minimum value for co-occurrence with other items (confidence).
Extract all the subsets having a support higher than the minimum threshold.
Select all the rules from the subsets with a confidence value higher than the minimum threshold.
Order the rules in descending order of lift.
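In practice this thresholding workflow is usually delegated to a library. The sketch below assumes the third-party mlxtend package and a small made-up transaction list; the calls reflect common mlxtend usage and may differ slightly across versions.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["Bread", "Eggs"], ["Bread", "Cereal"], ["Bread", "Cereal", "Eggs"]]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Keep only itemsets above the minimum support threshold
frequent = apriori(onehot, min_support=0.5, use_colnames=True)

# Keep only rules above the minimum confidence threshold, then sort by lift
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
rules = rules.sort_values("lift", ascending=False)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```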
18. Advantage
Every subset of a frequent itemset is also a frequent itemset (the Apriori property).
This reduces the number of candidates being considered, because only itemsets whose support count is greater than the minimum support count are expanded.
Any candidate itemset can be pruned if it has an infrequent subset.
19. Some other ARM Algorithms
FP Growth
AIS
SETM
ECLAT
20. Practice Problem
Find the support, confidence and lift for the rule {Apples, Milk} -> {Cheese}.
Apply the Apriori algorithm to find the frequent itemsets.
22. Cluster Analysis
Clustering is the process of grouping objects which are similar.
Clustering is an unsupervised learning technique.
Objects within a cluster are similar, and objects in different clusters are dissimilar.
The objects can be grouped based on attributes/features or by relationships with other objects (distance or similarity).
Clustering does not require assumptions about category labels that tag objects with prior identifiers.
Clustering is subjective (or problem dependent) and can be used to summarize data.
23. Applications
Customer relationship management
Information retrieval
Data compression
Image processing
Marketing
Medicine
Pattern recognition
24. Similarity Measurement
Grouping is done based on closeness or similarity.
One way of doing this is to measure the distance between objects.
Distance measurement methods:
Euclidean distance
Manhattan distance
Chebyshev distance
Percentage disagreement
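A short Python sketch of the first three distance measures for two numeric feature vectors (pure Python; the only assumption is that both vectors have the same length).

```python
def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    # Sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))

print(euclidean((1, 1), (2, 4)), manhattan((1, 1), (2, 4)), chebyshev((1, 1), (2, 4)))
```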
28. Percent Disagreement
Suited for features that are categorical in nature.
Distance(Oi, Oj) = 100 * (number of features k where Oik ≠ Ojk) / n
n represents the number of features.
Distance(O1, O2) = 100 * (1/4) = 25%
Distance(O1, O3) = 100 * (2/4) = 50%
Distance(O2, O3) = 100 * (3/4) = 75%
Object | Gender | Age bracket | Income level | BP
O1 | M | 20-30 | Low | Normal
O2 | M | 30-40 | Low | Normal
O3 | F | 20-30 | Medium | Normal
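The O1/O2/O3 computation above can be reproduced with a small Python sketch for categorical feature vectors.

```python
def percent_disagreement(oi, oj):
    # Percentage of features on which the two objects disagree
    mismatches = sum(1 for a, b in zip(oi, oj) if a != b)
    return 100 * mismatches / len(oi)

O1 = ("M", "20-30", "Low", "Normal")
O2 = ("M", "30-40", "Low", "Normal")
O3 = ("F", "20-30", "Medium", "Normal")

print(percent_disagreement(O1, O2))  # 25.0
print(percent_disagreement(O1, O3))  # 50.0
print(percent_disagreement(O2, O3))  # 75.0
```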
29. Types of Clustering
Partitional – we construct various partitions and
then evaluate them by some criteria
K-means
K-medoids
Hierarchical – we create hierarchical
decomposition of the set of objects using some
criterion
Bottom up – agglomerative
Initially, each point is a cluster
Repeatedly combine the two nearest clusters into one
Top-down – divisive
Start with one cluster and recursively split it
30. K-means
Step 1: Choose k objects arbitrarily from D as the initial cluster centers.
Step 2: Repeat
Step 3: Calculate the distance between each data point and each cluster center.
Step 4: Assign each object to the cluster of the nearest center.
Step 5: Update the cluster means.
Step 6: Until no change.
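A minimal Python sketch of these steps using squared Euclidean distance; for simplicity the initial centers are the first k objects, whereas real implementations usually pick them at random.

```python
def kmeans(points, k, max_iter=100):
    # Step 1: choose k objects as the initial cluster centers
    centers = [list(p) for p in points[:k]]
    for _ in range(max_iter):                       # Step 2: repeat
        clusters = [[] for _ in range(k)]
        for p in points:
            # Steps 3-4: assign each object to the nearest center
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Step 5: update the cluster means (keep the old center if a cluster is empty)
        new_centers = [[sum(col) / len(c) for col in zip(*c)] if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:                  # Step 6: until no change
            break
        centers = new_centers
    return centers, clusters

centers, clusters = kmeans([(1, 1), (1, 0), (0, 2), (2, 4), (3, 5)], k=2)
print(centers)  # roughly (0.67, 1) and (2.5, 4.5) for this data (order may vary)
```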
32. Example
As a simple illustration of the k-means algorithm, consider the following data set consisting of the scores of two variables for each of five individuals. This data set is to be grouped into two clusters.
Subject X1 X2
A 1 1
B 1 0
C 0 2
D 2 4
E 3 5
33. Choose the cluster centroids
Group | Individual | Mean vector (centroid)
Group 1 | A | (1, 1)
Group 2 | D | (2, 4)
34. Calculate the distance between each pair of objects using a euclidean/manhattan/Chebyshev distance measure (the values below are Chebyshev distances).
Subject  A  B  C  D  E
A        0
B        1  0
C        1  2  0
D        3  4  2  0
E        4  5  3  1  0
35. Calculate the distance (using euclidean/manhattan/Chebyshev) of each individual to each chosen centroid. Mark 1 under the cluster with the minimum distance. Eg: for object A, min(0, 3) = 0, so put 1 under cluster 1. Rearrange the clusters and recompute the centroids.
Object | Cluster 1 (1,1) A | Cluster 2 (2,4) D
A | 0 | 3
B | 1 | 4
C | 1 | 2
D | 3 | 0
E | 4 | 1

Object | Cluster 1 (1,1) | Cluster 2 (2,4)
A | 1 | 0
B | 1 | 0
C | 1 | 0
D | 0 | 1
E | 0 | 1
New centroids:
Cluster 1: ((1+1+0)/3, (1+0+2)/3) = (2/3, 3/3) ≈ (0.67, 1)
Cluster 2: ((2+3)/2, (4+5)/2) = (5/2, 9/2) = (2.5, 4.5)
36. Repeat until there is no change in the centroids.
Object | Cluster 1 (0.67, 1) | Cluster 2 (2.5, 4.5)
A | 0.33 | 3.5
B | 1 | 4.5
C | 1 | 2.5
D | 3 | 0.5
E | 4 | 0.5

Object | Cluster 1 (0.67, 1) | Cluster 2 (2.5, 4.5)
A | 1 | 0
B | 1 | 0
C | 1 | 0
D | 0 | 1
E | 0 | 1

New centroids:
Cluster 1: ((1+1+0)/3, (1+0+2)/3) = (2/3, 3/3) ≈ (0.67, 1)
Cluster 2: ((2+3)/2, (4+5)/2) = (5/2, 9/2) = (2.5, 4.5)
The centroids are unchanged, so the algorithm stops.
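The hand computation above can be checked with a short numpy sketch that repeats the Chebyshev assignment and centroid update on the same five points.

```python
import numpy as np

points = np.array([[1, 1], [1, 0], [0, 2], [2, 4], [3, 5]], dtype=float)  # A..E
centroids = np.array([[1, 1], [2, 4]], dtype=float)                        # A and D

for _ in range(10):
    # Chebyshev distance from every point to every centroid
    dist = np.abs(points[:, None, :] - centroids[None, :, :]).max(axis=2)
    labels = dist.argmin(axis=1)
    new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(2)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(centroids)  # approximately [[0.67, 1.], [2.5, 4.5]]
print(labels)     # [0 0 0 1 1] -> A, B, C in cluster 1; D, E in cluster 2
```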
37. Stopping Criteria for K-means
The data points assigned to each cluster remain the same.
The centroids remain the same.
The distance of data points from their centroid is at a minimum.
A fixed number of iterations has been reached (insufficient iterations → poor results; choose the maximum iterations wisely).
38. Model Metrics
Within-cluster sum of squares
The sum of the squared deviations between each observation and its cluster centroid.
A smaller sum of squares means the clusters are more compact.
Between-cluster sum of squares
Measures the squared average distance between the cluster centroids.
A larger value implies the clusters are well separated.
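A small numpy sketch of both metrics, given points, cluster labels and centroids. It uses a common definition of the between-cluster sum of squares (the size-weighted squared distance of each centroid from the overall mean); the function names and example values are illustrative.

```python
import numpy as np

def within_cluster_ss(points, labels, centroids):
    # Sum of squared distances from each observation to its own cluster centroid
    return sum(np.sum((points[labels == j] - c) ** 2)
               for j, c in enumerate(centroids))

def between_cluster_ss(points, labels, centroids):
    # Size-weighted squared distances from each centroid to the overall mean
    overall = points.mean(axis=0)
    return sum(np.sum(labels == j) * np.sum((c - overall) ** 2)
               for j, c in enumerate(centroids))

points = np.array([[1, 1], [1, 0], [0, 2], [2, 4], [3, 5]], dtype=float)
labels = np.array([0, 0, 0, 1, 1])
centroids = np.array([[2/3, 1], [2.5, 4.5]])
print(within_cluster_ss(points, labels, centroids))
print(between_cluster_ss(points, labels, centroids))
```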
40. k-Medoids
K-means is sensitive to outliers.
In k-medoids, instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in the cluster.
The k-medoids clustering algorithm:
Select k objects as the initial representative objects (medoids).
Repeat:
Assign each remaining point to its closest medoid.
Randomly select a non-representative object Oi.
Compute the total cost S of swapping a medoid m with Oi.
If S < 0, then swap m with Oi to form the new set of medoids.
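A simplified Python sketch of this swap-based procedure. The total cost is the sum of each point's distance to its nearest medoid; the distance function and number of swap trials are illustrative choices, and the random selections use the standard library's random module.

```python
import random

def total_cost(points, medoids, dist):
    # Sum of each point's distance to its nearest medoid
    return sum(min(dist(p, m) for m in medoids) for p in points)

def k_medoids(points, k, dist, trials=100):
    medoids = random.sample(points, k)                 # initial representative objects
    cost = total_cost(points, medoids, dist)
    for _ in range(trials):
        m = random.choice(medoids)                     # a current medoid
        o = random.choice([p for p in points if p not in medoids])  # non-representative object
        candidate = [o if x == m else x for x in medoids]
        new_cost = total_cost(points, candidate, dist)
        if new_cost - cost < 0:                        # S < 0: the swap improves the clustering
            medoids, cost = candidate, new_cost
    return medoids

manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
print(k_medoids([(1, 1), (1, 0), (0, 2), (2, 4), (3, 5)], k=2, dist=manhattan))
```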
47. Agglomerative clustering
It is a hierarchical clustering method – specifically, it uses a bottom-up approach.
Idea: ensure nearby points end up in the same cluster.
Start with a collection of n singleton clusters.
Each cluster contains one data point.
Repeat until only one cluster is left:
Find the pair of clusters that is closest: min D(ci, cj).
Merge the clusters ci, cj into a new cluster cij.
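A minimal Python sketch of this bottom-up loop, using complete linkage (the maximum pairwise distance, as in the worked example on the next slide); the data points and distance function are illustrative.

```python
def agglomerative(points, dist, linkage=max):
    # Start with one singleton cluster per point
    clusters = [[i] for i in range(len(points))]
    merges = []
    # Cluster-to-cluster distance: linkage over all pairwise point distances
    d = lambda ci, cj: linkage(dist(points[a], points[b]) for a in ci for b in cj)
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them
        pairs = [(d(ci, cj), i, j) for i, ci in enumerate(clusters)
                 for j, cj in enumerate(clusters) if i < j]
        best, i, j = min(pairs)
        merges.append((clusters[i], clusters[j], best))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [clusters[i] + clusters[j]]
    return merges

manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
for left, right, height in agglomerative([(1, 1), (1, 0), (0, 2), (2, 4), (3, 5)], manhattan):
    print(left, "+", right, "merged at distance", height)
```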
49. Merge items 3 and 5. For example, d(1,3) = 3 and d(1,5) = 11, so with complete linkage D(1,"35") = max(3, 11) = 11. This gives us the new distance matrix. The items with the smallest distance get clustered next; this will be 2 and 4.
     35  24   1
35    0
24   10   0
1    11   9   0
52. Modern Clustering methods
Hierarchical clustering
BIRCH – Balanced Iterative Reducing and Clustering using Hierarchies
CURE – Clustering Using REpresentatives
ROCK – RObust Clustering using linKs (for categorical data)
Partitional clustering
CLARA – Clustering LARge Applications
CLARANS – Clustering Large Applications based upon RANdomized Search
K-modes
53. Other Clustering methods
Density based clustering
DENCLUE – DENsity-based CLUstEring
DBSCAN – Density-Based Spatial Clustering of Applications with Noise
OPTICS – Ordering Points To Identify the Clustering Structure
Grid based methods
STING – STatistical INformation Grid
WaveCluster
Model based methods
COBWEB
CLASSIT