Data Mining
Association Analysis: Basic Concepts
and Algorithms
Lecture Notes for Chapter 6
Introduction to Data Mining
by
Tan, Steinbach, Kumar
(C) Vipin Kumar, Parallel Issues in Data Mining, VECPAR 2002
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Market-Basket transactions

Example of Association Rules:
{Diaper} => {Beer}
{Milk, Bread} => {Eggs, Coke}
{Beer, Bread} => {Milk}

Implication means co-occurrence, not causality!
TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke
Definition: Frequent Itemset
Itemset: a collection of one or more items. Example: {Milk, Bread, Diaper}
k-itemset: an itemset that contains k items
Support (s): fraction of transactions that contain an itemset. E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset: an itemset whose support is greater than or equal to a minsup threshold
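These definitions are easy to check directly. A minimal sketch in plain Python (the function names are my own, not from the lecture) that counts support over the market-basket transactions above:

```python
# The five market-basket transactions from the slide
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, transactions) / len(transactions)

print(support({"Milk", "Bread", "Diaper"}, transactions))  # 0.4 = 2/5
```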
Definition: Association Rule
Association Rule: an implication expression of the form X => Y, where X and Y are itemsets.
Example: {Milk, Diaper} => {Beer}

Rule Evaluation Metrics:
Support (s): fraction of transactions that contain both X and Y
Confidence (c): measures how often items in Y appear in transactions that contain X
Example: for {Milk, Diaper} => {Beer}, s = 2/5 = 0.4 and c = 2/3 = 0.67
Association Rule Mining Task
Given a set of transactions T, the goal of association rule mining is to find all rules having
- support >= minsup threshold
- confidence >= minconf threshold

Brute-force approach:
- List all possible association rules
- Compute the support and confidence for each rule
- Prune rules that fail the minsup and minconf thresholds
Mining Association Rules
Example of Rules:
{Milk, Diaper} => {Beer}  (s=0.4, c=0.67)
{Milk, Beer} => {Diaper}  (s=0.4, c=1.0)
{Diaper, Beer} => {Milk}  (s=0.4, c=0.67)
{Beer} => {Milk, Diaper}  (s=0.4, c=0.67)
{Diaper} => {Milk, Beer}  (s=0.4, c=0.5)
{Milk} => {Diaper, Beer}  (s=0.4, c=0.5)

Observations:
All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}.
Rules originating from the same itemset have identical support but can have different confidence.
Thus, we may decouple the support and confidence requirements.
Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation: generate all itemsets whose support >= minsup
2. Rule Generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive.
Frequent Itemset Generation
Given d items, there are 2^d possible candidate itemsets
Frequent Itemset Generation
Brute-force approach:
- Each itemset in the lattice is a candidate frequent itemset
- Count the support of each candidate by scanning the database
- Match each transaction against every candidate
- Complexity ~ O(NMw) => expensive since M = 2^d !!!
[Figure: database of N transactions of width w matched against a list of M candidates]
Computational Complexity
Given d unique items:
- Total number of itemsets = 2^d
- Total number of possible association rules: R = 3^d - 2^(d+1) + 1
- If d=6, R = 602 rules
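The two counting formulas are quick to verify; a small sketch (my own helper names):

```python
def total_itemsets(d):
    # 2^d itemsets can be formed over d items (including the empty set)
    return 2**d

def total_rules(d):
    # R = 3^d - 2^(d+1) + 1 possible association rules over d items
    return 3**d - 2**(d + 1) + 1

print(total_itemsets(6), total_rules(6))  # 64 602
```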
Frequent Itemset Generation Strategies
Reduce the number of candidates (M)
- Complete search: M = 2^d
- Use pruning techniques to reduce M
Reduce the number of transactions (N)
- Reduce the size of N as the size of the itemset increases
- Used by DHP and vertical-based mining algorithms
Reduce the number of comparisons (NM)
- Use efficient data structures to store the candidates or transactions
- No need to match every candidate against every transaction
Reducing Number of Candidates
Apriori principle: if an itemset is frequent, then all of its subsets must also be frequent.

The Apriori principle holds due to the following property of the support measure:
for all X, Y: (X subset of Y) => s(X) >= s(Y)
Support of an itemset never exceeds the support of its subsets. This is known as the anti-monotone property of support.
Illustrating Apriori Principle
Minimum Support = 3

Items (1-itemsets) -> Pairs (2-itemsets) -> Triplets (3-itemsets)
(No need to generate candidates involving Coke or Eggs)

If every subset is considered: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 candidates
With support-based pruning: 6 + 6 + 1 = 13 candidates
Item     Count
Bread    4
Coke     2
Milk     4
Beer     3
Diaper   4
Eggs     1

Itemset          Count
{Bread,Milk}     3
{Bread,Beer}     2
{Bread,Diaper}   3
{Milk,Beer}      2
{Milk,Diaper}    3
{Beer,Diaper}    3

Itemset               Count
{Bread,Milk,Diaper}   3
Apriori Algorithm
Method:
1. Let k=1
2. Generate frequent itemsets of length 1
3. Repeat until no new frequent itemsets are identified:
   a. Generate length (k+1) candidate itemsets from length k frequent itemsets
   b. Prune candidate itemsets containing subsets of length k that are infrequent
   c. Count the support of each candidate by scanning the DB
   d. Eliminate candidates that are infrequent, leaving only those that are frequent
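The method above can be sketched compactly. This is a minimal, unoptimized rendering of the level-wise loop (the function and variable names are my own), run on the market-basket transactions with a support count threshold of 3:

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Level-wise frequent itemset mining: join, prune, count, eliminate."""
    items = sorted({i for t in transactions for i in t})
    # Frequent 1-itemsets
    current = {}
    for i in items:
        c = sum(1 for t in transactions if i in t)
        if c >= minsup_count:
            current[frozenset([i])] = c
    frequent = dict(current)
    k = 1
    while current:
        # Generate (k+1)-candidates by joining frequent k-itemsets
        keys = list(current)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k + 1}
        # Prune candidates that contain an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k))}
        # Count support by scanning the DB; eliminate infrequent candidates
        nxt = {}
        for c in candidates:
            cnt = sum(1 for t in transactions if c <= t)
            if cnt >= minsup_count:
                nxt[c] = cnt
        frequent.update(nxt)
        current = nxt
        k += 1
    return frequent

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
freq = apriori(transactions, minsup_count=3)
```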
Reducing Number of Comparisons
Candidate counting:
- Scan the database of transactions to determine the support of each candidate itemset
- To reduce the number of comparisons, store the candidates in a hash structure
- Instead of matching each transaction against every candidate, match it against candidates contained in the hashed buckets
Generate Hash Tree
Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}
You need:
- A hash function
- A max leaf size: the maximum number of itemsets stored in a leaf node (if the number of candidate itemsets exceeds the max leaf size, split the node)
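The slide's hash function sends items 1, 4, 7 to one branch, 2, 5, 8 to another, and 3, 6, 9 to the third. The root-level split of the 15 candidates can be sketched as follows (a flat bucket layout for illustration, not the full recursive tree):

```python
candidates = [(1,4,5),(1,2,4),(4,5,7),(1,2,5),(4,5,8),(1,5,9),(1,3,6),
              (2,3,4),(5,6,7),(3,4,5),(3,5,6),(3,5,7),(6,8,9),(3,6,7),(3,6,8)]

def h(item):
    # The slide's hash function: 1,4,7 -> 0; 2,5,8 -> 1; 3,6,9 -> 2
    return (item - 1) % 3

# Root level of the hash tree: hash each candidate on its first item
buckets = {0: [], 1: [], 2: []}
for c in candidates:
    buckets[h(c[0])].append(c)

print([len(buckets[b]) for b in (0, 1, 2)])  # [7, 2, 6]
```

In the real tree each bucket is split again on the second item (2, 5 or 8) and then on the third (3, 6 or 9) whenever it exceeds the max leaf size.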
Association Rule Discovery: Hash Tree
Hash function: items 1, 4, 7 hash to the first branch; 2, 5, 8 to the second; 3, 6, 9 to the third.
The candidate hash tree is built level by level: hash on 1, 4 or 7 at the root, on 2, 5 or 8 at the next level, and on 3, 6 or 9 below that.
Subset Operation
Given a transaction t = {1 2 3 5 6}, what are the possible subsets of size 3?

Level 1: fix the first item (1, 2 or 3), leaving suffixes 2 3 5 6, 3 5 6 and 5 6.
Level 2: fix the second item, e.g. 1 2 | 3 5 6, 1 3 | 5 6, 1 5 | 6, 2 3 | 5 6, 2 5 | 6, 3 5 | 6.
Level 3: the subsets of 3 items:
{1 2 3}, {1 2 5}, {1 2 6}, {1 3 5}, {1 3 6}, {1 5 6}, {2 3 5}, {2 3 6}, {2 5 6}, {3 5 6}
Subset Operation Using Hash Tree
The transaction is recursively hashed down the tree; at each leaf reached (e.g. one holding {1 5 9}, {1 3 6}, {3 4 5}), the transaction is matched against the candidates stored there.
Result: match the transaction against 11 out of 15 candidates.
Factors Affecting Complexity
Choice of minimum support threshold
- Lowering the support threshold results in more frequent itemsets
- This may increase the number of candidates and the max length of frequent itemsets
Dimensionality (number of items) of the data set
- More space is needed to store the support count of each item
- If the number of frequent items also increases, both computation and I/O costs may also increase
Size of database
- Since Apriori makes multiple passes, the run time of the algorithm may increase with the number of transactions
Average transaction width
- Transaction width increases with denser data sets
- This may increase the max length of frequent itemsets and traversals of the hash tree (the number of subsets in a transaction increases with its width)
Compact Representation of Frequent Itemsets
Some itemsets are redundant because they have identical support as their supersets.
The number of frequent itemsets can be very large, so we need a compact representation.
Maximal Frequent Itemset
An itemset is maximal frequent if none of its immediate supersets is frequent.
[Figure: itemset lattice with a border separating frequent from infrequent itemsets; the maximal itemsets sit just inside the border]
Closed Itemset
An itemset is closed if none of its immediate supersets has the same support as the itemset.
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,B,C,D}
4     {A,B,D}
5     {A,B,C,D}

Itemset   Support
{A}       4
{B}       5
{C}       3
{D}       4
{A,B}     4
{A,C}     2
{A,D}     3
{B,C}     3
{B,D}     4
{C,D}     3

Itemset     Support
{A,B,C}     2
{A,B,D}     3
{A,C,D}     2
{B,C,D}     3
{A,B,C,D}   2
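Both definitions can be checked mechanically against the transaction table above. A brute-force sketch (fine at this scale; names are my own) that separates frequent itemsets into closed and maximal:

```python
from itertools import combinations

# The five transactions from the table above
transactions = [{"A","B"}, {"B","C","D"}, {"A","B","C","D"},
                {"A","B","D"}, {"A","B","C","D"}]
minsup = 2
items = sorted({i for t in transactions for i in t})

def sup(s):
    return sum(1 for t in transactions if s <= t)

# All frequent itemsets with their supports
frequent = {}
for k in range(1, len(items) + 1):
    for c in combinations(items, k):
        fs = frozenset(c)
        if sup(fs) >= minsup:
            frequent[fs] = sup(fs)

# Closed: no immediate superset has the same support
closed = {f for f in frequent
          if all(sup(f | {i}) < frequent[f] for i in items if i not in f)}
# Maximal: no immediate superset is frequent
maximal = {f for f in frequent
           if all(sup(f | {i}) < minsup for i in items if i not in f)}
```

On this data every nonempty itemset is frequent at minsup = 2; only {A,B,C,D} is maximal, while several itemsets (e.g. {B}, {A,B}) are closed but not maximal.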
Maximal vs Closed Itemsets
[Figure: itemset lattice annotated with the transaction ids supporting each itemset; itemsets not supported by any transaction are marked]

TID   Items
1     ABC
2     ABCD
3     BCE
4     ACDE
5     DE
Maximal vs Closed Frequent Itemsets
Minimum support = 2
# Closed = 9
# Maximal = 4
(Some itemsets are both closed and maximal; others are closed but not maximal.)
Maximal vs Closed Itemsets
Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets
Alternative Methods for Frequent Itemset Generation
Traversal of Itemset Lattice: general-to-specific vs specific-to-general
[Figure: position of the frequent itemset border in the lattice from null to {a1,a2,...,an} under (a) general-to-specific, (b) specific-to-general, and (c) bidirectional traversal]
Alternative Methods for Frequent Itemset Generation
Traversal of Itemset Lattice: equivalence classes
[Figure: the lattice over {A,B,C,D} partitioned into equivalence classes by (a) prefix tree and (b) suffix tree]
Alternative Methods for Frequent Itemset Generation
Traversal of Itemset Lattice: breadth-first vs depth-first
[Figure: (a) breadth-first and (b) depth-first traversal of the lattice]
Alternative Methods for Frequent Itemset Generation
Representation of Database: horizontal vs vertical data layout

Horizontal Data Layout     Vertical Data Layout
TID   Items                A   B    C   D   E
1     A,B,E                1   1    2   2   1
2     B,C,D                4   2    3   4   3
3     C,E                  5   5    4   5   6
4     A,C,D                6   7    8   9
5     A,B,C,D              7   8    9
6     A,E                  8   10
7     A,B                  9
8     A,B,C
9     A,C,D
10    B
FP-growth AlgorithmUse a compressed representation of the
database using an FP-tree
Once an FP-tree has been constructed, it uses a recursive
divide-and-conquer approach to mine the frequent itemsets
FP-tree Construction
After reading TID=1 ({A,B}): null -> A:1 -> B:1
After reading TID=2 ({B,C,D}): the tree gains a second branch, null -> B:1 -> C:1 -> D:1, alongside null -> A:1 -> B:1
Transaction Database
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,C,D,E}
4     {A,D,E}
5     {A,B,C}
6     {A,B,C,D}
7     {B,C}
8     {A,B,C}
9     {A,B,D}
10    {B,C,E}
FP-Tree Construction
Final FP-tree after all ten transactions (node counts as in the figure):
null
  A:7
    B:5
      C:3
        D:1
      D:1
    C:1
      D:1
        E:1
    D:1
      E:1
  B:3
    C:3
      D:1
      E:1
Pointers are used to assist frequent itemset generation.
Header table: Item -> Pointer, for A, B, C, D, E.
FP-growth
Conditional pattern base for D (prefix paths ending in D, carrying D's counts):
P = {(A:1,B:1,C:1), (A:1,B:1), (A:1,C:1), (A:1), (B:1,C:1)}
Recursively apply FP-growth on P.
Frequent itemsets found (with sup > 1): AD, BD, CD, ACD, BCD
Tree Projection
Set enumeration tree over the items.
Possible extensions: E(A) = {B,C,D,E}; E(ABC) = {D,E}
Tree Projection
Items are listed in lexicographic order. Each node P stores the following information:
- Itemset for node P
- List of possible lexicographic extensions of P: E(P)
- Pointer to the projected database of its ancestor node
- Bitvector containing information about which transactions in the projected database contain the itemset
Projected Database
For each transaction T, the projected transaction at node A is T ∩ E(A).

Original Database:
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,C,D,E}
4     {A,D,E}
5     {A,B,C}
6     {A,B,C,D}
7     {B,C}
8     {A,B,C}
9     {A,B,D}
10    {B,C,E}

Projected Database for node A:
TID   Items
1     {B}
2     {}
3     {C,D,E}
4     {D,E}
5     {B,C}
6     {B,C,D}
7     {}
8     {B,C}
9     {B,D}
10    {}
ECLAT
For each item, store a list of transaction ids (tids): the TID-list.
ECLAT
Determine the support of any k-itemset by intersecting the tid-lists of two of its (k-1)-subsets.
Example: t(A) = {1,4,5,6,7,8,9} and t(B) = {1,2,5,7,8,10}, so t(AB) = t(A) ∩ t(B) = {1,5,7,8} and support(AB) = 4.
3 traversal approaches: top-down, bottom-up and hybrid.
Advantage: very fast support counting.
Disadvantage: intermediate tid-lists may become too large for memory.
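The conversion to the vertical layout and the intersection step can be sketched in a few lines (variable names are my own), using the ten-transaction database from the layout slide:

```python
from collections import defaultdict

# Horizontal database (TID -> items) from the layout slide
horizontal = {1: "ABE", 2: "BCD", 3: "CE", 4: "ACD", 5: "ABCD",
              6: "AE", 7: "AB", 8: "ABC", 9: "ACD", 10: "B"}

# Convert to the vertical layout: item -> set of tids
tidlists = defaultdict(set)
for tid, items in horizontal.items():
    for item in items:
        tidlists[item].add(tid)

# Support of {A, B} via tid-list intersection
tids_ab = tidlists["A"] & tidlists["B"]
print(sorted(tids_ab))  # [1, 5, 7, 8]
```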
Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f => L − f satisfies the minimum confidence requirement.
If {A,B,C,D} is a frequent itemset, the candidate rules are:
ABC=>D, ABD=>C, ACD=>B, BCD=>A, A=>BCD, B=>ACD, C=>ABD, D=>ABC, AB=>CD, AC=>BD, AD=>BC, BC=>AD, BD=>AC, CD=>AB
If |L| = k, then there are 2^k − 2 candidate association rules (ignoring L => ∅ and ∅ => L).
Rule Generation
How to efficiently generate rules from frequent itemsets? In general, confidence does not have an anti-monotone property.
But the confidence of rules generated from the same itemset does have an anti-monotone property. E.g., for L = {A,B,C,D}:
c(ABC => D) >= c(AB => CD) >= c(A => BCD)
Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule.
Rule Generation for Apriori Algorithm
Lattice of rules for the frequent itemset {A,B,C,D}; a low-confidence rule and every rule below it can be pruned:
ABCD=>{ }
ABC=>D, ABD=>C, ACD=>B, BCD=>A
AB=>CD, AC=>BD, AD=>BC, BC=>AD, BD=>AC, CD=>AB
A=>BCD, B=>ACD, C=>ABD, D=>ABC
Rule Generation for Apriori Algorithm
A candidate rule is generated by merging two rules that share the same prefix in the rule consequent.
join(CD=>AB, BD=>AC) would produce the candidate rule D=>ABC.
Prune rule D=>ABC if its subset AD=>BC does not have high confidence.
Effect of Support Distribution
Many real data sets have a skewed support distribution.
[Figure: support distribution of a retail data set]
Effect of Support Distribution
How to set the appropriate minsup threshold?
- If minsup is set too high, we could miss itemsets involving interesting rare items (e.g., expensive products)
- If minsup is set too low, it is computationally expensive and the number of itemsets is very large
Using a single minimum support threshold may not be effective.
Multiple Minimum Support
How to apply multiple minimum supports?
MS(i): minimum support for item i
e.g.: MS(Milk)=5%, MS(Coke)=3%, MS(Broccoli)=0.1%, MS(Salmon)=0.5%
MS({Milk, Broccoli}) = min(MS(Milk), MS(Broccoli)) = 0.1%

Challenge: support is no longer anti-monotone.
Suppose Support(Milk, Coke) = 1.5% and Support(Milk, Coke, Broccoli) = 0.5%.
Then {Milk, Coke} is infrequent but {Milk, Coke, Broccoli} is frequent.
Multiple Minimum Support (Liu 1999)
Order the items according to their minimum support (in ascending order).
e.g.: MS(Milk)=5%, MS(Coke)=3%, MS(Broccoli)=0.1%, MS(Salmon)=0.5%
Ordering: Broccoli, Salmon, Coke, Milk

Need to modify Apriori such that:
- L1: set of frequent items
- F1: set of items whose support >= MS(1), where MS(1) is min_i(MS(i))
- C2: candidate itemsets of size 2 are generated from F1 instead of L1
Multiple Minimum Support (Liu 1999)
Modifications to Apriori:
- In traditional Apriori, a candidate (k+1)-itemset is generated by merging two frequent itemsets of size k; the candidate is pruned if it contains any infrequent subsets of size k
- The pruning step has to be modified: prune only if the subset contains the first item
e.g.: Candidate = {Broccoli, Coke, Milk} (ordered according to minimum support). {Broccoli, Coke} and {Broccoli, Milk} are frequent but {Coke, Milk} is infrequent.
The candidate is not pruned because {Coke, Milk} does not contain the first item, i.e., Broccoli.
Pattern Evaluation
Association rule algorithms tend to produce too many rules; many of them are uninteresting or redundant (having the same support & confidence).
Interestingness measures can be used to prune/rank the derived patterns.
In the original formulation of association rules, support & confidence are the only measures used.
Application of Interestingness Measure
The information needed to compute rule interestingness can be obtained from a contingency table.

Contingency table for X => Y:
        Y     ¬Y
X       f11   f10   f1+
¬X      f01   f00   f0+
        f+1   f+0   |T|

Used to define various measures: support, confidence, lift, Gini, J-measure, etc.
Drawback of Confidence
         Coffee   ¬Coffee
Tea      15       5         20
¬Tea     75       5         80
         90       10        100
Statistical Independence
Population of 1000 students:
- 600 students know how to swim (S)
- 700 students know how to bike (B)
- 420 students know how to swim and bike (S,B)
P(S∧B) = 420/1000 = 0.42 and P(S) × P(B) = 0.6 × 0.7 = 0.42
- P(S∧B) = P(S) × P(B) => statistical independence
- P(S∧B) > P(S) × P(B) => positively correlated
- P(S∧B) < P(S) × P(B) => negatively correlated
Statistical-based Measures
Measures that take into account statistical dependence:
Lift = P(Y|X) / P(Y)
Interest = P(X,Y) / (P(X) P(Y))
PS = P(X,Y) − P(X) P(Y)
φ-coefficient = (P(X,Y) − P(X) P(Y)) / sqrt(P(X)(1 − P(X)) P(Y)(1 − P(Y)))
Example: Lift/Interest
         Coffee   ¬Coffee
Tea      15       5         20
¬Tea     75       5         80
         90       10        100
Confidence = P(Coffee|Tea) = 0.75, but P(Coffee) = 0.9
Lift = 0.75/0.9 = 0.8333 (< 1, therefore Coffee and Tea are negatively associated)
Drawback of Lift & Interest
Statistical independence: if P(X,Y) = P(X)P(Y) then Lift = 1.
     Y    ¬Y               Y    ¬Y
X    10   0    10     X    90   0    90
¬X   0    90   90     ¬X   0    10   10
     10   90   100         90   10   100
In both tables X and Y always co-occur, yet Lift = 0.1/(0.1 × 0.1) = 10 for the rare pair and Lift = 0.9/(0.9 × 0.9) = 1.11 for the common one.
There are lots of measures proposed in the literature
Some measures are good for certain applications, but not for
others
What criteria should we use to determine whether a measure is
good or bad?
What about Apriori-style support based pruning? How does it
affect these measures?
Properties of a Good Measure
Piatetsky-Shapiro: 3 properties a good measure M must satisfy:
1. M(A,B) = 0 if A and B are statistically independent
2. M(A,B) increases monotonically with P(A,B) when P(A) and P(B) remain unchanged
3. M(A,B) decreases monotonically with P(A) [or P(B)] when P(A,B) and P(B) [or P(A)] remain unchanged
Comparing Different Measures
[Figure: 10 example contingency tables, E1-E10, and the rankings assigned to them by the various measures]
Property under Variable Permutation
Does M(A,B) = M(B,A)?
Symmetric measures: support, lift, collective strength, cosine, Jaccard, etc.
Asymmetric measures: confidence, conviction, Laplace, J-measure, etc.
Property under Row/Column Scaling
Grade-Gender Example (Mosteller, 1968):
        Male   Female              Male   Female
High    2      3       5    High   4      30      34
Low     1      4       5    Low    2      40      42
        3      7       10          6      70      76
(In the second table, the male column is scaled by 2x and the female column by 10x.)
Mosteller: the underlying association should be independent of the relative number of male and female students in the samples.
Property under Inversion Operation
[Figure: transaction vectors 1..N for a pair of items, before and after inversion (flipping every presence bit)]
The φ-coefficient is analogous to the correlation coefficient for continuous variables.
     Y    ¬Y               Y    ¬Y
X    60   10   70     X    20   10   30
¬X   10   20   30     ¬X   10   60   70
     70   30   100         30   70   100
Both tables have the same φ-coefficient (0.5238), even though X and Y co-occur far more often in the first.
Property under Null Addition
Invariant measures: support, cosine, Jaccard, etc.
Non-invariant measures: correlation, Gini, mutual information, odds ratio, etc.
Different Measures have Different Properties

Symbol  Measure               Range               P1    P2   P3   O1     O2   O3    O3'  O4
Φ       Correlation           -1 … 0 … 1          Yes   Yes  Yes  Yes    No   Yes   Yes  No
λ       Lambda                0 … 1               Yes   No   No   Yes    No   No*   Yes  No
α       Odds ratio                                Yes*  Yes  Yes  Yes    Yes  Yes*  Yes  No
Q       Yule's Q              -1 … 0 … 1          Yes   Yes  Yes  Yes    Yes  Yes   Yes  No
Y       Yule's Y              -1 … 0 … 1          Yes   Yes  Yes  Yes    Yes  Yes   Yes  No
κ       Cohen's               -1 … 0 … 1          Yes   Yes  Yes  Yes    No   No    Yes  No
M       Mutual Information    0 … 1               Yes   Yes  Yes  Yes    No   No*   Yes  No
J       J-Measure             0 … 1               Yes   No   No   No     No   No    No   No
G       Gini Index            0 … 1               Yes   No   No   No     No   No*   Yes  No
s       Support               0 … 1               No    Yes  No   Yes    No   No    No   No
c       Confidence            0 … 1               No    Yes  No   Yes    No   No    No   Yes
L       Laplace               0 … 1               No    Yes  No   Yes    No   No    No   No
V       Conviction                                No    Yes  No   Yes**  No   No    Yes  No
I       Interest                                  Yes*  Yes  Yes  Yes    No   No    No   No
IS      IS (cosine)           0 .. 1              No    Yes  Yes  Yes    No   No    No   Yes
PS      Piatetsky-Shapiro's   -0.25 … 0 … 0.25    Yes   Yes  Yes  Yes    No   Yes   Yes  No
F       Certainty factor      -1 … 0 … 1          Yes   Yes  Yes  No     No   No    Yes  No
AV      Added value           0.5 … 1 … 1         Yes   Yes  Yes  No     No   No    No   No
S       Collective strength                       No    Yes  Yes  Yes    No   Yes*  Yes  No
ζ       Jaccard               0 .. 1              No    Yes  Yes  Yes    No   No    No   Yes
K       Klosgen's                                 Yes   Yes  Yes  No     No   No    No   No

where:
O1: Symmetry under variable permutation
O2: Marginal invariance
O3: Antisymmetry under row or column permutation
O3': Inversion invariance
O4: Null invariance
Yes*: Yes if measure is normalized
Yes**: Yes if measure is symmetrized by taking max(M(A,B), M(B,A))
No*: Symmetry under row or column permutation
Support-based Pruning
Most association rule mining algorithms use the support measure to prune rules and itemsets.
Study the effect of support pruning on the correlation of itemsets:
- Generate 10000 random contingency tables
- Compute support and pairwise correlation for each table
- Apply support-based pruning and examine the tables that are removed
Effect of Support-based Pruning
[Histograms: distribution of pairwise correlation for all itempairs, and after filtering by support (> 0.9, > 0.7, > 0.5, < 0.01, < 0.03, < 0.05)]
Effect of Support-based Pruning
Support-based pruning eliminates mostly negatively correlated itemsets.
[Histograms repeated for each support threshold]
Effect of Support-based Pruning
Investigate how support-based pruning affects other measures.
Steps:
- Generate 10000 contingency tables
- Rank each table according to the different measures
- Compute the pair-wise correlation between the measures
Effect of Support-based Pruning
Without support pruning (all pairs): red cells indicate correlation between the pair of measures > 0.85; 40.14% of pairs have correlation > 0.85.
[Scatter plot between correlation & Jaccard measure]
Effect of Support-based Pruning
With support-based pruning applied, a larger fraction of measure pairs have correlation > 0.85.
[Scatter plots between correlation & Jaccard measure under support pruning]
Subjective Interestingness Measure
Objective measure: rank patterns based on statistics computed from data (e.g., 21 measures of association: support, confidence, Laplace, Gini, mutual information, Jaccard, etc.).
Subjective measure: rank patterns according to the user's interpretation
- A pattern is subjectively interesting if it contradicts the expectation of a user (Silberschatz & Tuzhilin)
- A pattern is subjectively interesting if it is actionable (Silberschatz & Tuzhilin)
Interestingness via Unexpectedness
Need to model the expectation of users (domain knowledge).
Need to combine the expectation of users with evidence from the data (i.e., extracted patterns).

Let + mark a pattern expected (or found) to be frequent and − one expected (or found) to be infrequent. Expected patterns are those where expectation and finding agree; unexpected patterns are those where they disagree (found frequent but expected infrequent, or expected frequent but found infrequent).
Interestingness via Unexpectedness
Web Data (Cooley et al 2001):
- Domain knowledge in the form of site structure
- Given an itemset F = {X1, X2, …, Xk} (Xi: Web pages), let L be the number of links connecting the pages
  - lfactor = L / (k − 1)
  - cfactor = 1 (if the graph is connected), 0 (disconnected graph)
- Structure evidence = cfactor × lfactor
- Usage evidence is derived from how often the pages occur together in the data
- Use Dempster-Shafer theory to combine domain knowledge and evidence from data
(C) Vipin Kumar, Parallel Issues in Data Mining, VECPAR
2002
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Beer
}
Diaper
,
Milk
{
Þ
null
ABACADAEBCBDBECDCEDE
ABCDE
ABCABDABEACDACEADEBCDBCEBDECDE
ABCDABCEABDEACDEBCDE
ABCDE
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
N
Transactions
List of
Candidates
M
w
)
(
)
(
)
(
:
,
Y
s
X
s
Y
X
Y
X
³
Þ
Í
"
null
ABACADAEBCBDBECDCEDE
ABCDE
ABCABDABEACDACEADEBCDBCEBDECDE
ABCDABCEABDEACDEBCDE
ABCDE
TIDA1A2A3A4A5A6A7A8A9A10B1B2B3B4B5B6B7B8B9B10
C1C2C3C4C5C6C7C8C9C10
1
1111111111
00000000000000000000
2
1111111111
00000000000000000000
3
1111111111
00000000000000000000
4
1111111111
00000000000000000000
5
1111111111
00000000000000000000
60000000000
1111111111
0000000000
70000000000
1111111111
0000000000
80000000000
1111111111
0000000000
90000000000
1111111111
0000000000
100000000000
1111111111
0000000000
1100000000000000000000
1111111111
1200000000000000000000
1111111111
1300000000000000000000
1111111111
1400000000000000000000
1111111111
1500000000000000000000
1111111111
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
N
Transactions
Hash Structure
k
Buckets
4
.
0
5
2
|
T
|
)
Beer
Diaper,
,
Milk
(
=
=
=
s
s
67
.
0
3
2
)
Diaper
,
Milk
(
)
Beer
Diaper,
Milk,
(
=
=
=
s
s
c
å
=
÷
ø
ö
ç
è
æ
´
=
10
1
10
3
k
k
null
AB
AC
AD
AE
BC
BD
BE
CD
CE
DE
A
B
C
D
E
ABC
ABD
ABE
ACD
ACE
ADE
BCD
BCE
BDE
CDE
ABCD
ABCE
ABDE
ACDE
BCDE
ABCDE
null
ABACADAEBCBDBECDCEDE
ABCDE
ABCABDABEACDACEADEBCDBCEBDECDE
ABCDABCEABDEACDEBCDE
ABCD
E
1
2
3
1
1
1
1
+
-
=
ú
û
ù
ê
ë
é
÷
ø
ö
ç
è
æ
-
´
÷
ø
ö
ç
è
æ
=
+
-
=
-
=
å
å
d
d
d
k
k
d
j
j
k
d
k
d
R
Given transaction t = {1, 2, 3, 5, 6}, its subsets of 3 items are enumerated level by level (Level 1 fixes the first item, Level 2 the second, Level 3 the third):

1 2 3, 1 2 5, 1 2 6, 1 3 5, 1 3 6, 1 5 6, 2 3 5, 2 3 6, 2 5 6, 3 5 6
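The same level-wise enumeration is what a lexicographic subset generator produces; a minimal sketch:

```python
# Enumerate all 3-item subsets of the transaction {1, 2, 3, 5, 6}
# in lexicographic order, matching the list above.
from itertools import combinations

t = (1, 2, 3, 5, 6)
subsets = list(combinations(t, 3))  # 10 subsets
```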
TID | Items
1   | {A,B}
2   | {B,C,D}
3   | {A,B,C,D}
4   | {A,B,D}
5   | {A,B,C,D}
Item   | Count
Bread  | 4
Coke   | 2
Milk   | 4
Beer   | 3
Diaper | 4
Eggs   | 1
Itemset        | Count
{Bread,Milk}   | 3
{Bread,Beer}   | 2
{Bread,Diaper} | 3
{Milk,Beer}    | 2
{Milk,Diaper}  | 3
{Beer,Diaper}  | 3

Itemset             | Count
{Bread,Milk,Diaper} | 3
Itemset | Support
{A}     | 4
{B}     | 5
{C}     | 3
{D}     | 4
{A,B}   | 4
{A,C}   | 2
{A,D}   | 3
{B,C}   | 3
{B,D}   | 4
{C,D}   | 3

Itemset   | Support
{A,B,C}   | 2
{A,B,D}   | 3
{A,C,D}   | 2
{B,C,D}   | 3

Itemset   | Support
{A,B,C,D} | 2
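As a sketch (not from the slides), the support counts above can be reproduced from the five transactions {A,B}, {B,C,D}, {A,B,C,D}, {A,B,D}, {A,B,C,D} listed in these notes:

```python
# Brute-force support counting over the 5-transaction example.
transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "B", "D"}, {"A", "B", "C", "D"},
]

def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)
```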
TID | Items
1   | {A,B,C}
2   | {A,B,C,D}
3   | {B,C,E}
4   | {A,C,D,E}
5   | {D,E}
(Figure: the itemset lattice over {A, B, C, D, E} annotated with the TIDs of the transactions containing each itemset.)
(Figure: the same annotated lattice, with the frequent itemsets, closed frequent itemsets, and maximal frequent itemsets marked relative to the frequent itemset border.)
(Figure: the frequent itemset border in the lattice from null to {a1, a2, …, an} under three traversal strategies — (a) general-to-specific, (b) specific-to-general, (c) bidirectional.)
(Figure: equivalence classes in the lattice over {A, B, C, D} — (a) prefix tree, (b) suffix tree.)
(Figure: lattice traversal orders — (a) breadth first, (b) depth first.)
Horizontal Data Layout
TID | Items
1   | A,B,E
2   | B,C,D
3   | C,E
4   | A,C,D
5   | A,B,C,D
6   | A,E
7   | A,B
8   | A,B,C
9   | A,C,D
10  | B

Vertical Data Layout
A: 1, 4, 5, 6, 7, 8, 9
B: 1, 2, 5, 7, 8, 10
C: 2, 3, 4, 5, 8, 9
D: 2, 4, 5, 9
E: 1, 3, 6
TID | Items
1   | {A,B}
2   | {B,C,D}
3   | {A,C,D,E}
4   | {A,D,E}
5   | {A,B,C}
6   | {A,B,C,D}
7   | {B,C}
8   | {A,B,C}
9   | {A,B,D}
10  | {B,C,E}

Item | Pointer
A, B, C, D, E
TID | Items
1   | {B}
2   | {}
3   | {C,D,E}
4   | {D,E}
5   | {B,C}
6   | {B,C,D}
7   | {}
8   | {B,C}
9   | {B,D}
10  | {}
A:  1, 4, 5, 6, 7, 8, 9
B:  1, 2, 5, 7, 8, 10
AB: 1, 5, 7, 8
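The TID-list intersection above (the core operation of the vertical mining approach) is just a set intersection; a minimal sketch:

```python
# Vertical layout: each item maps to the set of TIDs that contain it.
tidlist = {
    "A": {1, 4, 5, 6, 7, 8, 9},
    "B": {1, 2, 5, 7, 8, 10},
}

# Support of {A, B} is the size of the intersected TID list.
tid_AB = tidlist["A"] & tidlist["B"]  # {1, 5, 7, 8}
support_AB = len(tid_AB)
```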
(Figure: the lattice of rules generated from the frequent itemset {A,B,C,D}, from ABCD⇒{} at the top down to A⇒BCD, B⇒ACD, C⇒ABD, D⇒ABC at the bottom. If a rule such as BCD⇒A has low confidence, all rules with A in the consequent — e.g. BD⇒AC, CD⇒AB, D⇒ABC — can be pruned.)
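As an illustrative sketch (not the book's ap-genrules pseudocode), rules can be generated from one frequent itemset and filtered by confidence, using the support counts tabulated earlier in these notes:

```python
# Confidence-based rule generation from the frequent itemset ABCD,
# using the support table of the 5-transaction example above.
from itertools import combinations

support = {"A": 4, "B": 5, "C": 3, "D": 4, "AB": 4, "AC": 2, "AD": 3,
           "BC": 3, "BD": 4, "CD": 3, "ABC": 2, "ABD": 3, "ACD": 2,
           "BCD": 3, "ABCD": 2}

def rules(itemset, minconf):
    out = []
    for k in range(1, len(itemset)):
        for ante in combinations(itemset, k):
            cons = "".join(sorted(set(itemset) - set(ante)))
            conf = support[itemset] / support["".join(sorted(ante))]
            if conf >= minconf:
                out.append(("".join(ante), cons, conf))
    return out
```

For example, conf(D⇒ABC) = σ(ABCD)/σ(D) = 2/4 = 0.5, so D⇒ABC survives only at minconf ≤ 0.5.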
Item | MS(I)  | Sup(I)
A    | 0.10%  | 0.25%
B    | 0.20%  | 0.26%
C    | 0.30%  | 0.29%
D    | 0.50%  | 0.05%
E    | 3%     | 4.20%
(Figures: itemset lattices over the items A–E illustrating mining with multiple minimum supports — one lattice restricted to B–E, one over all five items.)
(Figure: a feature × product association matrix.)
(Figure: the knowledge discovery pipeline — Selection → Preprocessing → Mining → Postprocessing, transforming Data → Selected Data → Preprocessed Data → Patterns → Knowledge.)
Lift = P(Y|X) / P(Y)
Interest = P(X,Y) / (P(X) P(Y))
PS = P(X,Y) − P(X) P(Y)
φ-coefficient = (P(X,Y) − P(X) P(Y)) / sqrt( P(X)[1 − P(X)] × P(Y)[1 − P(Y)] )
Lift = 0.1 / (0.1 × 0.1) = 10
Lift = 0.9 / (0.9 × 0.9) = 1.11
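A one-line sketch (not from the slides) reproducing the two lift values above:

```python
# Lift = P(X,Y) / (P(X) * P(Y)); equals 1 under statistical independence.
def lift(pxy, px, py):
    return pxy / (px * py)

high = lift(0.1, 0.1, 0.1)  # rare but perfectly associated items -> 10
low = lift(0.9, 0.9, 0.9)   # common co-occurring items -> ~1.11
```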
Example | f11  | f10  | f01  | f00
E1      | 8123 | 83   | 424  | 1370
E2      | 8330 | 2    | 622  | 1046
E3      | 9481 | 94   | 127  | 298
E4      | 3954 | 3080 | 5    | 2961
E5      | 2886 | 1363 | 1320 | 4431
E6      | 1500 | 2000 | 500  | 6000
E7      | 4000 | 2000 | 1000 | 3000
E8      | 4000 | 2000 | 2000 | 2000
E9      | 1720 | 7121 | 5    | 1154
E10     | 61   | 2483 | 4    | 7452
Contingency table:
      B   ¬B
A     p   q
¬A    r   s

(and the same table with the roles of A and B exchanged: cells p, r / q, s.)
(Figure: 0/1 indicator vectors for variables A, B, C, D (panels a, b) and E, F (panel c), used in the φ-coefficient examples below.)
φ = (0.6 − 0.7 × 0.7) / sqrt(0.7 × 0.3 × 0.7 × 0.3) = 0.5238
φ = (0.2 − 0.3 × 0.3) / sqrt(0.3 × 0.7 × 0.3 × 0.7) = 0.5238
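A small sketch (not from the slides) verifying that both cases above give the same φ value, which is the point of the example:

```python
# phi-coefficient: (P(X,Y) - P(X)P(Y)) / sqrt(P(X)(1-P(X)) P(Y)(1-P(Y)))
from math import sqrt

def phi(pxy, px, py):
    return (pxy - px * py) / sqrt(px * (1 - px) * py * (1 - py))

phi_hi = phi(0.6, 0.7, 0.7)  # frequent, positively associated pair
phi_lo = phi(0.2, 0.3, 0.3)  # rare pair with the same phi
```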
(Tables: the 2×2 contingency table with cells p, q, r, s shown with rows/columns permuted and with a constant k added to the s cell — used to illustrate the variable-permutation, inversion, and null-addition properties of a measure.)
Symbol | Measure             | Range             | P1  P2  P3  O1  O2  O3  O3' O4
φ      | Correlation         | −1 … 0 … 1        | Yes Yes Yes Yes No  Yes Yes No
λ      | Lambda              | 0 … 1             | Yes No  No  Yes No  No* Yes No
Q      | Yule's Q            | −1 … 0 … 1        | Yes Yes Yes Yes Yes Yes Yes No
Y      | Yule's Y            | −1 … 0 … 1        | Yes Yes Yes Yes Yes Yes Yes No
κ      | Cohen's             | −1 … 0 … 1        | Yes Yes Yes Yes No  No  Yes No
M      | Mutual Information  | 0 … 1             | Yes Yes Yes Yes No  No* Yes No
J      | J-Measure           | 0 … 1             | Yes No  No  No  No  No  No  No
G      | Gini Index          | 0 … 1             | Yes No  No  No  No  No* Yes No
s      | Support             | 0 … 1             | No  Yes No  Yes No  No  No  No
c      | Confidence          | 0 … 1             | No  Yes No  Yes No  No  No  Yes
L      | Laplace             | 0 … 1             | No  Yes No  Yes No  No  No  No
V      | Conviction          |                   | … Yes** No  No  Yes No  (row partly lost)
I      | Interest            |                   | (row lost)
IS     | IS (cosine)         | 0 … 1             | No  Yes Yes Yes No  No  No  Yes
PS     | Piatetsky-Shapiro's | −0.25 … 0 … 0.25  | Yes Yes Yes Yes No  Yes Yes No
F      | Certainty factor    | −1 … 0 … 1        | Yes Yes Yes No  No  No  Yes No
AV     | Added value         | 0.5 … 1 … 1       | Yes Yes Yes No  No  No  No  No
S      | Jaccard             | 0 … 1             | No  Yes Yes Yes No  No  No  Yes
K      | Klosgen's           |                   | Yes Yes Yes No  No  No  No  No
(Figures: histograms of the correlation (φ) for all item pairs and for the pairs with support < 0.01, < 0.03, and < 0.05 — low-support pairs span the full correlation range from −1 to 1.)
All Pairs (40.14%) — ranking of the 21 measures:
1 Conviction, 2 Odds ratio, 3 Col Strength, 4 Correlation, 5 Interest, 6 PS, 7 CF, 8 Yule Y, 9 Reliability, 10 Kappa, 11 Klosgen, 12 Yule Q, 13 Confidence, 14 Laplace, 15 IS, 16 Support, 17 Jaccard, 18 Lambda, 19 Gini, 20 J-measure, 21 Mutual Info
(Figure: scatter plot of Jaccard vs. correlation for these pairs.)
0.005 ≤ support ≤ 0.500 (61.45%) — ranking of the 21 measures:
1 Interest, 2 Conviction, 3 Odds ratio, 4 Col Strength, 5 Laplace, 6 Confidence, 7 Correlation, 8 Klosgen, 9 Reliability, 10 PS, 11 Yule Q, 12 CF, 13 Yule Y, 14 Kappa, 15 IS, 16 Jaccard, 17 Support, 18 Lambda, 19 Gini, 20 J-measure, 21 Mutual Info
(Figure: scatter plot of Jaccard vs. correlation for these pairs.)
0.005 ≤ support ≤ 0.300 (76.42%) — ranking of the 21 measures:
1 Support, 2 Interest, 3 Reliability, 4 Conviction, 5 Yule Q, 6 Odds ratio, 7 Confidence, 8 CF, 9 Yule Y, 10 Kappa, 11 Correlation, 12 Col Strength, 13 IS, 14 Jaccard, 15 Laplace, 16 PS, 17 Klosgen, 18 Lambda, 19 Mutual Info, 20 Gini, 21 J-measure
(Figure: scatter plot of Jaccard vs. correlation for these pairs.)
Data Mining
Association Rules: Advanced Concepts and Algorithms
Lecture Notes for Chapter 7
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Continuous and Categorical Attributes
Example of Association Rule: {…} → {Buy = No}
How to apply the association analysis formulation to non-asymmetric binary variables?
Session Id | Country   | Session Length (sec) | Number of Web Pages viewed | Gender | Browser Type | Buy
1          | USA       | 982                  | 8                          | Male   | IE           | No
2          | China     | 811                  | 10                         | Female | Netscape     | No
3          | USA       | 2125                 | 45                         | Female | Mozilla      | Yes
4          | Germany   | 596                  | 4                          | Male   | IE           | Yes
5          | Australia | 123                  | 9                          | Male   | Mozilla      | No
…          | …         | …                    | …                          | …      | …            | …
Handling Categorical Attributes
Transform each categorical attribute into asymmetric binary variables: introduce a new "item" for each distinct attribute–value pair. Example: replace the Browser Type attribute with items such as Browser Type = Internet Explorer, Browser Type = Mozilla, etc.
Handling Categorical Attributes — Potential Issues
What if an attribute has many possible values? Example: the attribute Country has more than 200 possible values, and many of them may have very low support. Potential solution: aggregate the low-support attribute values.
What if the distribution of attribute values is highly skewed? Example: 95% of the visitors have Buy = No, so most itemsets will involve the (Buy = No) item. Potential solution: drop the highly frequent items.
Handling Continuous Attributes
Different kinds of rules can be produced. Different methods: discretization-based; statistics-based; non-discretization based (minApriori).
Handling Continuous Attributes
Use discretization. Unsupervised: equal-width binning, equal-depth binning, clustering. Supervised: choose bin boundaries using the class labels, e.g.

Class     | v1  v2  v3 v4 v5 v6  v7  v8  v9
Anomalous | 0   0   20 10 20 0   0   0   0
Normal    | 150 100 0  0  0  100 100 150 100

(bins: bin1 = {v1, v2}, bin2 = {v3, v4, v5}, bin3 = {v6, …, v9})
Discretization Issues
The size of the discretized intervals affects support and confidence: if intervals are too small, they may not have enough support; if intervals are too large, rules may not have enough confidence. Potential solution: use all possible intervals.
Execution time: if an interval contains n values, there are on average O(n²) possible ranges, which leads to too many rules.
Approach by Srikant & Agrawal
Preprocess the data: discretize each attribute using equi-depth partitioning; use the partial completeness measure to determine the number of partitions; merge adjacent intervals as long as their support is less than max-support.
Apply existing association rule mining algorithms.
Determine the interesting rules in the output.
Approach by Srikant & Agrawal
Discretization will lose information; use the partial completeness measure to determine how much information is lost.
C: frequent itemsets obtained by considering all ranges of attribute values.
P: frequent itemsets obtained by considering all ranges over the partitions.
P is K-complete w.r.t. C if P ⊆ C and, for every X ∈ C, there exists X' ∈ P such that:
1. X' is a generalization of X and support(X') ≤ K × support(X), with K ≥ 1;
2. for every Y ⊆ X there exists Y' ⊆ X' such that Y' is a generalization of Y and support(Y') ≤ K × support(Y).
Given K (the partial completeness level), we can determine the number of intervals (N).
Interestingness Measure
Given an itemset Z = {z1, z2, …, zk} and its generalization Z' = {z1', z2', …, zk'}:
P(Z): support of Z; E_{Z'}(Z): expected support of Z based on Z'.
Z is R-interesting w.r.t. Z' if P(Z) ≥ R × E_{Z'}(Z).
For a rule X → Y with ancestor rule X' → Y', E(Y|X) denotes the expected confidence based on the ancestor.
A rule S is R-interesting w.r.t. its ancestor rule S' if its support P(S) is at least R times the expected support, or its confidence is at least R times the expected confidence.
Statistics-based Methods
Example: the rule consequent consists of a continuous variable, characterized by its statistics — mean, median, standard deviation, etc.
Approach: withhold the target variable from the rest of the data; apply existing frequent itemset generation on the rest of the data; for each frequent itemset, compute the descriptive statistics of the corresponding target variable (the frequent itemset becomes a rule by introducing the target variable as the rule consequent); apply a statistical test to determine the interestingness of the rule.
Statistics-based Methods
How to determine whether an association rule is interesting? Compare the statistics for the segment of the population covered by the rule vs. the segment not covered by the rule, using a Z statistic that has variance 1 under the null hypothesis.
Statistics-based Methods
Example: for rule r, μ = 30, n1 = 50, s1 = 3.5; for r' (its complement), μ = 23, n2 = 250, s2 = 6.5; the difference to test is Δ = 5.
For a 1-sided test at the 95% confidence level, the critical Z-value for rejecting the null hypothesis is 1.64. Since Z = 3.11 is greater than 1.64, r is an interesting rule.
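A sketch (not from the slides) of the Z statistic used above, Z = (μ1 − μ2 − Δ) / sqrt(s1²/n1 + s2²/n2), with the example's numbers:

```python
# Two-sample Z statistic for the difference of means against a margin delta.
from math import sqrt

def z_stat(mu1, mu2, delta, s1, n1, s2, n2):
    return (mu1 - mu2 - delta) / sqrt(s1**2 / n1 + s2**2 / n2)

z = z_stat(30, 23, 5, 3.5, 50, 6.5, 250)  # ~3.11 > 1.64, so r is interesting
```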
Min-Apriori (Han et al)
Example: W1 and W2 tend to appear together in the same document.
Document-term matrix:

TID | W1 W2 W3 W4 W5
D1  | 2  2  0  0  1
D2  | 0  0  1  2  2
D3  | 2  3  0  0  0
D4  | 0  0  1  0  1
D5  | 1  1  1  0  2
Min-Apriori
The data contains only continuous attributes of the same "type", e.g., the frequency of words in a document.
Potential solutions: convert into a 0/1 matrix and then apply existing algorithms — but this loses the word frequency information; discretization does not apply, because users want associations among words, not among ranges of word frequencies.
Min-Apriori
How to determine the support of a word? If we simply sum up its frequency, the support count can be greater than the total number of documents! Normalize the word vectors, e.g., using the L1 norm, so that each word has a support equal to 1.0.

Normalized document-term matrix:
TID | W1   W2   W3   W4   W5
D1  | 0.40 0.33 0.00 0.00 0.17
D2  | 0.00 0.00 0.33 1.00 0.33
D3  | 0.40 0.50 0.00 0.00 0.00
D4  | 0.00 0.00 0.33 0.00 0.17
D5  | 0.20 0.17 0.33 0.00 0.33
Min-Apriori
New definition of support: sup(C) = Σ_{i∈T} min_{j∈C} D(i, j)
Example:
Sup(W1, W2, W3) = 0 + 0 + 0 + 0 + 0.17 = 0.17
Anti-monotone property of Support
Example:
Sup(W1) = 0.4 + 0 + 0.4 + 0 + 0.2 = 1
Sup(W1, W2) = 0.33 + 0 + 0.4 + 0 + 0.17 = 0.9
Sup(W1, W2, W3) = 0 + 0 + 0 + 0 + 0.17 = 0.17
Multi-level Association Rules
Why should we incorporate a concept hierarchy? Rules at lower levels may not have enough support to appear in any frequent itemsets, and rules at lower levels of the hierarchy are overly specific: e.g., skim milk → white bread, 2% milk → wheat bread, etc. are all indicative of an association between milk and bread.
How do support and confidence vary as we traverse the concept hierarchy?
If X is the parent item for both X1 and X2, then σ(X) ≥ σ(X1) and σ(X) ≥ σ(X2).
If σ(X1 ∪ Y1) ≥ minsup, and X is the parent of X1 and Y is the parent of Y1, then σ(X ∪ Y1) ≥ minsup, σ(X1 ∪ Y) ≥ minsup, and σ(X ∪ Y) ≥ minsup.
If conf(X1 ⇒ Y1) ≥ minconf, then conf(X1 ⇒ Y) ≥ minconf.
Multi-level Association Rules — Approach 1
Extend the current association rule formulation by augmenting each transaction with its higher-level items.
Original transaction: {skim milk, wheat bread}
Augmented transaction: {skim milk, wheat bread, milk, bread, food}
Issues: items at higher levels have much higher support counts — if the support threshold is low, too many frequent patterns involve items from the higher levels; increased dimensionality of the data.
Multi-level Association RulesApproach 2:Generate frequent
patterns at highest level first
Then, generate frequent patterns at the next highest level, and
so on
Issues:I/O requirements will increase dramatically because we
need to perform more passes over the dataMay miss some
potentially interesting cross-level association patterns
Sequence Data
Sequence Database:
(Figure: event timelines for Objects A, B, and C over timestamps 10–35.)
Examples of Sequence Data
(Figure: a sequence, e.g. <{E1,E2} {E1,E3} {E2} {E3,E4} {E2}>, with its element (transaction) and event (item) labels.)

Sequence Database | Sequence | Element (Transaction) | Event (Item)
Customer | Purchase history of a given customer | A set of items bought by a customer at time t | Books, dairy products, CDs, etc.
Web Data | Browsing activity of a particular Web visitor | A collection of files viewed by a Web visitor after a single mouse click | Home page, index page, contact info, etc.
Event data | History of events generated by a given sensor | Events triggered by a sensor at time t | Types of alarms generated by sensors
Genome sequences | DNA sequence of a particular species | An element of the DNA sequence | Bases A, T, G, C
Formal Definition of a SequenceA sequence is an ordered list of
elements (transactions)
s = < e1 e2 e3 … >
Each element contains a collection of events (items)
ei = {i1, i2, …, ik}
Each element is attributed to a specific time or location
Length of a sequence, |s|, is given by the number of elements of
the sequence
A k-sequence is a sequence that contains k events (items)
Examples of SequenceWeb sequence:
< {Homepage} {Electronics} {Digital Cameras} {Canon
Digital Camera} {Shopping Cart} {Order Confirmation}
{Return to Shopping} >
Sequence of initiating events causing the nuclear accident at Three
Mile Island:
(http://stellar-
one.com/nuclear/staff_reports/summary_SOE_the_initiating_eve
nt.htm)
< {clogged resin} {outlet valve closure} {loss of feedwater}
{condenser polisher outlet valve shut} {booster pumps trip}
{main waterpump trips} {main turbine trips} {reactor pressure
increases}>
Sequence of books checked out at a library:
<{Fellowship of the Ring} {The Two Towers} {Return of the
King}>
Formal Definition of a Subsequence
A sequence <a1 a2 … an> is contained in another sequence <b1 b2 … bm> (m ≥ n) if there exist integers i1 < i2 < … < in such that a1 ⊆ b_{i1}, a2 ⊆ b_{i2}, …, an ⊆ b_{in}.
The support of a subsequence w is defined as the fraction of data sequences that contain w. A sequential pattern is a frequent subsequence (i.e., a subsequence whose support is ≥ minsup).

Data sequence            | Subsequence   | Contain?
< {2,4} {3,5,6} {8} >    | < {2} {3,5} > | Yes
< {1,2} {3,4} >          | < {1} {2} >   | No
< {2,4} {2,4} {2,5} >    | < {2} {4} >   | Yes
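The containment test above can be implemented greedily (matching each subsequence element against the earliest possible data element is optimal here); a sketch, not from the slides:

```python
# Check whether subseq is contained in data_seq. Both are lists of sets;
# each subsequence element must be a subset of some later data element.
def contains(data_seq, subseq):
    i = 0
    for element in data_seq:
        if i < len(subseq) and subseq[i] <= element:
            i += 1
    return i == len(subseq)
```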
Sequential Pattern Mining: DefinitionGiven: a database of
sequences a user-specified minimum support threshold, minsup
Task:Find all subsequences with support ≥ minsup
Sequential Pattern Mining: ChallengeGiven a sequence: <{a b}
{c d e} {f} {g h i}>Examples of subsequences:
<{a} {c d} {f} {g} >, < {c d e} >, < {b} {g} >, etc.
How many k-subsequences can be extracted from a given n-
sequence?
<{a b} {c d e} {f} {g h i}> n = 9
k=4: Y _ _ Y Y _ _ _ Y
<{a} {d e} {i}>
Sequential Pattern Mining: Example
Minsup = 50%
Examples of Frequent Subsequences:
< {1,2} > s=60%
< {2,3} > s=60%
< {2,4}> s=80%
< {3} {5}> s=80%
< {1} {2} > s=80%
< {2} {2} > s=60%
< {1} {2,3} > s=60%
< {2} {2,3} > s=60%
< {1,2} {2,3} > s=60%
Extracting Sequential PatternsGiven n events: i1, i2, i3, …, in
Candidate 1-subsequences:
<{i1}>, <{i2}>, <{i3}>, …, <{in}>
Candidate 2-subsequences:
<{i1, i2}>, <{i1, i3}>, …, <{i1} {i1}>, <{i1} {i2}>, …, <{in-
1} {in}>
Candidate 3-subsequences:
<{i1, i2 , i3}>, <{i1, i2 , i4}>, …, <{i1, i2} {i1}>, <{i1, i2}
{i2}>, …,
<{i1} {i1 , i2}>, <{i1} {i1 , i3}>, …, <{i1} {i1} {i1}>, <{i1}
{i1} {i2}>, …
Generalized Sequential Pattern (GSP)Step 1: Make the first pass
over the sequence database D to yield all the 1-element frequent
sequences
Step 2: Repeat until no new frequent sequences are found:
Candidate Generation: merge pairs of frequent subsequences found in the (k−1)th pass to generate candidate sequences that contain k items.
Candidate Pruning: prune candidate k-sequences that contain infrequent (k−1)-subsequences.
Support Counting: make a new pass over the sequence database D to find the support for these candidate sequences.
Candidate Elimination: eliminate candidate k-sequences whose actual support is less than minsup.
Candidate GenerationBase case (k=2): Merging two frequent 1-
sequences <{i1}> and <{i2}> will produce two candidate 2-
sequences: <{i1} {i2}> and <{i1 i2}>
General case (k>2):A frequent (k-1)-sequence w1 is merged
with another frequent
(k-1)-sequence w2 to produce a candidate k-sequence if the
subsequence obtained by removing the first event in w1 is the
same as the subsequence obtained by removing the last event in
w2 The resulting candidate after merging is given by the
sequence w1 extended with the last event of w2.
If the last two events in w2 belong to the same element, then the
last event in w2 becomes part of the last element in w1
Otherwise, the last event in w2 becomes a separate element
appended to the end of w1
Candidate Generation ExamplesMerging the sequences
w1=<{1} {2 3} {4}> and w2 =<{2 3} {4 5}>
will produce the candidate sequence < {1} {2 3} {4 5}> because
the last two events in w2 (4 and 5) belong to the same element
Merging the sequences
w1=<{1} {2 3} {4}> and w2 =<{2 3} {4} {5}>
will produce the candidate sequence < {1} {2 3} {4} {5}>
because the last two events in w2 (4 and 5) do not belong to the
same element
We do not have to merge the sequences
w1 =<{1} {2 6} {4}> and w2 =<{1} {2} {4 5}>
to produce the candidate < {1} {2 6} {4 5}> because if the
latter is a viable candidate, then it can be obtained by merging
w1 with
< {1} {2 6} {5}>
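The merge rule described above can be sketched as follows (an illustrative simplification, assuming sequences are lists of integer sets; it handles only the two merge cases from the examples, not the full GSP join test):

```python
# GSP merge step: append w2's last event to w1, either inside w1's last
# element (if w2's last two events share an element) or as a new element.
def merge(w1, w2):
    last = max(w2[-1])            # last event of w2
    if len(w2[-1]) > 1:           # last two events of w2 share an element
        return w1[:-1] + [w1[-1] | {last}]
    return w1 + [{last}]

w1 = [{1}, {2, 3}, {4}]
cand_a = merge(w1, [{2, 3}, {4, 5}])      # -> <{1} {2 3} {4 5}>
cand_b = merge(w1, [{2, 3}, {4}, {5}])    # -> <{1} {2 3} {4} {5}>
```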
GSP Example
Frequent 3-sequences:
< {1} {2} {3} >, < {1} {2 5} >, < {1} {5} {3} >, < {2} {3} {4} >, < {2 5} {3} >, < {3} {4} {5} >, < {5} {3 4} >
Candidate Generation:
< {1} {2} {3} {4} >, < {1} {2 5} {3} >, < {1} {5} {3 4} >, < {2} {3} {4} {5} >, < {2 5} {3 4} >
After Candidate Pruning:
< {1} {2 5} {3} >
Timing Constraints (I)
For a pattern such as {A B} {C} {D E}: the overall span must satisfy ≤ ms (maximum span), and the gap between consecutive elements must satisfy > ng (min-gap) and ≤ xg (max-gap).
With xg = 2, ng = 0, ms = 4:

Data sequence                              | Subsequence     | Contain?
< {2,4} {3,5,6} {4,7} {4,5} {8} >          | < {6} {5} >     | Yes
< {1} {2} {3} {4} {5} >                    | < {1} {4} >     | No
< {1} {2,3} {3,4} {4,5} >                  | < {2} {3} {5} > | Yes
< {1,2} {3} {2,3} {3,4} {2,4} {4,5} >      | < {1,2} {5} >   | No
Mining Sequential Patterns with Timing ConstraintsApproach
1:Mine sequential patterns without timing
constraintsPostprocess the discovered patterns
Approach 2:Modify GSP to directly prune candidates that
violate timing constraintsQuestion: Does Apriori principle still
hold?
Apriori Principle for Sequence Data
Suppose:
xg = 1 (max-gap)
ng = 0 (min-gap)
ms = 5 (maximum span)
minsup = 60%
<{2} {5}> support = 40%
but
<{2} {3} {5}> support = 60%
Problem exists because of max-gap constraint
No such problem if max-gap is infinite
Contiguous Subsequences
s is a contiguous subsequence of w = <e1 e2 … ek> if any of the following conditions hold:
s is obtained from w by deleting an item from either e1 or ek;
s is obtained from w by deleting an item from any element ei that contains more than 2 items;
s is a contiguous subsequence of s' and s' is a contiguous subsequence of w (recursive definition).
Examples: s = < {1} {2} > is a contiguous subsequence of < {1} {2 3} >, < {1 2} {2} {3} >, and < {3 4} {1 2} {2 3} {4} >, but is not a contiguous subsequence of < {1} {3} {2} > and < {2} {1} {3} {2} >.
Modified Candidate Pruning StepWithout maxgap constraint:A
candidate k-sequence is pruned if at least one of its (k-1)-
subsequences is infrequent
With maxgap constraint:A candidate k-sequence is pruned if at
least one of its contiguous (k-1)-subsequences is infrequent
Timing Constraints (II)
xg: max-gap, ng: min-gap, ws: window size, ms: maximum span.
With xg = 2, ng = 0, ws = 1, ms = 5:

Data sequence                        | Subsequence      | Contain?
< {2,4} {3,5,6} {4,7} {4,6} {8} >    | < {3} {5} >      | No
< {1} {2} {3} {4} {5} >              | < {1,2} {3} >    | Yes
< {1,2} {2,3} {3,4} {4,5} >          | < {1,2} {3,4} >  | Yes
Modified Support Counting StepGiven a candidate pattern: <{a,
c}>Any data sequences that contain
<… {a c} … >,
<… {a} … {c}…> ( where time({c}) – time({a}) ≤ ws)
<…{c} … {a} …> (where time({a}) – time({c}) ≤ ws)
will contribute to the support count of candidate pattern
Other Formulation
In some domains, we may have only one very long time series. Examples: monitoring network traffic events for attacks; monitoring telecommunication alarm signals. The goal is to find frequent sequences of events in the time series. This problem is also known as frequent episode mining.
(Figure: a single long event stream — E1 E2, E1 E2, E1 E2, E3 E4, E3 E4, E1 E2, E2 E4, E3 E5, E2, E3 E5, E1, E2, E3, E1 — with occurrences of the pattern <E1> <E3>.)
General Support Counting Schemes
Assume: xg = 2 (max-gap), ng = 0 (min-gap), ws = 0 (window size), ms = 2 (maximum span).
(Figure: one object's timeline with events p and q at timestamps 2–7; the sequence (p) (q) is counted under each scheme.)

Method   | Support Count
COBJ     | 1
CWIN     | 6
CMINWIN  | 4
CDIST_O  | 8
CDIST    | 5
Frequent Subgraph MiningExtend association rule mining to
finding frequent subgraphsUseful for Web Mining,
computational chemistry, bioinformatics, spatial data sets, etc
Graph Definitions
Representing Transactions as Graphs
Each transaction is a clique of items.

Transaction Id | Items
1              | {A,B,C,D}
2              | {A,B,E}
3              | {B,C}
4              | {A,B,D,E}
5              | {B,C,D}
Representing Graphs as Transactions
Challenges
Nodes may contain duplicate labels. Support and confidence: how to define them? Additional constraints are imposed by the pattern structure — support and confidence are not the only constraints. Assumption: frequent subgraphs must be connected. Apriori-like approach: use frequent k-subgraphs to generate frequent (k+1)-subgraphs. What is k?
Challenges…Support: number of graphs that contain a particular
subgraph
Apriori principle still holds
Level-wise (Apriori-like) approach:Vertex growing: k is the
number of verticesEdge growing: k is the number of edges
Vertex Growing
(Figure: vertex growing — two graphs G1 and G2 with labeled vertices (a, d, e) and labeled edges (p, q, r) that share a common core are joined into G3 = join(G1, G2).)
Edge Growing
(Figure: edge growing — G3 = join(G1, G2) extends the shared core of G1 and G2 by one labeled edge.)
Apriori-like Algorithm
Find frequent 1-subgraphs. Repeat:
Candidate generation — use frequent (k−1)-subgraphs to generate candidate k-subgraphs.
Candidate pruning — prune candidate subgraphs that contain infrequent (k−1)-subgraphs.
Support counting — count the support of each remaining candidate.
Eliminate candidate k-subgraphs that are infrequent.
In practice, it is not as easy. There are many other issues
Example: Dataset
(Figure: four labeled graphs G1–G4 with vertex labels a–e and edge labels p, q, r.)
Example
(Figure: frequent subgraph mining with minimum support count = 2 — the frequent 1-subgraphs (k = 1), the frequent 2-subgraphs (k = 2), and the candidate 3-subgraphs (k = 3), one of which is pruned.)
Candidate GenerationIn Apriori:Merging two frequent k-
itemsets will produce a candidate (k+1)-itemset
In frequent subgraph mining (vertex/edge growing)Merging two
frequent k-subgraphs may produce more than one candidate
(k+1)-subgraph
Multiplicity of Candidates (Vertex Growing)
(Figure: joining G1 and G2 by vertex growing leaves the label of the edge between the two newly introduced vertices undetermined, marked "?" in G3 = join(G1, G2).)
Multiplicity of Candidates (Edge growing)Case 1: identical
vertex labels
Multiplicity of Candidates (Edge growing)Case 2: Core contains
identical labels
Core: The (k-1) subgraph that is common
between the joint graphs
(Figures: Cases 1 and 2 — merging two graphs whose vertices or whose shared core carry identical labels can produce more than one distinct candidate.)
Multiplicity of Candidates (Edge growing)Case 3: Core
multiplicity
Adjacency Matrix Representation
The same graph can be represented in many ways:

     A(1) A(2) A(3) A(4) B(5) B(6) B(7) B(8)
A(1)  1    1    1    0    1    0    0    0
A(2)  1    1    0    1    0    1    0    0
A(3)  1    0    1    1    0    0    1    0
A(4)  0    1    1    1    0    0    0    1
B(5)  1    0    0    0    1    1    1    0
B(6)  0    1    0    0    1    1    0    1
B(7)  0    0    1    0    1    0    1    1
B(8)  0    0    0    1    0    1    1    1

and, after renumbering the vertices:

     A(1) A(2) A(3) A(4) B(5) B(6) B(7) B(8)
A(1)  1    1    0    1    0    1    0    0
A(2)  1    1    1    0    0    0    1    0
A(3)  0    1    1    1    1    0    0    0
A(4)  1    0    1    1    0    0    0    1
B(5)  0    0    1    0    1    0    1    1
B(6)  1    0    0    0    0    1    1    1
B(7)  0    1    0    0    1    1    1    0
B(8)  0    0    0    1    1    1    0    1
Graph Isomorphism
A graph is isomorphic to another graph if it is topologically equivalent to it.
(Figure: two drawings of the same labeled graph with vertex labels A and B.)
Graph IsomorphismTest for graph isomorphism is
needed:During candidate generation step, to determine whether
a candidate has been generated
During candidate pruning step, to check whether its
(k-1)-subgraphs are frequent
During candidate counting, to check whether a candidate is
contained within another graph
Graph IsomorphismUse canonical labeling to handle
isomorphismMap each graph into an ordered string
representation (known as its code) such that two isomorphic
graphs will be mapped to the same canonical encodingExample:
Lexicographically largest adjacency matrix
String: 0010001111010110
Canonical: 0111101011001000
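The canonical-labeling idea can be sketched by brute force — take the lexicographically largest code over all vertex permutations (feasible only for tiny graphs; the graphs and the upper-triangle code below are illustrative, not the example from the slides):

```python
# Canonical code = maximum upper-triangle adjacency string over all
# vertex orderings; two isomorphic graphs get the same canonical code.
from itertools import permutations

def code(adj, order):
    n = len(adj)
    return "".join(str(adj[order[i]][order[j]])
                   for i in range(n) for j in range(i + 1, n))

def canonical(adj):
    return max(code(adj, p) for p in permutations(range(len(adj))))

g1 = [[0,1,1,0], [1,0,0,1], [1,0,0,1], [0,1,1,0]]  # a 4-cycle
g2 = [[0,1,0,1], [1,0,1,0], [0,1,0,1], [1,0,1,0]]  # same graph, relabeled
```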
E_{Z'}(Z) = [P(z1)/P(z1')] × [P(z2)/P(z2')] × … × [P(zk)/P(zk')] × P(Z')

E(Y|X) = [P(y1)/P(y1')] × [P(y2)/P(y2')] × … × [P(yk)/P(yk')] × P(Y'|X')
Z = (μ1 − μ2 − Δ) / sqrt( s1²/n1 + s2²/n2 )

Z = (30 − 23 − 5) / sqrt( 3.5²/50 + 6.5²/250 ) = 3.11
minsup(C) = Σ_{i∈T} min_{j∈C} D(i, j)
Concept hierarchy:
Food → Bread → {Wheat, White}
Food → Milk → {Skim, 2%}; brands: Foremost, Kemps
Electronics → Computers → {Desktop, Laptop, Accessory → {Printer, Scanner}}
Electronics → Home → {DVD, TV}
Answer: C(n, k) = C(9, 4) = 126
Object | Timestamp | Events
A      | 1         | 1, 2, 4
A      | 2         | 2, 3
A      | 3         | 5
B      | 1         | 1, 2
B      | 2         | 2, 3, 4
C      | 1         | 1, 2
C      | 2         | 2, 3, 4
C      | 3         | 2, 4, 5
D      | 1         | 2
D      | 2         | 3, 4
D      | 3         | 4, 5
E      | 1         | 1, 3
E      | 2         | 2, 4, 5
(Figure: a Web graph with pages Databases, Homepage, Research, Artificial Intelligence, and Data Mining.)
(Figure: (a) a labeled graph with vertex labels a, b, c and edge labels p, q, r, s, t; (b) a subgraph of it; (c) an induced subgraph.)
(Figure: transaction TID = 1, i.e. {A, B, C, D}, represented as a clique over its items.)
(Figure: graphs G1, G2, G3 mapped to transactions — one binary attribute per labeled edge.)

     (a,b,p) (a,b,q) (a,b,r) (b,c,p) (b,c,q) (b,c,r) … (d,e,r)
G1    1       0       0       0       0       1      …  0
G2    1       0       0       0       0       0      …  0
G3    0       0       1       1       0       0      …  0
…     …       …       …       …       …       …      …  …
(Figure: vertex growing expressed with adjacency matrices — M_G1 and M_G2 share a common core, and G3 = join(G1, G2).)
(Figure: edge growing — G3 = join(G1, G2) adds one labeled edge to the shared core of G1 and G2.)
(Figure: the example dataset — graphs G1–G4 with vertex labels a–e and edge labels p, q, r — and its transaction representation:)

     (a,b,p) (a,b,q) (a,b,r) (b,c,p) (b,c,q) (b,c,r) … (d,e,r)
G1    1       0       0       0       0       1      …  0
G2    1       0       0       0       0       0      …  0
G3    0       0       1       1       0       0      …  0
G4    0       0       0       0       0       0      …  0
Minimum support count = 2.
k = 1 frequent subgraphs: the vertices a, b, c, d, e.
k = 2 frequent subgraphs: the labeled edges a–b (p), c–d (p), c–e (q), a–e (r), b–d (p).
k = 3 candidate subgraphs: one over {a, b, d} (edges p, r) and one over {d, c, e} (edges p) — the latter is a pruned candidate.
(Figure: multiplicity of candidates under vertex growing — the "?" entries in the adjacency matrix of G3 = join(G1, G2) mark the undetermined edge between the newly added vertices.)
(Figures: multiplicity of candidates under edge growing, Cases 1–3 — merging graphs with identical vertex labels, identical core labels, or multiple core embeddings yields several distinct candidates.)
(Figure: two adjacency matrices of the same graph, whose row-major codes give the string 0010001111010110 and its canonical form 0111101011001000.)
