M.SANDHIYA (MSC INFO TECH)
NADAR SARASWATHI COLLEGE OF ARTS AND
SCIENCE.
Association Mining
 Association rule mining:
 Finding frequent patterns, associations, correlations, or
causal structures among sets of items or objects in
transaction databases, relational databases, and other
information repositories.
 Frequent pattern: pattern (set of items, sequence, etc.)
that occurs frequently in a database
 Motivation: finding regularities in data
Mining Association Rules—an Example
For rule A ⇒ C:
  support    = support({A} ∪ {C}) = 50%
  confidence = support({A} ∪ {C}) / support({A}) = 66.6%

Transaction-id   Items bought
10               A, B, C
20               A, C
30               A, D
40               B, E, F

Frequent pattern   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%
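Purely for illustration, the sketch below recomputes the two percentages above from the transaction table; the `transactions` dictionary and the `support` helper are assumed names, not part of any standard library.

```python
# Minimal sketch: support and confidence for the rule A => C, computed from
# the four example transactions above. Names here are illustrative.

transactions = {
    10: {"A", "B", "C"},
    20: {"A", "C"},
    30: {"A", "D"},
    40: {"B", "E", "F"},
}

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items) / len(transactions)

sup_ac = support({"A", "C"})         # support({A} U {C}) = 2/4 = 50%
conf_ac = sup_ac / support({"A"})    # 0.50 / 0.75 = 2/3, about 66.7%
print(f"support(A => C)    = {sup_ac:.0%}")
print(f"confidence(A => C) = {conf_ac:.1%}")
```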
Mining Association Rules in Large Databases
 Association rule mining
 Algorithms for scalable mining of (single-dimensional
Boolean) association rules in transactional databases
 Mining various kinds of association/correlation rules
 Applications/extensions of frequent pattern mining
 Summary
Frequent-pattern Mining
 Multiple database scans are costly
 Mining long patterns needs many passes of scanning and
generates lots of candidates
 To find the frequent itemset i1i2…i100:
 # of scans: 100
 # of candidates: (100 choose 1) + (100 choose 2) + … + (100 choose 100) = 2^100 - 1 ≈ 1.27 × 10^30 !
 Bottleneck: candidate-generation-and-test
 Can we avoid candidate generation?
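The candidate count quoted above is simply the number of non-empty subsets of a 100-item set; in equation form:

```latex
\sum_{k=1}^{100} \binom{100}{k} = 2^{100} - 1 \approx 1.27 \times 10^{30}
```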
The Apriori Approach to Mining Frequent Itemsets
 Any subset of a frequent itemset must be frequent
 if {beer, diaper, nuts} is frequent, so is {beer, diaper}
 every transaction having {beer, diaper, nuts} also contains {beer,
diaper}
 Apriori pruning principle: if any itemset is infrequent, its supersets need not be generated or tested!
 Method (a short sketch follows below):
 generate length-(k+1) candidate itemsets from length-k frequent itemsets, and
 test the candidates against the DB
 Performance studies show its efficiency and scalability
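As a rough illustration of the join, prune, and test steps listed above, here is a self-contained Apriori sketch; the function name, the frozenset representation, and the example call are assumptions for illustration, not a reference implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Toy Apriori sketch: transactions is an iterable of item sets,
    min_support a fraction in [0, 1]."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Frequent 1-itemsets
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support({i}) >= min_support}
    result, k = set(level), 1
    while level:
        # Join: combine frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune (Apriori property): every k-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Test: one scan of the database per level
        level = {c for c in candidates if support(c) >= min_support}
        result |= level
        k += 1
    return result

# The four example transactions, min_support = 50%
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
print(apriori(db, 0.5))   # frequent itemsets {A}, {B}, {C}, {A, C}
```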
Partition Algorithm
Executes in two phases:
Phase I:
 logically divide the database into a number of
non-overlapping partitions
 generate all large itemsets for each partition
 merge all these large itemsets into one set of all
potentially large itemsets
Phase II:
 compute the actual (global) support of these candidate itemsets
and identify the large ones (a sketch follows below)
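Below is a rough two-phase sketch of this idea; it reuses the apriori function from the previous sketch as the per-partition miner, and the partition size and function names are illustrative assumptions.

```python
def partition_mine(transactions, min_support, n_partitions=2):
    """Toy Partition sketch; assumes the apriori() helper defined earlier."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)
    size = -(-n // n_partitions)          # ceiling division -> partition size

    # Phase I: mine each non-overlapping partition on its own and merge the
    # locally large itemsets into one set of potentially large itemsets.
    candidates = set()
    for start in range(0, n, size):
        candidates |= apriori(transactions[start:start + size], min_support)

    # Phase II: one full scan to count the actual (global) support of every
    # candidate and keep the ones that are large in the whole database.
    def global_support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    return {c for c in candidates if global_support(c) >= min_support}

db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
print(partition_mine(db, 0.5))   # again {A}, {B}, {C}, {A, C}
```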
Other Algorithms for Association Rule Mining
 Partition Algorithm
 Sampling Method
 Dynamic Itemset Counting
Mining Frequent Patterns With FP-trees
 Idea: Frequent pattern growth
 Recursively grow frequent patterns by pattern and
database partition
 Method
 For each frequent item, construct its conditional
pattern-base, and then its conditional FP-tree
 Repeat the process on each newly created conditional
FP-tree
 Until the resulting FP-tree is empty, or contains only a single path; a single
path generates all the combinations of its sub-paths, each of which is a
frequent pattern (a simplified sketch follows below)
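The sketch below illustrates only the recursion described above, under a simplifying assumption: each conditional database is kept as a plain list of (itemset, count) pairs rather than a compressed FP-tree, so the tree-building step is omitted. Function and variable names are illustrative.

```python
from collections import Counter

def pattern_growth(cond_db, min_count, suffix=frozenset(), out=None):
    """cond_db: list of (frozenset, count) pairs; min_count: absolute support."""
    if out is None:
        out = {}
    # Count item supports in this conditional database
    counts = Counter()
    for itemset, cnt in cond_db:
        for item in itemset:
            counts[item] += cnt
    # Fix an order (descending support) so each pattern is grown exactly once,
    # mirroring the bottom-up walk over the FP-tree header table.
    order = [i for i, c in counts.most_common() if c >= min_count]
    for pos, item in enumerate(order):
        pattern = suffix | {item}
        out[pattern] = counts[item]
        # Conditional pattern base of `item`: what co-occurs with it, restricted
        # to items ranked above it, each entry carrying its count
        keep = set(order[:pos])
        cond = [(itemset & keep, cnt) for itemset, cnt in cond_db
                if item in itemset and itemset & keep]
        pattern_growth(cond, min_count, pattern, out)
    return out

# The four example transactions with an absolute min support of 2
db = [(frozenset(t), 1) for t in
      ({"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"})]
print(pattern_growth(db, 2))   # {A}: 3, {B}: 2, {C}: 2, {A, C}: 2
```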
Find Patterns Having P From P-conditional
Database
 Starting at the frequent item header table in the
FP-tree
 Traverse the FP-tree by following the link of each
frequent item p
 Accumulate all the transformed prefix paths of item p to form p's
conditional pattern base
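To make the traversal concrete, here is a minimal FP-tree sketch with a header table whose node-links are kept as simple per-item lists; conditional_pattern_base(p) follows those links and climbs the parent pointers to accumulate p's transformed prefix paths. The class names, the list-based node-links, and the example output are assumptions for illustration.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

class FPTree:
    def __init__(self, transactions, min_count):
        # Global frequent-item order: most frequent first (ties broken by name)
        freq = defaultdict(int)
        for t in transactions:
            for i in t:
                freq[i] += 1
        self.order = sorted((i for i in freq if freq[i] >= min_count),
                            key=lambda i: (-freq[i], i))
        self.root = Node(None, None)
        self.header = defaultdict(list)   # item -> node-links (all nodes for item)
        for t in transactions:
            self._insert([i for i in self.order if i in t])

    def _insert(self, items):
        node = self.root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                self.header[item].append(child)
            child.count += 1
            node = child

    def conditional_pattern_base(self, item):
        """Transformed prefix paths of `item`, each with its occurrence count."""
        base = []
        for node in self.header[item]:        # follow the node-links of `item`
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)      # climb towards the root
                parent = parent.parent
            if path:
                base.append((frozenset(path), node.count))
        return base

# The four example transactions with an absolute min support of 2
tree = FPTree([{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}], 2)
print(tree.conditional_pattern_base("C"))   # e.g. [({A, B}, 1), ({A}, 1)]
```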