# Association Analysis


1. Association Analysis<br />
2. Association Analysis: Definition<br />Association analysis is the task of uncovering relationships among data.<br />Association rules:<br />An association rule is a model that identifies how data items are associated with each other.<br />Example:<br />In retail sales, association rules are used to identify items that are frequently purchased together.<br />
3. What is a rule?<br /><ul><li>Structure of a rule:</li></ul>IF (condition) THEN (result)<br />Example: IF a customer purchases Coke, THEN the customer also purchases orange juice.<br />The first part (the condition) is the rule body, also called the antecedent; the second part (the result) is the rule head, also called the consequent.<br />
4. Strength of a rule<br />How certain is the rule?<br />Confidence measures the certainty of a rule.<br />It is the percentage of transactions containing all items stated in the condition that also contain the items in the result.<br />Confidence(A ⇒ B) = P(B | A)<br />Example: the rule "IF Coke THEN Orange Juice" has a confidence of 100%, i.e., every transaction that contains Coke also contains orange juice.<br />
5. Strength of a rule<br />How often does the rule occur?<br />Support measures the usefulness of a rule.<br />It is the percentage of transactions that contain all items in the rule.<br />Support(A ⇒ B) = P(A, B), the probability that a transaction contains both A and B<br />Example: for the rule "IF Coke THEN Orange Juice":<br />Of the 5 transactions, 2 contain both Coke and OJ.<br />The support of the rule is 2/5 = 40%.<br />
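Both measures can be computed directly from a list of transactions. The minimal Python sketch below is an illustration only: the five transactions are hypothetical, chosen so that the numbers match the Coke/orange juice example (2 of 5 transactions contain both items, and Coke never appears without orange juice), and the helper names `support` and `confidence` are illustrative rather than taken from any library.

```python
# Hypothetical transactions, chosen so the numbers match the slides:
# 2 of 5 contain both Coke and orange juice, and Coke never appears
# without orange juice.
transactions = [
    {"coke", "orange juice"},
    {"milk", "bread"},
    {"coke", "orange juice", "bread"},
    {"milk"},
    {"orange juice", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset: P(A, B)."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(body, head, transactions):
    """P(head | body) = support(body together with head) / support(body)."""
    body_support = support(body, transactions)
    if body_support == 0:
        return 0.0  # the rule body never occurs; treat confidence as 0
    return support(set(body) | set(head), transactions) / body_support


print(support({"coke", "orange juice"}, transactions))      # 0.4
print(confidence({"coke"}, {"orange juice"}, transactions))  # 1.0
```

Confidence is computed as a ratio of supports, which is the same as the conditional probability P(B | A) on the slide.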
6. Association Rule Mining<br />Association rule mining is a two-step process:<br />1. Find all frequent k-itemsets, k = 1, 2, 3, …<br />The set of all items in a rule is referred to as an itemset.<br />An itemset that contains k items is a k-itemset.<br />The occurrence frequency of a k-itemset is the number of transactions that contain all k items in the itemset.<br />An itemset that satisfies a minimum support (or minimum occurrence frequency) is called a frequent itemset.<br />
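Step 1 can be done naively by counting every k-item combination that occurs in a transaction. The brute-force sketch below (hypothetical function `frequent_k_itemsets`, with `min_count` as an absolute occurrence frequency, on hypothetical data) only makes the definitions concrete; its cost grows combinatorially with transaction length, which is the problem the Apriori algorithm addresses.

```python
from itertools import combinations


def frequent_k_itemsets(transactions, k, min_count):
    """Brute force: count every k-item combination that occurs in some
    transaction; keep those whose occurrence frequency is >= min_count."""
    counts = {}
    for t in transactions:
        for combo in combinations(sorted(t), k):
            counts[combo] = counts.get(combo, 0) + 1
    return {c: n for c, n in counts.items() if n >= min_count}


# Hypothetical transactions.
transactions = [
    {"coke", "orange juice"},
    {"milk", "bread"},
    {"coke", "orange juice", "bread"},
    {"milk"},
    {"orange juice", "milk"},
]

print(frequent_k_itemsets(transactions, 2, 2))  # {('coke', 'orange juice'): 2}
```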
7. Association Rule Mining<br />2. Generate strong association rules from the frequent k-itemsets.<br />Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong rules.<br />
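8. Apriori Algorithm: Find all frequent k-itemsets<br />Apriori principle:<br />If an itemset is frequent, then all of its subsets must also be frequent.<br />Equivalently, if an itemset is infrequent, then all of its supersets are also infrequent and need not be counted.<br />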
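The principle follows from the fact that support can only decrease as an itemset grows. Writing T for the set of transactions and X, Y for itemsets, a short derivation:

```latex
X \subseteq Y
\;\Longrightarrow\; \{\, t \in T : Y \subseteq t \,\} \subseteq \{\, t \in T : X \subseteq t \,\}
\;\Longrightarrow\; \operatorname{support}(X) \ge \operatorname{support}(Y)
```

So if Y meets the minimum support, every subset X does too; contrapositively, once X is found to be infrequent, no superset of X needs to be counted.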
9. Illustrating Apriori Principle<br />
10. Apriori Algorithm<br />Method:<br />Let k = 1.<br />Generate frequent itemsets of length 1.<br />Repeat until no new frequent itemsets are identified:<br />Generate length-(k+1) candidate itemsets from length-k frequent itemsets.<br />
11. Contd…<br />Prune candidate itemsets that contain any infrequent subset of length k.<br />Count the support of each candidate by scanning the database.<br />Eliminate candidates that are infrequent, leaving only those that are frequent.<br />
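Putting these steps together, a compact Python sketch of Apriori might look like the following. It is an illustration under simplifying assumptions: transactions fit in memory, `min_count` is an absolute support count, and candidates are generated by taking the union of any two frequent k-itemsets that yields k + 1 items. That join is simpler, though less efficient, than the usual sorted-prefix join, but after pruning it yields the same candidates. The function name and the market-basket data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count_and_filter(candidates):
        # One scan of the database per level k.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # k = 1: frequent single items.
    frequent = count_and_filter({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 1
    while frequent:
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            # Join: keep unions of two frequent k-itemsets that have k + 1 items.
            if len(union) != k + 1:
                continue
            # Prune: every k-subset must be frequent (Apriori principle).
            if all(frozenset(s) in frequent for s in combinations(union, k)):
                candidates.add(union)
        frequent = count_and_filter(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]

found = apriori(transactions, 3)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

With `min_count=3`, the prune step discards {bread, diapers, beer} and {milk, diapers, beer} without counting them, because {bread, beer} and {milk, beer} are infrequent; {bread, milk, diapers} survives pruning but occurs in only two transactions, so no 3-itemset is frequent.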
12. Generate strong association rules from the frequent k-itemsets<br />For each frequent k-itemset l, generate all nonempty proper subsets of l.<br />For every nonempty proper subset s, generate the rule s ⇒ (l − s) and its confidence, support(l) / support(s).<br />Output the rule if the minimum confidence threshold is satisfied. (Because l is frequent, every such rule already satisfies the minimum support threshold.)<br />
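Rule generation needs only the support counts already found for the frequent itemsets, because every subset of a frequent itemset is itself frequent and therefore already counted. A minimal sketch, assuming the frequent itemsets are stored as a dict from `frozenset` to support count (a hypothetical layout; the function name and the counts are made up for illustration):

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """frequent: {frozenset(itemset): support count} for ALL frequent itemsets.

    Returns a list of (body, head, confidence) for each rule s => l - s
    whose confidence support(l) / support(s) is at least min_conf."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty body and a nonempty head
        for r in range(1, len(itemset)):  # nonempty proper subsets only
            for subset in combinations(sorted(itemset), r):
                body = frozenset(subset)
                conf = count / frequent[body]  # body is frequent, so it is a key
                if conf >= min_conf:
                    rules.append((set(body), set(itemset - body), conf))
    return rules


# Hypothetical support counts (e.g. from an earlier Apriori pass).
frequent = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 4,
    frozenset({"bread", "milk"}): 3,
}
for body, head, conf in generate_rules(frequent, 0.7):
    print(body, "=>", head, conf)  # both directions, confidence 0.75
```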
13. Multilevel association rules<br />It is difficult to find strong associations at very low (primitive) levels of data, because data at those levels is sparse.<br />Few people may buy "IBM desktop computer" and "Sony b/w printer" together.<br />Many people may purchase "computer" and "printer" together.<br />
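14. Concept hierarchy<br />A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.<br />Example (figure): high-level concepts such as computer, software, printer, and accessory, with lower-level concepts such as IBM, Microsoft, and HP, …<br />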
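In code, a concept hierarchy can be as simple as a mapping from each low-level item to its parent concept; generalizing a transaction then means replacing every item by its parent. The sketch below assumes a hypothetical two-level hierarchy; the item names and the `generalize` helper are illustrative only.

```python
# Hypothetical two-level hierarchy: low-level item -> higher-level concept.
hierarchy = {
    "IBM desktop computer": "computer",
    "HP laptop": "computer",
    "Sony b/w printer": "printer",
    "HP color printer": "printer",
    "Microsoft Office": "software",
}


def generalize(transaction, hierarchy):
    """Replace each item by its parent concept; unknown items are kept as-is."""
    return {hierarchy.get(item, item) for item in transaction}


# Different low-level items map to the same pair of high-level items,
# so their co-occurrence can be counted together.
print(generalize({"IBM desktop computer", "Sony b/w printer"}, hierarchy))
print(generalize({"HP laptop", "HP color printer"}, hierarchy))
```

Both transactions become {computer, printer}, which is why associations that are too rare to be strong at the product level can become strong at the category level.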
15. Steps to be followed<br />Use a top-down, progressive deepening approach:<br />First, mine high-level frequent items.<br />Then mine their lower-level frequent items, and so on.<br />At each level, the Apriori algorithm is used.<br />Either use a uniform minimum support for all levels, or<br />use a reduced minimum support at lower levels.<br />
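The progressive-deepening idea, with a reduced threshold at the lower level, can be sketched for single items as follows. This is a simplification: the slide applies Apriori at each level, whereas this sketch only finds frequent 1-itemsets per level to keep it short. The data, the `parent` mapping, the function name, and the thresholds are all hypothetical.

```python
from collections import Counter

# Hypothetical data: low-level items and their high-level parent concept.
parent = {
    "IBM desktop": "computer",
    "Dell laptop": "computer",
    "Sony b/w printer": "printer",
    "HP color printer": "printer",
}
transactions = [
    {"IBM desktop", "Sony b/w printer"},
    {"Dell laptop", "HP color printer"},
    {"IBM desktop"},
    {"Dell laptop", "Sony b/w printer"},
]


def frequent_items_by_level(transactions, parent, min_sup_high, min_sup_low):
    """Top-down: find frequent high-level items, then examine only the
    children of frequent parents, using a (possibly reduced) threshold.
    Supports are fractions of all transactions; only single items are mined."""
    n = len(transactions)
    # Count each high-level concept at most once per transaction.
    high = Counter(c for t in transactions for c in {parent[i] for i in t})
    frequent_high = {c for c, k in high.items() if k / n >= min_sup_high}
    low = Counter(i for t in transactions for i in t if parent[i] in frequent_high)
    frequent_low = {i for i, k in low.items() if k / n >= min_sup_low}
    return frequent_high, frequent_low


print(frequent_items_by_level(transactions, parent, 0.6, 0.6))  # uniform support
print(frequent_items_by_level(transactions, parent, 0.6, 0.5))  # reduced support
```

With a uniform minimum support of 0.6, both categories are frequent but no individual product is; reducing the lower-level threshold to 0.5 recovers IBM desktop, Dell laptop, and Sony b/w printer, which illustrates why a reduced support at lower levels is often used.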
16. Sequential Association Rule<br />Sequential association rules concern sequences of events.<br />Example: new homeowners purchase shower curtains before purchasing furniture.<br />Example: when a customer goes into a bank branch and asks for an account reconciliation, there is a good chance that he or she will close all of his or her accounts.<br />
17. Sequential Association Rule<br />Transactions must have two additional features:<br />a timestamp or other sequencing information, to determine when transactions occurred relative to each other;<br />identifying information, such as an account number or ID number, to link transactions that belong to the same customer.<br />
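With a timestamp and a customer identifier, raw transactions can be turned into one time-ordered sequence per customer. A minimal Python sketch; the log format `(customer_id, timestamp, items)`, the `build_sequences` name, and the records are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, timestamp, items).
log = [
    ("c1", date(1999, 3, 1), {"furniture"}),
    ("c1", date(1999, 1, 15), {"shower curtains"}),
    ("c2", date(1999, 2, 2), {"shower curtains"}),
]


def build_sequences(log):
    """Group transactions by customer id and order each group by timestamp,
    giving one sequence of itemsets per customer."""
    by_customer = defaultdict(list)
    for cust, ts, items in log:
        by_customer[cust].append((ts, items))
    return {
        cust: [items for ts, items in sorted(events, key=lambda e: e[0])]
        for cust, events in by_customer.items()
    }


print(build_sequences(log)["c1"])  # [{'shower curtains'}, {'furniture'}]
```

Sorting uses only the timestamp as the key, since sets of items cannot be ordered.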
18. Some important parameters<br />Duration:<br />the duration may be the entire available sequence in the database, or a user-selected subsequence, such as the year 1999.<br />Event folding window:<br />a set of events occurring within a specified period of time, such as within the same day, can be viewed as occurring together.<br />
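The duration and the event folding window can both be applied when preparing a customer's sequence. In this sketch, which assumes events are already sorted by date, events outside the duration are dropped and events within `window_days` of the first event of the current window are merged into one itemset. Measuring the window from its first event is an assumption (other definitions are possible), and the function name and events are hypothetical.

```python
from datetime import date


def fold_events(events, start, end, window_days):
    """events: (date, items) pairs for one customer, sorted by date.

    Drops events outside the duration [start, end], then merges each event
    into the current itemset if it falls within window_days of the date
    that opened the current window."""
    folded = []
    window_start = None
    for ts, items in events:
        if not start <= ts <= end:
            continue  # outside the selected duration
        if window_start is not None and (ts - window_start).days < window_days:
            folded[-1] |= set(items)  # same window: treat as occurring together
        else:
            window_start = ts
            folded.append(set(items))
    return folded


# Hypothetical events; duration = the year 1999, window = the same day.
events = [
    (date(1998, 12, 30), {"paint"}),
    (date(1999, 1, 5), {"shower curtains"}),
    (date(1999, 1, 5), {"towels"}),
    (date(1999, 3, 1), {"furniture"}),
]
print(fold_events(events, date(1999, 1, 1), date(1999, 12, 31), window_days=1))
```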
19. Some important parameters<br />Interval:<br />the interval between events in the discovered pattern.<br />An interval of 0 means finding strictly consecutive sequences.<br />min_int ≤ interval ≤ max_int means finding patterns whose events are separated by at least min_int and at most max_int.<br />interval = c means finding patterns with an exact interval of c.<br />
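An interval constraint can be checked as below. This sketch assumes the interval is measured as the number of events strictly between the two matched events, so 0 means strictly consecutive, `min_int = max_int = c` gives an exact interval, and other values give a range; measuring elapsed time instead would be an equally valid choice. The `has_pattern` name and the sequence are hypothetical.

```python
def has_pattern(sequence, a, b, min_int, max_int):
    """sequence: list of itemsets in time order (one customer).

    True if an event containing a is followed by an event containing b with
    min_int <= interval <= max_int, where (by assumption) the interval is the
    number of events strictly between the two."""
    for i, first in enumerate(sequence):
        if a not in first:
            continue
        for j in range(i + 1, len(sequence)):
            gap = j - i - 1
            if gap > max_int:
                break  # later events are even further away
            if gap >= min_int and b in sequence[j]:
                return True
    return False


seq = [{"shower curtains"}, {"towels"}, {"furniture"}]
print(has_pattern(seq, "shower curtains", "towels", 0, 0))     # True: consecutive
print(has_pattern(seq, "shower curtains", "furniture", 0, 0))  # False
print(has_pattern(seq, "shower curtains", "furniture", 1, 1))  # True: exact interval 1
```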
20. Some Practical Issues<br />Time window of transactions<br />Level of aggregation<br />Level of support and confidence<br />
21. Time window of transactions<br />Select a time window for the transactions that covers at least two product cycles.<br />For example, if customers purchase a product every six months or less, select a 12-month window of customer transaction data.<br />For frequently purchased products, a short time window is sufficient.<br />For low-frequency items, a longer time window is necessary.<br />
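The two-cycle guideline translates into a simple filter. A sketch, assuming transactions are `(date, items)` pairs and the window ends at the most recent transaction; with a six-month purchase cycle it keeps the last 2 × 6 = 12 calendar months. The helper names and the data are hypothetical.

```python
from datetime import date


def month_index(d):
    """Months since year 0, so month differences are simple subtraction."""
    return d.year * 12 + d.month - 1


def select_window(transactions, cycle_months, cycles=2):
    """Keep (date, items) transactions from the last cycles * cycle_months
    calendar months, counting the month of the latest transaction."""
    window = cycles * cycle_months
    last = max(month_index(ts) for ts, _ in transactions)
    return [(ts, items) for ts, items in transactions
            if last - month_index(ts) < window]


# Hypothetical transactions; a six-month purchase cycle gives a 12-month window.
transactions = [
    (date(1998, 11, 10), {"printer ink"}),
    (date(1999, 1, 5), {"printer ink"}),
    (date(1999, 6, 20), {"printer ink", "paper"}),
    (date(1999, 12, 1), {"printer ink"}),
]
print([ts.isoformat() for ts, _ in select_window(transactions, cycle_months=6)])
# ['1999-01-05', '1999-06-20', '1999-12-01']
```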
22. Level of aggregation<br />If product codes in the data are too specific (for example, distinguished by product details such as size and flavour), few associations will be discovered.<br />Group products into categories according to the product hierarchy, or create a new level manually.<br />
23. Level of support and confidence<br />Start with a high minimum support and gradually reduce it.<br />Set the minimum confidence to around 50% to reduce the number of rule permutations generated.<br />
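The "start high, then reduce" advice can be automated as a loop that lowers the minimum support until enough patterns appear. A sketch under stated assumptions: only item pairs are counted, the step size, floor, and target count are arbitrary choices, and the function names and data are hypothetical.

```python
from itertools import combinations


def frequent_pairs(transactions, min_sup):
    """Item pairs whose support (fraction of transactions) is >= min_sup."""
    counts = {}
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] = counts.get(pair, 0) + 1
    n = len(transactions)
    return {p: c / n for p, c in counts.items() if c / n >= min_sup}


def tune_support(transactions, target, start=0.9, step=0.1, floor=0.05):
    """Lower the minimum support from `start` until at least `target`
    frequent pairs are found or the next step would go below `floor`."""
    min_sup = start
    while True:
        pairs = frequent_pairs(transactions, min_sup)
        if len(pairs) >= target or min_sup - step < floor:
            return min_sup, pairs
        min_sup = round(min_sup - step, 10)  # avoid accumulating float error


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]
min_sup, pairs = tune_support(transactions, target=4)
print(min_sup, sorted(pairs))  # 0.6 and four pairs
```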
24. Conclusion<br />Association rules, including multilevel and sequential association rules, have been studied.<br />The Apriori algorithm has been described in detail.<br />Various practical issues in mining association rules have been analyzed.<br />
25. Visit more self-help tutorials<br />Pick a tutorial of your choice and browse through it at your own pace.<br />The tutorials section is free and self-guided, and does not involve any additional support.<br />Visit us at www.dataminingtools.net<br />