
Introduction to Association Analysis


- 1. Association Analysis<br />
- 2. Association Analysis: Definition<br />Association analysis is the task of uncovering relationships among data.<br />Association rules:<br />An association rule is a model that identifies how data items are associated with each other.<br />Example:<br />In retail sales, it is used to identify items that are frequently purchased together.<br />
- 3. What is a rule?<br />Structure of a rule:<br />IF (condition) THEN (result)<br />Example: IF a customer purchases Coke, THEN the customer also purchases orange juice<br />The first part (the condition) is the rule body and the second part (the result) is the rule head<br />
- 4. Strength of a rule<br />How certain is the rule?<br />Confidence measures the certainty of a rule<br />It is the percentage of transactions containing all items stated in the condition that also contain the items in the result<br />Confidence(A, B) = P(B | A)<br />Example: The rule "If Coke then orange juice" has a confidence of 100%<br />
- 5. Strength of a rule<br />How often does the rule occur?<br />Support measures the usefulness of a rule<br />It is the percentage of transactions that contain all items in the rule<br />Support(A, B) = P(A, B)<br />Example: For the rule "If Coke then orange juice"<br />Of the 5 transactions, 2 contain both Coke and orange juice<br />The support of the rule is 2/5 = 40%<br />
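The two measures above can be sketched in a few lines of Python. The slide's transaction table is not part of this transcript, so the five baskets below are hypothetical, chosen only to reproduce the quoted figures (support 40%, confidence 100%). The function names `support` and `confidence` are illustrative as well.

```python
# Hypothetical baskets, chosen to match the slide's figures:
# 5 transactions, 2 of which contain both coke and orange juice.
transactions = [
    {"coke", "orange juice", "bread"},
    {"coke", "orange juice"},
    {"milk", "orange juice"},
    {"milk", "bread"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(body, head, transactions):
    """Estimate P(head | body) as support(body and head) / support(body).

    Assumes `body` occurs in at least one transaction.
    """
    body, head = set(body), set(head)
    return support(body | head, transactions) / support(body, transactions)

rule_support = support({"coke", "orange juice"}, transactions)
rule_confidence = confidence({"coke"}, {"orange juice"}, transactions)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
# prints: support = 40%, confidence = 100%
```

Confidence is computed here as a ratio of two supports, which is the usual way to estimate P(B | A) from transaction counts.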
- 6. Association Rule Mining<br />Two-step process<br />1. Find all frequent k-itemsets, k = 1, 2, 3, …<br />The set of all items in a rule is referred to as an itemset<br />A rule that contains k items forms a k-itemset<br />The occurrence frequency of a k-itemset is the number of transactions that contain all k items in the itemset<br />An itemset that satisfies a minimum support (or minimum occurrence frequency) is called a frequent itemset<br />
- 7. Association Rule Mining<br />2. Generate strong association rules from the frequent k-itemsets<br />Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong rules<br />
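- 8. Apriori Algorithm: Find all frequent k-itemsets<br />Apriori principle:<br />If an itemset is frequent, then all of its subsets must also be frequent<br />Equivalently, if an itemset is infrequent, then all of its supersets must also be infrequent, which is what allows candidates to be pruned<br />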
- 9. Illustrating Apriori Principle<br />
- 10. Apriori Algorithm<br />Method: <br />Let k=1<br />Generate frequent itemsets of length 1<br />Repeat until no new frequent itemsets are identified<br />Generate length (k+1) candidate itemsets from length k frequent itemsets<br />
- 11. Contd…<br />Prune candidate itemsets containing subsets of length k that are infrequent <br />Count the support of each candidate by scanning the DB<br />Eliminate candidates that are infrequent, leaving only those that are frequent<br />
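The method on the two slides above can be sketched as follows. This is a minimal, unoptimized illustration rather than a reference implementation: it assumes the minimum support is given as an absolute transaction count, it generates candidates by joining every pair of frequent k-itemsets, and the sample baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # k = 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)

    k = 1
    while current:  # repeat until no new frequent itemsets are found
        # Generate length-(k+1) candidates from pairs of frequent k-itemsets.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune candidates that have an infrequent subset of length k.
            if all(frozenset(s) in current for s in combinations(union, k)):
                candidates.add(union)
        # Count the support of each candidate by scanning the database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Eliminate infrequent candidates.
        current = {s: c for s, c in counts.items() if c >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support_count=3)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets and a threshold of 3, {bread, diapers, beer} is pruned before any counting because its subset {bread, beer} is infrequent, which is the saving the Apriori principle provides.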
- 12. Generate strong association rules from the frequent k-itemsets<br />For each frequent k-itemset, generate all nonempty proper subsets<br />For every such subset, generate the rule (subset → remaining items of the itemset) and its confidence<br />Output the rule if the minimum confidence threshold is satisfied<br />
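A minimal sketch of this rule-generation step, assuming the frequent itemsets are already available as a dictionary of support counts (the three entries below are hypothetical). The name `generate_rules` is illustrative. Each rule's confidence is the itemset's count divided by the count of its body.

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """Return (body, head, confidence) for every rule meeting min_confidence.

    `frequent` maps frozenset itemsets to support counts and must include
    every subset of each itemset (guaranteed by the Apriori principle).
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty body and a nonempty head
        for r in range(1, len(itemset)):
            for body in combinations(sorted(itemset), r):
                body = frozenset(body)
                head = itemset - body
                conf = count / frequent[body]
                if conf >= min_confidence:
                    rules.append((body, head, conf))
    return rules

# Hypothetical support counts for illustration only.
frequent = {
    frozenset({"diapers"}): 4,
    frozenset({"beer"}): 3,
    frozenset({"diapers", "beer"}): 3,
}
rules = generate_rules(frequent, min_confidence=0.8)
for body, head, conf in rules:
    print(sorted(body), "->", sorted(head), f"{conf:.0%}")
```

Here only "beer → diapers" (3/3) passes the 80% threshold; "diapers → beer" (3/4) does not, which shows that the confidence of a rule depends on its direction.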
- 13. Multilevel association rules<br />It is difficult to find strong associations at very low (primitive) levels of data<br />Few people may buy "IBM desktop computer" and "Sony b/w printer" together<br />Many people may purchase "computer" and "printer" together<br />
- 14. Concept hierarchy<br />Defines a sequence of mappings from a set of low-level concepts to higher-level concepts<br />Example (hierarchy diagram): low-level concepts such as IBM, Microsoft, HP, … map to higher-level concepts such as computer, software, printer, accessory<br />
- 15. Steps to be followed<br />Top-down, progressive deepening approach<br />First mine high-level frequent items<br />Then mine their lower-level frequent items, and so on<br />At each level, the Apriori algorithm is used<br />Use a uniform minimum support for all levels, or<br />Use a reduced minimum support at lower levels<br />
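The effect of moving up a concept hierarchy can be illustrated with a short sketch. The hierarchy and the four transactions below are hypothetical (only "IBM desktop computer" and "Sony b/w printer" come from the slides), and `generalize` is an illustrative helper name.

```python
# Hypothetical concept hierarchy: specific product -> higher-level category.
hierarchy = {
    "IBM desktop computer": "computer",
    "Dell laptop computer": "computer",
    "Sony b/w printer": "printer",
    "HP color printer": "printer",
}

# Hypothetical low-level transactions.
transactions = [
    {"IBM desktop computer", "Sony b/w printer"},
    {"Dell laptop computer", "HP color printer"},
    {"IBM desktop computer", "HP color printer"},
    {"Dell laptop computer"},
]

def generalize(transaction, hierarchy):
    """Replace each item by its parent category (unknown items stay as-is)."""
    return {hierarchy.get(item, item) for item in transaction}

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

low = support({"IBM desktop computer", "Sony b/w printer"}, transactions)
high_level = [generalize(t, hierarchy) for t in transactions]
high = support({"computer", "printer"}, high_level)
print(f"low-level support = {low:.0%}, high-level support = {high:.0%}")
# prints: low-level support = 25%, high-level support = 75%
```

Because support can only stay the same or grow when items are generalized, a threshold that suits the category level may discard every product-level itemset. That is the motivation for a reduced minimum support at lower levels.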
- 16. Sequential Association Rule<br />Concerns sequences of events<br />New homeowners purchase shower curtains before purchasing furniture<br />When a customer goes into a bank branch and asks for an account reconciliation, there is a good chance that he or she will close all of his or her accounts<br />
- 17. Sequential Association Rule<br />Transactions must have two additional features:<br />A time stamp or sequencing information to determine when transactions occurred relative to each other<br />Identifying information, such as an account number or ID number<br />
- 18. Some important parameters <br />Duration <br />duration may be the entire available sequence in the database, or a user selected subsequence, such as year 1999 <br />Event folding window <br />a set of events occurring within a specified period of time, such as within the same day, can be viewed as occurring together.<br />
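An event folding window can be sketched as a grouping step that runs before mining. This sketch assumes fixed, non-overlapping windows (for example, calendar days) rather than sliding ones, and it uses a hypothetical event log of (customer ID, hour, item) tuples. `fold_events` is an illustrative name.

```python
from collections import defaultdict

def fold_events(events, window):
    """Group (customer, time, item) events into one itemset per time bucket.

    Events of the same customer whose times fall into the same bucket of
    length `window` are treated as occurring together.
    """
    folded = defaultdict(set)
    for customer, time, item in events:
        folded[(customer, time // window)].add(item)
    return dict(folded)

# Hypothetical log: times are hours since the start of the observation period.
events = [
    (1, 9, "bread"),
    (1, 17, "milk"),   # same day as the bread purchase
    (1, 30, "eggs"),   # next day
]
folded = fold_events(events, window=24)
print(folded)
```

With a 24-hour window, the bread and milk purchases are folded into a single itemset for day 0, while the eggs purchase forms a separate itemset for day 1.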
- 19. Some important parameters<br />Interval<br />The interval between events in the discovered pattern<br />interval = 0 means to find strictly consecutive sequences<br />min_int <= interval <= max_int means to find patterns whose events are separated by at least min_int and at most max_int<br />interval = c means to find patterns carrying an exact interval c<br />
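The interval constraint can be sketched for the simplest case, a two-event pattern "first, then second". The sketch assumes the interval is measured in the same time units as the time stamps (days here) and that support is the share of customers showing the pattern. The event log and the name `sequence_support` are hypothetical.

```python
from collections import defaultdict

# Hypothetical event log: (customer_id, day_number, item).
events = [
    (1, 0, "shower curtain"), (1, 30, "furniture"),
    (2, 5, "shower curtain"), (2, 200, "furniture"),
    (3, 10, "furniture"), (3, 20, "shower curtain"),
]

def sequence_support(events, first, second, min_int, max_int):
    """Fraction of customers who have `first` followed by `second` with
    min_int <= (time of second - time of first) <= max_int."""
    by_customer = defaultdict(list)
    for customer, time, item in events:
        by_customer[customer].append((time, item))

    matches = 0
    for history in by_customer.values():
        firsts = [t for t, item in history if item == first]
        seconds = [t for t, item in history if item == second]
        if any(min_int <= s - f <= max_int for f in firsts for s in seconds):
            matches += 1
    return matches / len(by_customer)

narrow = sequence_support(events, "shower curtain", "furniture", 1, 90)
wide = sequence_support(events, "shower curtain", "furniture", 1, 365)
print(f"within 90 days: {narrow:.0%}, within 365 days: {wide:.0%}")
```

Customer 3 never matches because the furniture purchase comes first. Customer 2 matches only when max_int is raised to 365, which shows how the interval bounds change the measured support.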
- 20. Some Practical Issues <br />Time window of transactions <br />Level of aggregation <br />Level of support and confidence <br />
- 21. Time window of transactions<br />Select a time window of transactions that covers at least two product cycles<br />e.g., if customers purchase a product once every six months or more often, select a 12-month window of customer transaction data<br />For frequently purchased products, a short time window is sufficient<br />For low-frequency items, a longer time window is necessary.<br />
- 22. Level of aggregation<br />If product codes in the data are too specific (for example, distinguishing product details such as size and flavour), few associations will be discovered<br />Group products into categories according to the product hierarchy, or create new levels manually<br />
- 23. Level of support and confidence<br />Start with a high support threshold and gradually reduce it<br />Set the confidence threshold to around 50% to reduce the number of permutations<br />
- 24. Conclusion<br />Association rules, including multilevel and sequential association rules, are studied.<br />The Apriori algorithm is described in detail.<br />Various practical issues in association rule mining are analyzed.<br />
- 25. Visit more self help tutorials<br />Pick a tutorial of your choice and browse through it at your own pace.<br />The tutorials section is free, self-guiding and will not involve any additional support.<br />Visit us at www.dataminingtools.net<br />
