Mining Association Rules in Large Database

Mining Association Rules in Large Databases, Apriori Algorithm, Association Rule Mining


  1. Chapter 7: Mining Association Rules in Large Databases. Data Warehouse/Data Mining. Er. Nawaraj Bhandari
  2. Introduction
     • Association rule mining finds interesting association or correlation relationships among a large set of data items.
     • With massive amounts of data continuously being collected and stored, many industries are becoming interested in mining association rules from their databases.
     • Discovering interesting association relationships among huge amounts of business transaction records can help in many business decision-making processes, such as catalog design, cross-marketing, and loss-leader analysis.
     • A typical example of association rule mining is market basket analysis.
  3. Association Rules
     • Analyze and predict customer behavior.
     • Take the form of if/then statements.
     • Examples:
     • Bread => Butter. If someone purchases bread, then he or she is likely to purchase butter.
     • Buys{onions, potatoes} => Buys{tomatoes}
  4. Parts of Association Rules
     • Bread => Butter [20%, 45%]
     • Bread: antecedent
     • Butter: consequent
     • 20% is the support and 45% is the confidence.
  5. Support and Confidence
     • For a rule A => B:
     • Support denotes the probability that a transaction contains both A and B.
     • Confidence denotes the probability that a transaction containing A also contains B.
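The two definitions above are commonly written as probabilities over the set of transactions. The notation below is a sketch of that standard form, not taken from the slides: $A \cup B$ stands for the item set made up of all items in A and B, so a transaction "contains $A \cup B$" when it contains both A and B.

```latex
\mathrm{support}(A \Rightarrow B) = P(A \cup B)
\qquad
\mathrm{confidence}(A \Rightarrow B) = P(B \mid A)
  = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}
```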
  6. 6. Support and Confidence Consider in a super market Total transcations: 100 Bread: 20 So , 20/100 * 100=20% which is support In 20 transaction of bread, butter : 9 transactions So, 9/20 * 100=45% which is confidence.
  7. Types of Association Rules
     • Single-dimension association rule
     • Multidimensional association rule
     • Hybrid association rule
  8. Single-dimension association rule
     • Bread => Butter
     • Dimension: buying. Here buying is the one and only dimension.
  9. Multidimensional association rule
     • Has 2 or more dimensions.
     • Occupation(I.T), Age(>22) => Buys(laptops)
     • Here we have 3 dimensions, i.e. occupation, age and buys.
     • In a multidimensional rule, a dimension cannot be repeated.
  10. Hybrid-dimension association rule
     • Dimensions (predicates) can be repeated.
     • Time(5 o'clock), Buy(tea) => Buy(biscuits)
     • If a person gets tea at 5 o'clock, he or she is likely to get biscuits as well.
     • Here the Buy dimension is repeated.
  11. Fields of application of association rules
     • Web usage mining
     • Banking
     • Bioinformatics
     • Market basket analysis
     • Credit/debit card analysis
     • Product clustering
     • Catalog design
  12. Algorithms for association rule mining
     • Apriori algorithm
     • Eclat algorithm
     • FP-Growth algorithm
  13. Apriori Algorithm
     • If you bought a toothbrush, toothpaste is suggested; if you bought beer, chips and potato crackers are suggested, and so on.
     • Many e-commerce websites (such as Flipkart and Amazon) show suggestions of this kind. Such suggestions can be derived from item sets that are frequently bought together, and the Apriori algorithm is one way to mine those frequent item sets from transaction data.
  14. Apriori Algorithm
  15. Apriori Algorithm. First candidate set, C1:
     Item set | Support count
     M | 3
     O | 4
     N | 2
     K | 5
     E | 4
     Y | 3
     D | 1
     A | 1
     U | 1
     C | 2
  16. Apriori Algorithm. L1 (the item sets from C1 that meet the minimum support; here, a support count of at least 3):
     Item set | Support count
     M | 3
     O | 4
     K | 5
     E | 4
     Y | 3
  17. Apriori Algorithm. Second candidate set, C2:
     Item set | Support count
     M, O | 1
     M, K | 3
     M, E | 2
     M, Y | 2
     O, K | 3
     O, E | 3
     O, Y | 2
     K, E | 4
     K, Y | 3
     E, Y | 2
  18. Apriori Algorithm. L2 (the item sets from C2 that meet the minimum support):
     Item set | Support count
     M, K | 3
     O, K | 3
     O, E | 3
     K, E | 4
     K, Y | 3
  19. Apriori Algorithm. Third candidate set, C3:
     Item set | Support count
     M, K, O | 1
     M, K, E | 2
     M, K, Y | 2
     O, K, E | 3
     O, K, Y | 2
  20. Apriori Algorithm. L3 (the item sets from C3 that meet the minimum support):
     Item set | Support count
     O, K, E | 3
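The level-wise search from C1/L1 up to L3 can be sketched in Python as below. This is a minimal illustration, not the implementation used for the slides: the function name `apriori`, the grocery transactions and the threshold are all made up for the example, since the transaction table behind the slides is not part of this transcript. The sketch also applies the usual prune step, which discards a candidate as soon as one of its subsets is not frequent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data: count the transactions containing each candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # C1 is every single item; L1 keeps those meeting the minimum support count.
    items = {item for t in transactions for item in t}
    current = {c: n for c, n in count({frozenset([i]) for i in items}).items()
               if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-item sets into k-item sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Lk keeps the candidates meeting the minimum support count.
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions, chosen only to keep the example small.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

frequent = apriori(transactions, min_support=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Each pass of the `while` loop corresponds to one Ck/Lk pair in the slides, and every pass scans the whole transaction list once to count support.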
  21. Apriori Algorithm
     • Now create association rules with support and confidence for the frequent item set {O, K, E}. A rule such as O^K=>E reads "O and K give E".
     • Confidence = support count of the whole item set / support count of the antecedent. For example, for O^K=>E: support(O, K, E) / support(O, K) = 3/3 = 1.
     Association rule | Support | Confidence | Confidence %
     O^K=>E | 3 | 3/3 = 1 | 100
     O^E=>K | 3 | 3/3 = 1 | 100
     K^E=>O | 3 | 3/4 = 0.75 | 75
     E=>O^K | 3 | 3/4 = 0.75 | 75
     K=>O^E | 3 | 3/5 = 0.6 | 60
     O=>K^E | 3 | 3/4 = 0.75 | 75
  22. Apriori Algorithm
     • Compare each confidence with the minimum confidence of 80% and keep the rules that reach it:
     Association rule | Support | Confidence | Confidence %
     O^K=>E | 3 | 3/3 = 1 | 100
     O^E=>K | 3 | 3/3 = 1 | 100
     • Hence the final association rules are O^K=>E and O^E=>K.
     • This process is known as market basket analysis.
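The rule-generation step can be reproduced with a short Python sketch. The support counts below are copied from the L1, L2 and L3 slides; the function name `rules_from` and the data layout are assumptions made for this illustration. Every non-empty proper subset of {O, K, E} is tried as an antecedent, and a rule is kept when its confidence reaches the 80% minimum.

```python
from itertools import combinations

# Support counts copied from the L1, L2 and L3 slides.
support = {
    frozenset("O"): 4, frozenset("K"): 5, frozenset("E"): 4,
    frozenset("OK"): 3, frozenset("OE"): 3, frozenset("KE"): 4,
    frozenset("OKE"): 3,
}


def rules_from(itemset, support, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules meeting min_confidence."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(sorted(itemset), size)):
            # confidence = support(whole item set) / support(antecedent)
            confidence = support[itemset] / support[antecedent]
            if confidence >= min_confidence:
                yield antecedent, itemset - antecedent, confidence


strong = list(rules_from("OKE", support, min_confidence=0.80))
for antecedent, consequent, confidence in strong:
    print("^".join(sorted(antecedent)), "=>", "^".join(sorted(consequent)),
          f"{confidence:.0%}")
```

With these counts, only the rules with antecedents {O, K} and {O, E} pass the threshold, matching the two final rules on the slide.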
  23. Pros and Cons of Association Rule Mining
     Pros
     • Apriori is easy to implement and easy to understand.
     • It can be used on large item sets.
     Cons
     • It may need to generate a large number of candidates, which can be computationally expensive.
     • Calculating support is also expensive because it requires a pass through the entire database.
     June 8, 2019. Data Mining: Concepts and Techniques
  24. Assignment
     • Minimum support: 2. Minimum confidence: 70%.
     • Use the Apriori algorithm to find the frequent item sets and the strong association rules.
     TID | Items
     1 | I1, I3, I4
     2 | I2, I3, I5
     3 | I1, I2, I3, I5
     4 | I2, I5
  25. References
     1. Sam Anahory, Dennis Murray, "Data Warehousing in the Real World", Pearson Education.
     2. R. Kimball, "The Data Warehouse Toolkit", Wiley, 1996.
     3. T. J. Teorey, "Database Modeling and Design: The Entity-Relationship Approach", Morgan Kaufmann Publishers, Inc., 1990.
     4. S. Chaudhuri, "An Overview of Data Warehousing and OLAP Technology", Microsoft Research.
     5. M. A. Shahzad, "Data Warehousing with Oracle".
     6. J. Han, M. Kamber, "Data Mining: Concepts and Techniques", Second Edition, Morgan Kaufmann, ISBN 978-1-55860-901-3.
  26. ANY QUESTIONS?