Association Analysis
Usage Rights

© All Rights Reserved

Association Analysis Presentation Transcript

  • 1. Association Analysis
  • 2. Association Analysis-Definition
    Association Analysis is the task of uncovering relationships among data.
    Association rules:
    An association rule is a model that identifies how data items are associated with each other.
    Ex:
    Association rules are used in retail sales to identify items that are frequently purchased together.
  • 3. What is a rule?
    • Structure of rule:
    If (condition) then (result)
    Example: If a customer purchases Coke, then the customer also purchases orange juice
    The first part (the condition) is the rule body and the second part (the result) is the rule head
  • 4. Strength of a rule
    How certain is the rule?
    Confidence measures the certainty of a rule
    It is the percentage of transactions containing all items stated in the condition that also contain the items in the result
    Confidence(A, B) = P(B | A)
    Example: The rule "If Coke then Orange Juice" has a confidence of 100%
  • 5. Strength of a rule
    How often does the rule occur?
    Support measures the usefulness of a rule
    It is the percentage of transactions that contain all items in the rule
    Support(A, B) = P(A, B)
    Example: For the rule "If Coke then Orange Juice"
    Of all 5 transactions, 2 contain both Coke and OJ
    The support of the rule is 40%
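
    The two measures above can be sketched in a few lines of Python. The transaction table from the original slide is not part of this transcript, so the five baskets below are hypothetical, chosen only so that 2 of 5 contain both Coke and OJ. The helper names `support` and `confidence` are likewise illustrative.

```python
# Hypothetical baskets: the slide's own table is not in the transcript.
# They are chosen so that 2 of the 5 transactions contain both Coke and OJ.
transactions = [
    {"coke", "oj", "chips"},
    {"coke", "oj"},
    {"oj", "milk"},
    {"milk", "bread"},
    {"bread", "chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(body, head, transactions):
    """Estimate P(head | body) as support(body and head) / support(body)."""
    body, head = set(body), set(head)
    return support(body | head, transactions) / support(body, transactions)

rule_support = support({"coke", "oj"}, transactions)
rule_confidence = confidence({"coke"}, {"oj"}, transactions)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

    With this data the rule "If Coke then Orange Juice" has 40% support and 100% confidence, because Coke appears in two baskets and both of them also contain OJ.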
  • 6. Association Rule Mining
    Two-step process
    1. Find all frequent k-itemsets, k = 1, 2, 3, …
    The set of all items in a rule is referred to as an itemset
    A rule that contains k items forms a k-itemset
    The occurrence frequency of a k-itemset is the number of transactions that contain all k items in the itemset
    An itemset that satisfies a minimum support (or minimum occurrence frequency) is called a frequent itemset
  • 7. Association Rule Mining
    2. Generate strong association rules from the frequent k-itemsets
    Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong rules
  • 8. Apriori Algorithm: Find all frequent k-item sets
    Apriori principle:
    If an itemset is frequent, then all of its subsets must also be frequent
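
    A small check of the Apriori principle, as a sketch only. The baskets and the threshold `MIN_COUNT` are made up for illustration. The code verifies that every proper subset of one frequent itemset is itself frequent, and that no superset of an infrequent item reaches the threshold.

```python
from itertools import combinations

# Hypothetical market-basket data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
MIN_COUNT = 2  # assumed minimum occurrence frequency

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# {bread, milk, diapers} occurs in 2 transactions, so it is frequent here.
frequent = ("bread", "milk", "diapers")
proper_subsets = [
    subset
    for size in range(1, len(frequent))
    for subset in combinations(frequent, size)
]
all_subsets_frequent = all(count(s) >= MIN_COUNT for s in proper_subsets)

# Contrapositive: "eggs" is infrequent, so no itemset containing it is frequent.
others = ("bread", "milk", "diapers", "beer", "cola")
any_egg_superset_frequent = any(
    count({"eggs", other}) >= MIN_COUNT for other in others
)
print(all_subsets_frequent, any_egg_superset_frequent)
```

    The contrapositive is what makes pruning possible: once an itemset is known to be infrequent, none of its supersets needs to be counted.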
  • 9. Illustrating Apriori Principle
  • 10. Apriori Algorithm
    Method:
    Let k=1
    Generate frequent itemsets of length 1
    Repeat until no new frequent itemsets are identified
    Generate length (k+1) candidate itemsets from length k frequent itemsets
  • 11. Contd…
    Prune candidate itemsets containing subsets of length k that are infrequent
    Count the support of each candidate by scanning the DB
    Eliminate candidates that are infrequent, leaving only those that are frequent
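
    The method on the two slides above can be sketched as follows. This is a minimal, unoptimized illustration rather than a reference implementation: the function name `apriori`, the use of an absolute count `min_count` instead of a percentage, and the sample baskets are all assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset seen in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(candidates):
        # Count each candidate with one scan of the database, then drop
        # the candidates that fall below the threshold.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # k = 1: frequent itemsets of length 1.
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    k = 1
    while level:  # repeat until no new frequent itemsets are identified
        prev = set(level)
        # Generate length (k+1) candidates from length-k frequent itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune candidates that have an infrequent subset of length k.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_count=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

    On this data, {bread, diapers, beer} and {milk, diapers, beer} are pruned without a database scan because {bread, beer} and {milk, beer} are infrequent. {bread, milk, diapers} survives pruning but is eliminated after counting.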
  • 12. Generate strong association rules from the frequent k-itemsets
    For each frequent k-itemset, generate all non-empty subsets
    For every non-empty subset, generate the rule and the associated confidence
    Output the rule if the minimum confidence threshold is satisfied
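
    A sketch of this rule-generation step, assuming the support counts of all frequent itemsets are already available as a dictionary. The function name `generate_rules` and the counts below are hypothetical. Each frequent itemset is split into a body and a head, and the rule is kept when count(itemset) / count(body) meets the minimum confidence.

```python
from itertools import combinations

def generate_rules(support_counts, min_confidence):
    """Yield strong rules as (body, head, confidence) tuples.

    `support_counts` maps each frequent itemset (a frozenset) to its count.
    By the Apriori principle it also contains every non-empty subset.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty body and a non-empty head
        for size in range(1, len(itemset)):
            for body in combinations(sorted(itemset), size):
                body = frozenset(body)
                head = itemset - body
                conf = count / support_counts[body]
                if conf >= min_confidence:
                    rules.append((body, head, conf))
    return rules

# Hypothetical counts over 5 transactions.
support_counts = {
    frozenset({"coke"}): 2,
    frozenset({"oj"}): 3,
    frozenset({"coke", "oj"}): 2,
}
rules = generate_rules(support_counts, min_confidence=0.8)
for body, head, conf in rules:
    print(sorted(body), "->", sorted(head), f"{conf:.0%}")
```

    Here "coke -> oj" is output with 100% confidence, while "oj -> coke" is dropped because its confidence of about 67% is below the assumed 80% threshold.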
  • 13. Multilevel association rules
    Difficult to find strong associations at very low or primitive levels of data
     
    Few people may buy "IBM desktop computer" and "Sony b/w printer" together
    Many people may purchase "computer" and "printer" together
  • 14. Concept hierarchy
    Defines a sequence of mappings from a set of low-level concepts to higher-level concepts
    Ex: low-level concepts (IBM, Microsoft, HP, …) map to higher-level concepts (computer, software, printer, accessory)
  • 15. Steps to be followed
    Top-down, progressive deepening approach
    First mine high-level frequent items
    Then mine their lower level frequent items and so on
    At each level, the Apriori algorithm is used
    Use uniform minimum support for all levels, or
    Use reduced minimum support at lower levels
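
    To illustrate why mining starts at the higher level, the sketch below maps low-level items to their parent concepts and compares support at both levels. The `hierarchy` dictionary, the extra product names, and the four transactions are invented for illustration. Only "IBM desktop computer" and "Sony b/w printer" come from the slides.

```python
# Hypothetical concept hierarchy: low-level item -> higher-level concept.
hierarchy = {
    "IBM desktop computer": "computer",
    "Dell laptop computer": "computer",
    "Sony b/w printer": "printer",
    "HP color printer": "printer",
}

# Hypothetical low-level transactions.
transactions = [
    {"IBM desktop computer", "Sony b/w printer"},
    {"Dell laptop computer", "HP color printer"},
    {"IBM desktop computer", "HP color printer"},
    {"Dell laptop computer"},
]

def generalize(transaction):
    """Replace each item by its higher-level concept."""
    return {hierarchy[item] for item in transaction}

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

high_level = [generalize(t) for t in transactions]

low_support = support({"IBM desktop computer", "Sony b/w printer"}, transactions)
high_support = support({"computer", "printer"}, high_level)
print(f"low level: {low_support:.0%}, high level: {high_support:.0%}")
```

    With a uniform minimum support of, say, 50%, only the high-level pair would pass. A reduced threshold at the lower level is what lets the specific pair be found during progressive deepening.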
  • 16. Sequential Association Rule 
    Concerns sequences of events
    New homeowners purchase shower curtains before purchasing furniture
    When a customer goes into a bank branch and asks for an account reconciliation, there is a good chance that he or she will close all his or her accounts
  • 17. Sequential Association Rule 
    Transaction must have two additional features:
    a time stamp or sequencing information to determine when transactions occurred relative to each other
    identifying information, such as account number or id number
  • 18. Some important parameters
    Duration
    The duration may be the entire available sequence in the database, or a user-selected subsequence, such as the year 1999
    Event folding window
    A set of events occurring within a specified period of time, such as within the same day, can be viewed as occurring together.
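
    An event folding window can be sketched as a grouping step that runs before mining. The example assumes a one-day window and events given as (customer id, time stamp, item) tuples. The event data and the function name `fold_by_day` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (customer id, time stamp, item).
events = [
    ("c1", datetime(1999, 3, 1, 9, 0), "shower curtain"),
    ("c1", datetime(1999, 3, 1, 17, 30), "towels"),
    ("c1", datetime(1999, 4, 12, 11, 0), "furniture"),
    ("c2", datetime(1999, 3, 5, 10, 0), "shower curtain"),
]

def fold_by_day(events):
    """Merge each customer's same-day events into one itemset.

    Returns {customer: [(day, items), ...]} with days in ascending order.
    """
    folded = defaultdict(set)
    for customer, stamp, item in events:
        folded[(customer, stamp.date())].add(item)
    sequences = defaultdict(list)
    for (customer, day), items in sorted(folded.items(), key=lambda kv: kv[0]):
        sequences[customer].append((day, items))
    return dict(sequences)

sequences = fold_by_day(events)
for customer, steps in sequences.items():
    print(customer, [(day.isoformat(), sorted(items)) for day, items in steps])
```

    Customer c1's two purchases on 1 March are treated as occurring together, so the sequence for c1 has two steps rather than three.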
  • 19. Some important parameters
    Interval
    The gap between events in the discovered pattern
    An interval of 0 means to find strictly consecutive sequences
    min_int <= interval <= max_int means to find patterns whose events are separated by at least min_int and at most max_int
    interval = c means to find patterns carrying an exact interval
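
    The interval parameter can be sketched as a constraint on how far apart two events may be. The slides do not say how the interval is measured, so this example assumes it is the number of events that occur between the two matched events, which makes an interval of 0 mean strictly consecutive. The function names and customer sequences are hypothetical.

```python
def follows(sequence, first, second, min_int=0, max_int=0):
    """True if `second` occurs after `first` with min_int <= interval <= max_int.

    Assumption: the interval is the number of events between the two matches.
    """
    for i, event in enumerate(sequence):
        if event != first:
            continue
        for j in range(i + 1, len(sequence)):
            if sequence[j] == second and min_int <= j - i - 1 <= max_int:
                return True
    return False

def sequence_support(sequences, first, second, min_int=0, max_int=0):
    """Fraction of customers whose sequence contains the constrained pattern."""
    hits = sum(
        1 for seq in sequences.values()
        if follows(seq, first, second, min_int, max_int)
    )
    return hits / len(sequences)

# Hypothetical per-customer event sequences, already ordered by time.
sequences = {
    "c1": ["shower curtain", "towels", "furniture"],
    "c2": ["shower curtain", "furniture"],
    "c3": ["furniture", "shower curtain"],
}

consecutive = sequence_support(sequences, "shower curtain", "furniture", 0, 0)
ranged = sequence_support(sequences, "shower curtain", "furniture", 0, 1)
exact_one = sequence_support(sequences, "shower curtain", "furniture", 1, 1)
print(consecutive, ranged, exact_one)
```

    Setting min_int equal to max_int gives the exact-interval case (interval = c). If intervals are measured in elapsed time instead, the comparison would use time stamp differences rather than positions.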
  • 20. Some Practical Issues 
    Time window of transactions
    Level of aggregation
    Level of support and confidence
  • 21. Time window of transactions
    Select a time window for the transactions that covers at least 2 product cycles
    e.g. if customers purchase a product every six months or more often, select a 12-month window of customer transaction data
    For frequently purchased products, a short time window is sufficient
    For low frequency items, a longer time window is necessary.
  • 22. Level of aggregation
    If product codes in the data are too specific (for example, based on product details such as size and flavour), few associations will be discovered
    Group products into categories according to the product hierarchy, or create a new level manually
  • 23. Level of support and confidence
    Start with a high support and gradually reduce it
    Set confidence to around 50% to reduce the number of permutations
  • 24. Conclusion
    Association rules, including multilevel and sequential association rules, are studied.
    The Apriori algorithm is described in detail.
    Various practical issues in association rules are analyzed.
  • 25. Visit more self help tutorials
    Pick a tutorial of your choice and browse through it at your own pace.
    The tutorials section is free, self-guiding and will not involve any additional support.
    Visit us at www.dataminingtools.net