# Association Analysis

Published in: Technology, Education
### Transcript

• 1. Association Analysis
• 2. Association Analysis-Definition
Association analysis is the task of uncovering relationships among data.
Association rules:
An association rule is a model that identifies how data items are associated with each other.
Ex:
Association rules are used in retail sales to identify items that are frequently purchased together.
• 3. What is a rule?
Structure of a rule:
If (condition) then (result)
Example: IF a customer purchases Coke, THEN the customer also purchases orange juice.
The first part (the condition) is the rule body and the second part (the result) is the rule head.
• 4. Strength of a rule
How certain is the rule?
Confidence measures the certainty of a rule.
It is the percentage of transactions containing all items stated in the condition that also contain the items in the result.
Confidence(A, B) = P(B | A)
Example: The rule "If Coke then orange juice" has a confidence of 100%.
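
The confidence calculation can be sketched as follows. The five transactions are hypothetical (the slide's original table is not part of the transcript); they are chosen so that every transaction containing Coke also contains orange juice. The function name `confidence` is an illustrative choice.

```python
# Hypothetical transactions; the slide's original table is not in the transcript.
transactions = [
    {"coke", "orange juice", "milk"},
    {"milk", "orange juice", "window cleaner"},
    {"coke", "orange juice"},
    {"milk", "detergent"},
    {"window cleaner", "soda"},
]

def confidence(transactions, body, head):
    """Estimate P(head | body) as a fraction of the transactions containing body."""
    body, head = set(body), set(head)
    with_body = [t for t in transactions if body <= t]
    if not with_body:
        return 0.0  # the rule body never occurs; report 0 rather than divide by zero
    with_both = [t for t in with_body if head <= t]
    return len(with_both) / len(with_body)

coke_to_oj = confidence(transactions, {"coke"}, {"orange juice"})
oj_to_coke = confidence(transactions, {"orange juice"}, {"coke"})
```

With this data, `coke_to_oj` is 1.0 while `oj_to_coke` is 2/3, which shows that confidence is not symmetric in the rule body and the rule head.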
• 5. Strength of a rule
How often does the rule occur?
Support measures the usefulness of a rule.
It is the percentage of transactions that contain all items in the rule.
Support(A, B) = P(A, B)
Example: For the rule "If Coke then orange juice":
Of all 5 transactions, 2 contain both Coke and orange juice (OJ).
The support of the rule is 2/5 = 40%.
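
A minimal sketch of the support calculation, assuming five hypothetical transactions in which two contain both Coke and orange juice (the slide's original table is not in the transcript). The function name `support` is illustrative.

```python
# Hypothetical transactions: 2 of the 5 contain both coke and orange juice.
transactions = [
    {"coke", "orange juice", "milk"},
    {"milk", "orange juice", "window cleaner"},
    {"coke", "orange juice"},
    {"milk", "detergent"},
    {"window cleaner", "soda"},
]

def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    containing = sum(1 for t in transactions if items <= t)
    return containing / len(transactions)

rule_support = support(transactions, {"coke", "orange juice"})  # 2 of 5 transactions
```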
• 6. Association Rule Mining
Two-step process:
1. Find all frequent k-itemsets, k = 1, 2, 3, …
The set of all items in a rule is referred to as an itemset.
A rule that contains k items forms a k-itemset.
The occurrence frequency of a k-itemset is the number of transactions that contain all k items in the itemset.
An itemset that satisfies a minimum support (or minimum occurrence frequency) is called a frequent itemset.
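
The definitions above can be illustrated by counting occurrence frequencies directly. This is a brute-force sketch over hypothetical transactions, not the Apriori algorithm; the function name and the threshold value are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions for illustration.
transactions = [
    {"coke", "orange juice", "milk"},
    {"milk", "orange juice", "window cleaner"},
    {"coke", "orange juice"},
    {"milk", "detergent"},
    {"window cleaner", "soda"},
]

def frequent_k_itemsets(transactions, k, min_count):
    """Count every k-itemset that occurs in some transaction; keep the frequent ones."""
    counts = Counter()
    for t in transactions:
        # Sorting gives each itemset one canonical tuple representation.
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

# 2-itemsets that occur in at least 2 transactions
frequent_pairs = frequent_k_itemsets(transactions, k=2, min_count=2)
```

Enumerating every k-itemset in every transaction becomes expensive as k grows, which is the cost that candidate pruning is meant to reduce.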
• 7. Association Rule Mining
2. Generate strong association rules from the frequent k-itemsets
Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong rules.
• 8. Apriori Algorithm: Find all frequent k-itemsets
Apriori principle:
If an itemset is frequent, then all of its subsets must also be frequent
• 9. Illustrating Apriori Principle
• 10. Apriori Algorithm
Method:
Let k=1
Generate frequent itemsets of length 1
Repeat until no new frequent itemsets are identified
Generate length (k+1) candidate itemsets from length k frequent itemsets
• 11. Contd…
Prune candidate itemsets containing subsets of length k that are infrequent
Count the support of each candidate by scanning the DB
Eliminate candidates that are infrequent, leaving only those that are frequent
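
The method on the two slides above can be sketched roughly as follows. This is a compact illustration under simplifying assumptions (in-memory transactions, an absolute count threshold `min_count`, and a simple pairwise join for candidate generation), not a tuned implementation.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset (frozenset): count} for all itemsets in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    # k = 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)

    k = 1
    while current:  # repeat until no new frequent itemsets are identified
        # Generate length-(k+1) candidates from pairs of frequent k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k + 1}
        # Prune candidates that contain an infrequent subset of length k.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k))}
        # Count the support of each remaining candidate by scanning the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Eliminate infrequent candidates, leaving only the frequent ones.
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions for illustration.
transactions = [
    {"coke", "orange juice", "milk"},
    {"milk", "orange juice", "window cleaner"},
    {"coke", "orange juice"},
    {"milk", "detergent"},
    {"window cleaner", "soda"},
]
result = apriori(transactions, min_count=2)
```

In this example the candidate {coke, milk, orange juice} is pruned without a scan, because its subset {coke, milk} is not frequent.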
• 12. Generate strong association rules from the frequent k-itemsets
For each frequent k-itemset, generate all non-empty subsets.
For every non-empty subset, generate the rule (the subset as the rule body, the remaining items as the rule head) and its associated confidence.
Output the rule if the minimum confidence threshold is satisfied.
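
A sketch of this rule-generation step, assuming the frequent itemsets and their occurrence counts are already available as a dictionary. The counts below are hypothetical, and the names `generate_rules` and `min_conf` are illustrative.

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    """frequent maps frozenset itemsets to counts and must contain every subset
    of each itemset (guaranteed by the Apriori principle).
    Returns a list of (body, head, confidence) tuples."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty body and a non-empty head
        for r in range(1, len(itemset)):
            for body in combinations(sorted(itemset), r):
                body = frozenset(body)
                head = itemset - body
                conf = count / frequent[body]  # P(head | body)
                if conf >= min_conf:
                    rules.append((body, head, conf))
    return rules

# Hypothetical frequent itemsets with occurrence counts.
frequent = {
    frozenset({"coke"}): 2,
    frozenset({"orange juice"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"coke", "orange juice"}): 2,
    frozenset({"milk", "orange juice"}): 2,
}
rules = generate_rules(frequent, min_conf=0.8)
```

With these counts only "coke → orange juice" passes the 0.8 threshold; the other three candidate rules each have a confidence of 2/3.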
• 13. Multilevel association rules
It is difficult to find strong associations at very low or primitive levels of data.

Few people may buy "IBM desktop computer" and "Sony b/w printer" together.
Many people may purchase "computer" and "printer" together.
• 14. Concept hierarchy
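A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level concepts.
Ex: [Figure: concept hierarchy with vendor entries (IBM, Microsoft, HP, …) and category entries (computer, software, printer, accessory)]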
• 15. Steps to be followed
Top-down, progressive deepening approach:
First mine high-level frequent items.
Then mine their lower-level frequent items, and so on.
At each level, the Apriori algorithm is used.
Use a uniform minimum support for all levels, or
use a reduced minimum support at lower levels.
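
The top-down idea can be illustrated with a two-level sketch. The hierarchy, the transactions, and the per-level thresholds are all hypothetical, and only pair counting is shown for brevity rather than a full Apriori pass at each level.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept hierarchy: low-level item -> higher-level concept.
hierarchy = {
    "IBM desktop computer": "computer",
    "Dell laptop computer": "computer",
    "Sony b/w printer": "printer",
    "HP colour printer": "printer",
}

# Hypothetical low-level transactions.
transactions = [
    {"IBM desktop computer", "Sony b/w printer"},
    {"Dell laptop computer", "HP colour printer"},
    {"IBM desktop computer", "HP colour printer"},
    {"Dell laptop computer"},
]

def pair_counts(transactions):
    """Occurrence count of every 2-itemset, keyed by a sorted tuple."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts

# Step 1: mine the high level by replacing each item with its parent concept.
high_level = pair_counts([{hierarchy[i] for i in t} for t in transactions])
# Step 2: mine the lower level.
low_level = pair_counts(transactions)

min_count = {"high": 3, "low": 1}  # reduced minimum support at the lower level
frequent_high = {p for p, n in high_level.items() if n >= min_count["high"]}
# Progressive deepening: keep a low-level pair only if its parents are frequent.
frequent_low = {
    p for p, n in low_level.items()
    if n >= min_count["low"]
    and tuple(sorted(hierarchy[i] for i in p)) in frequent_high
}
```

Here ("computer", "printer") occurs in three transactions, while each specific product pair occurs only once, so a uniform threshold of 3 would find nothing at the lower level.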
• 16. Sequential Association Rule
Concerns sequences of events
New homeowners purchase shower curtains before purchasing furniture
When a customer goes into a bank branch and asks for an account reconciliation, there is a good chance that he or she will close all of his or her accounts.
• 17. Sequential Association Rule
Each transaction must have two additional features:
a time stamp or sequencing information to determine when transactions occurred relative to each other
identifying information, such as account number or id number
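
These two features are what allow transactions to be turned into per-customer sequences. A minimal sketch, assuming hypothetical records of the form (customer id, date, item):

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (identifying information, time stamp, purchased item).
records = [
    ("c1", date(1999, 3, 1), "furniture"),
    ("c1", date(1999, 1, 15), "shower curtain"),
    ("c2", date(1999, 2, 1), "shower curtain"),
    ("c2", date(1999, 6, 10), "furniture"),
    ("c3", date(1999, 4, 5), "furniture"),
]

def build_sequences(records):
    """Group records by customer and order each customer's items by time stamp."""
    by_customer = defaultdict(list)
    for customer_id, timestamp, item in records:
        by_customer[customer_id].append((timestamp, item))
    return {cid: [item for _, item in sorted(events)]
            for cid, events in by_customer.items()}

def occurs_before(sequence, first, second):
    """True if `second` appears somewhere after the first occurrence of `first`."""
    if first not in sequence:
        return False
    return second in sequence[sequence.index(first) + 1:]

sequences = build_sequences(records)
supporting = [cid for cid, seq in sequences.items()
              if occurs_before(seq, "shower curtain", "furniture")]
```

Without the customer id the purchases could not be linked, and without the time stamp their order could not be established.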
• 18. Some important parameters
Duration
The duration may be the entire available sequence in the database, or a user-selected subsequence, such as the year 1999.
Event folding window
A set of events occurring within a specified period of time, such as within the same day, can be viewed as occurring together.
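
Both parameters can be sketched on a small hypothetical event list: the duration restricts the events to a date range, and the folding window (one calendar day here, as an assumption) merges events with the same date into one set.

```python
from datetime import date
from itertools import groupby

# Hypothetical time-stamped events.
events = [
    (date(1998, 12, 30), "A"),
    (date(1999, 1, 5), "B"),
    (date(1999, 1, 5), "C"),
    (date(1999, 2, 1), "D"),
]

def in_duration(events, start, end):
    """Keep only the events whose time stamp lies within [start, end]."""
    return [(ts, e) for ts, e in events if start <= ts <= end]

def fold_by_day(events):
    """Treat all events that share a date as occurring together."""
    ordered = sorted(events)
    return [{e for _, e in group}
            for _, group in groupby(ordered, key=lambda pair: pair[0])]

selected = in_duration(events, date(1999, 1, 1), date(1999, 12, 31))
folded = fold_by_day(selected)
```

The result is a sequence of event sets, here {B, C} followed by {D}; event A falls outside the selected duration.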
• 19. Some important parameters
Interval
The interval is the gap between events in the discovered pattern.
An interval of 0 means finding strictly consecutive sequences.
min_int <= interval <= max_int means finding patterns whose events are separated by at least min_int and at most max_int.
interval = c means finding patterns with an exact interval of c.
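
A sketch of the interval constraint, under the assumption that the interval is measured as the number of events lying between the two matched events, so that an interval of 0 means the events are adjacent. Measuring the interval in time units instead would work the same way with time stamps.

```python
def matches_with_interval(sequence, first, second, min_int, max_int):
    """True if `second` follows `first` with min_int..max_int events in between."""
    for i, event in enumerate(sequence):
        if event != first:
            continue
        for j in range(i + 1, len(sequence)):
            gap = j - i - 1  # number of events strictly between positions i and j
            if sequence[j] == second and min_int <= gap <= max_int:
                return True
    return False

seq = ["A", "B", "C", "D"]  # hypothetical event sequence
consecutive = matches_with_interval(seq, "A", "B", 0, 0)  # interval 0
exact_two = matches_with_interval(seq, "A", "D", 2, 2)    # interval = c with c = 2
ranged = matches_with_interval(seq, "A", "C", 0, 1)       # min_int <= interval <= max_int
too_far = matches_with_interval(seq, "A", "D", 0, 1)      # gap of 2 exceeds max_int
```

An exact interval c is expressed by setting `min_int` and `max_int` both to c.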
• 20. Some Practical Issues
Time window of transactions
Level of aggregation
Level of support and confidence
• 21. Time window of transactions
Select a time window of transactions that covers at least 2 product cycles.
E.g., if a customer purchases a product every six months or more often, select a 12-month window of customer transaction data.
For frequently purchased products, a short time window is sufficient.
For low-frequency items, a longer time window is necessary.
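
The rule of thumb amounts to simple arithmetic. A tiny sketch, where the function name and the month-based unit are illustrative assumptions:

```python
def min_window_months(purchase_cycle_months, cycles=2):
    """Shortest transaction window, in months, covering `cycles` product cycles."""
    return cycles * purchase_cycle_months

six_month_product = min_window_months(6)  # product bought about every six months
monthly_product = min_window_months(1)    # frequently purchased product
```

A product bought about every six months needs a 12-month window, while a product bought monthly needs only 2 months.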
• 22. Level of aggregation
If product codes in the data are too specific (for example, based on product details such as size and flavour), few associations will be discovered.
Group products into categories according to the product hierarchy, or create a new level manually.
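
The effect of aggregation can be seen on a small example. The product codes below are hypothetical and are assumed to follow a "category-size-flavour" pattern, so the category is the part before the first hyphen; real product codes would need a lookup against the product hierarchy instead.

```python
# Hypothetical baskets with overly specific product codes.
baskets = [
    {"cola-330ml-cherry", "chips-150g-salt"},
    {"cola-1l-regular", "chips-50g-paprika"},
    {"cola-330ml-regular", "chips-150g-paprika"},
]

def category(code):
    """Assumed code layout: category is the part before the first hyphen."""
    return code.split("-")[0]

def pair_support(baskets, a, b):
    """Fraction of baskets that contain both a and b."""
    return sum(1 for t in baskets if a in t and b in t) / len(baskets)

# Support of one specific product pair versus the aggregated category pair.
detailed = pair_support(baskets, "cola-330ml-cherry", "chips-150g-salt")
grouped = pair_support([{category(c) for c in t} for t in baskets], "cola", "chips")
```

Each specific pair appears in only one basket, while the category pair cola and chips appears in all three.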
• 23. Level of support and confidence