SlideShare a Scribd company logo
1 of 34
Mining frequent patterns and associations
What is Concept Description?
• As we know that data mining functionalities can be classified into two categories:
1. Descriptive
2. Predictive
▪ Descriptive
• This task presents the general properties of data stored in a database.
• The descriptive tasks are used to find out patterns in data.
• E.g.: Cluster, Trends, etc.
▪ Predictive
• These tasks predict the value of one attribute on the basis of values of other attributes.
• E.g.: Festival Customer/Product Sell prediction at store
What is Concept Description? Cont..
• Descriptive data mining describes the data set in a concise and
summative manner and presents interesting general properties of the
data.
• Predictive data mining analyzes the data in order to construct one or a
set of models, and attempts to predict the behavior of new data sets.
• Database is usually storing the large amounts of data in great detail.
However users often like to view sets of summarized data in concise,
descriptive terms.
• A concept usually refers to a collection of concise or summarized data
such as grade wise students, products selling on festival etc.
What is Concept Description? Cont..
• Concept description generates descriptions for characterization and
comparison of the data, it is also called class description.
• Characterization provides a concise and brief summarization of the data.
• While concept or class comparison (also known as discrimination)
provides discriminations (inequity) comparing two or more collections of
data.
• Example
• Given the ABC Company database, for example, examining individual customer
transactions.
• Sales managers may prefer to view the data generalized to higher levels, such as
summarized by customer groups according to geographic regions, frequency of
purchases per group and customer income.
The Problem
When we go grocery shopping, we often have a standard list of things to buy. Each shopper
has a distinctive list, depending on one’s needs and preferences. A housewife might buy
healthy ingredients for a family dinner, while a bachelor might buy beer and chips.
Understanding these buying patterns can help to increase sales in several ways. If there is a
pair of items, X and Y, that are frequently bought together:
Both X and Y can be placed on the same shelf, so that buyers of one item would be
prompted to buy the other.
Promotional discounts could be applied to just one out of the two items.
Advertisements on X could be targeted at buyers who purchase Y.
X and Y could be combined into a new product, such as having Y in flavors of X.
While we may know that certain items are frequently bought together, the question
is, how do we uncover these associations?
Besides increasing sales profits, association rules can also be used in other fields.
In medical diagnosis for instance, understanding which symptoms tend to co-
morbid can help to improve patient care and medicine prescription.
Association rules analysis is a technique to uncover how items are associated to
each other. There are three common ways to measure association.
Market Basket Analysis
• Market Basket Analysis is a modelling technique to find frequent itemset.
• It is based on, if you buy a certain group of items, you are more (or less) likely to buy
another group of items.
• For example, if you are in a store and you buy a car then you are more likely to buy
insurance at the same time than somebody who don't buy insurance also.
• The set of items that a customer buys it referred as an itemset.
• Market basket analysis seeks to find relationships between purchases (Items).
• E.g. IF {Car, Accessories} THEN {Insurance}
{Car, Accessories} 🡪 {Insurance}
The probability that a customer will buy
car without an accessories is referred as
the support for rule.
The conditional probability that a
customer will purchase Insurance is
referred to as the confidence.
Association Rule Mining
• Given a set of transactions, we need rules that will predict the
occurrence of an item based on the occurrences of other items in the
transaction.
• Market-Basket transactions
TI
D
Items
1 Bread, Milk
2 Bread, Chocolate, Pepsi, Eggs
3 Milk, Chocolate, Pepsi, Coke
4 Bread, Milk, Chocolate, Pepsi
5 Bread, Milk, Chocolate, Coke
Example of Association
Rules
{Chocolate} → {Pepsi},
{Milk, Bread} → {Eggs, Coke},
{Pepsi, Bread} → {Milk}
Association Rule Mining Cont..
Itemset
• A collection of one or more items
o E.g. : {Milk, Bread, Chocolate}
• k-itemset
An itemset that contains k items
Support count (σ)
• Frequency of occurrence of an itemset
o E.g. σ({Milk, Bread, Chocolate}) = 2
Support
• Fraction of transactions that contain an itemset
o E.g. s({Milk, Bread, Chocolate}) = 2/5
Frequent Itemset
• An itemset whose support is greater than or equal to a
minimum support threshold
• Lift(l) –
The lift of the rule X=>Y is the confidence of the rule divided by the expected confidence,
assuming that the itemsets X and Y are independent of each other.The expected confidence is
the confidence divided by the frequency of {Y}.
TI
D
Items
1 Bread, Milk
2 Bread, Chocolate, Pepsi, Eggs
3 Milk, Chocolate, Pepsi, Coke
4 Bread, Milk, Chocolate, Pepsi
5 Bread, Milk, Chocolate, Coke
Association Rule Mining Cont..
• Association Rule
• An implication expression of the form X → Y, where X and Y are
item sets
• E.g.: {Milk, Chocolate} → {Pepsi}
• Rule Evaluation
• Support (s)
• Fraction of transactions that contain both X and Y
• Confidence (c)
• Measures how often items in Y appear in transactions that contain X
TI
D
Items
1 Bread, Milk
2 Bread, Chocolate, Pepsi, Eggs
3 Milk, Chocolate, Pepsi, Coke
4 Bread, Milk, Chocolate, Pepsi
5 Bread, Milk, Chocolate, Coke
Example:
Find support & confidence for {Milk, Chocolate} ⇒ Pepsi
Association Rule Mining Cont..
TI
D
Items
1 Bread, Milk
2 Bread, Chocolate, Pepsi, Eggs
3 Milk, Chocolate, Pepsi, Coke
4 Bread, Milk, Chocolate, Pepsi
5 Bread, Milk, Chocolate, Coke
Calculate Support &
Confidence
Support (s) : 0.4
1. {Milk, Chocolate} → {Pepsi} c = 0.67
2. {Milk, Pepsi} → {Chocolate} c = 1.0
3. {Chocolate, Pepsi} → {Milk} c = 0.67
4. {Pepsi} → {Milk, Chocolate} c = 0.67
5. {Chocolate} → {Milk, Pepsi} c = 0.5
6. {Milk} → {Chocolate, Pepsi} c = 0.5
A common strategy adopted by many association rule mining algorithms is to decompose the problem into
two major subtasks:
1. Frequent Itemset Generation
• The objective is to find all the item-sets that satisfy the minimum support threshold.
• These item sets are called frequent item sets.
2. Rule Generation
• The objective is to extract all the high-confidence rules from the frequent item sets found in the
previous step.
• These rules are called strong rules.
Apriori Algorithm - Example
Scan
D
C
1
L
1
C
2
Scan
D
C
2
L
2
Minimum Support =
2
ItemS
et
Min.
Sup
{1} 2
{2} 3
{3} 3
{4} 1
{5} 3
ItemS
et
Min.
Sup
{1} 2
{2} 3
{3} 3
{5} 3
ItemS
et
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}
ItemS
et
Min.
Sup
{1 2} 1
{1 3} 2
{1 5} 1
{2 3} 2
{2 5} 3
{3 5} 2
ItemS
et
Min.
Sup
{1 3} 2
{2 3} 2
{2 5} 3
{3 5} 2
Scan
D
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
ItemS
et
Min.
Sup
{1 2 3} 1
{1 2 5} 1
{1 3 5} 1
{2 3 5} 2
Apriori Algorithm - Example Cont..
Rules Generation
Minimum Support =
2
Association
Rule
Support Confidenc
e
Confidence (%)
2 ^ 3 🡪 5 2 2/2 = 1 100 %
3 ^ 5 🡪 2 2 2/2 = 1 100 %
2 ^ 5 🡪 3 2 2/3 = 0.66 66%
2 = 3 ^ 5 2 2/3 = 0.66 66%
3 = 2 ^ 5 2 2/3 = 0.66 66%
5 = 2 ^ 3 2 2/3 = 0.66 66%
Apriori Algorithm
• Purpose: The Apriori Algorithm is an influential algorithm for mining
frequent itemsets for Boolean association rules.
• Key Concepts:
• Frequent Itemsets:
• The sets of item which has minimum support (denoted by Li for ith-Itemset).
• Apriori Property:
• Any subset of frequent itemset must be frequent.
• Join Operation:
• To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 itself.
Apriori Algorithm Cont..
Find the frequent itemsets
• The sets of items that have minimum support and a subset of a frequent itemset must also be a
frequent itemset (Apriori Property).
• E.g. if {AB} is a frequent itemset, both {A} and {B} should be a frequent itemset.
• Use the frequent item sets to generate association rules.
The Apriori Algorithm : Pseudo code
• Join Step: Ck is generated by joining Lk-1with itself
• Prune Step: Any (k-1) itemset that is not frequent cannot be a subset of a frequent k-itemset
Ck: Candidate itemset of size k
Lk: Frequent itemset of size k
L1= {frequent items};
for (k = 1; Lk != ∅; k++) do begin
Ck+1 = candidates generated from Lk;
for each transaction t in database do
Increment the count of all candidates in Ck+1
That are contained in t
Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
Apriori Algorithm Steps
Step 1:
• Start with itemsets containing just a single item (Individual items).
Step 2:
• Determine the support for itemsets.
• Keep the itemsets that meet your minimum support threshold and remove
itemsets that do not support minimum support.
Step 3:
• Using the itemsets you have kept from Step 1, generate all the possible itemset
combinations.
Step 4:
• Repeat steps 1 & 2 until there are no more new itemsets.
Apriori Algorithm (Try Yourself!)
A database has 4 transactions. Let Min_sup = 50% and Min_conf =
75%
Sr. Association Rule Support Confidence Confidence (%)
Rule 1 Butter^Milk 🡪 Bread 2 2/2 = 1 100%
Rule 2 Milk^Bread 🡪 Butter 2 2/2 = 1 100%
Rule 3 Butter^Bread 🡪 Milk 2 2/3 = 0.66 66%
Rule 4 Butter🡪Milk^Bread 2 2/3 = 0.66 66%
Rule 5 Milk🡪Butter^Bread 2 2/3 = 0.66 66%
Rule 6 Bread🡪Butter^Milk 2 2/3 = 0.66 66%
TID Items
1000 Cheese, Milk, Cookies
2000 Butter, Milk, Bread
3000 Cheese, Butter, Milk, Bread
4000 Butter, Bread
Frequent Itemset Sup
Butter,Milk,Bread 2
Min_sup = 50%
How to convert support in
integer?
Given % X Total Records
100
So here, 50 X 4
100
=
2
FP-Growth Algorithm
The FP-Growth Algorithm is proposed by Han.
It is an efficient and scalable method for mining the complete set of
frequent patterns.
Using prefix-tree structure for storing information about frequent
patterns named frequent-pattern tree (FP-tree).
Once an FP-tree has been constructed, it uses a recursive divide-and-
conquer approach to mine the frequent item sets.
Arranged
Order
A : 8
C : 8
E : 8
G : 5
B : 2
D : 2
F : 2
Remaining
all
O,I,J,L,P,M
& N is with
min_sup =
1
FP-Growth Algorithm - Example
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
Step:1
Freq. 1-
Itemsets.
Min_Sup ≥ 2
Step:2
Transactions with items sorted
based on frequencies, and ignoring
the infrequent items.
FP-Tree Generation
Building the FP-Tree
✔ Scan data to determine the
support count of each item.
✔ Infrequent items are
discarded, while the frequent
items are sorted in
decreasing support counts.
✔ Make a second pass over the
data to construct the FP-tree.
✔ As the transactions are read,
before being processed, their
items are sorted according to
the above order.
TID Items
1 A B C E F
O
2 A C G
3 E I
4 A C D E G
5 A C E G L
6 E J
7 A B C E F
P
8 A C D
9 A C E G M
10 A C E G N
Minimum Support =
2
FP-Tree after reading 1st transaction
A:8
C:8
E:8
G:5
B:2
D:2
F:2
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
nul
l
A:1
C:1
E:1
B:1
F:1
Header
Prof. Naimish R Vadodariya 24
#3160714 (DM) ⬥ Unit 3 – Concept Description, Mining Frequent Patterns, Associations & Correlations
FP-Tree after reading 2nd transaction
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
G:1
A:8
C:8
E:8
G:5
B:2
D:2
F:2
nul
l
A:2
C:2
E:1
B:1
F:1
Header
FP-Tree after reading 3rd transaction
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
G:1
A:8
C:8
E:8
G:5
B:2
D:2
F:2
nul
l
A:2
C:2
E:1
B:1
F:1
Header
E:1
FP-Tree after reading 4th transaction
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
G:1
A:8
C:8
E:8
G:5
B:2
D:2
F:2
nul
l
A:3
C:3
E:2
B:1
F:1
Header
E:1
G:1
D:1
FP-Tree after reading 5th transaction
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
G:1
A:8
C:8
E:8
G:5
B:2
D:2
F:2
nul
l
A:4
C:4
E:3
B:1
F:1
Header
E:1
G:2
D:1
FP-Tree after reading 6th transaction
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
G:1
A:8
C:8
E:8
G:5
B:2
D:2
F:2
nul
l
A:4
C:4
E:3
B:1
F:1
Header
E:2
G:2
D:1
FP-Tree after reading 7th transaction
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
G:1
A:8
C:8
E:8
G:5
B:2
D:2
F:2
nul
l
A:5
C:5
E:4
B:2
F:2
Header
E:2
G:2
D:1
FP-Tree after reading 8th transaction
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
G:1
A:8
C:8
E:8
G:5
B:2
D:2
F:2
nul
l
A:6
C:6
E:4
B:2
F:2
Header
E:2
G:2
D:1
D:1
FP-Tree after reading 9th transaction
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
G:1
A:8
C:8
E:8
G:5
B:2
D:2
F:2
nul
l
A:7
C:7
E:5
B:2
F:2
Header
E:2
G:3
D:1
D:1
FP-Tree after reading 10th transaction
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
G:1
A:8
C:8
E:8
G:5
B:2
D:2
F:2
nul
l
A:8
C:8
E:6
B:2
F:2
Header
E:2
G:4
D:1
D:1
Apriori Algorithm – Example
Min-support = 60% and Confidence
= 80%
TID Items
100 1,3,4,6
200 2,3,5,7
300 1,2,3,5,8
400 2,5,9,10
500 1,4
Itemse
t
Support
2,5 3
Rule Support Confidence Conf. (%)
2 🡪 5 3 3/3 = 1 100%
5 🡪 2 3 3/3 = 1 100%
TID Items
1000 A,B,C
2000 A,C
3000 A,D
5000 B,E,F
Itemse
t
Support
A,C 2
Rule Support Confidence Conf. (%)
A 🡪 C 2 2/3 = 0.66 66%
C 🡪 A 2 2/2 = 1 100%
Min-support =
50%
FP-Growth Algorithm – Example
B:
8
A:
5
nul
l
C:
3
D:
1
A:
2
C:
1
D:
1
E:1
D:
1
E:1
C:
3
D:
1
D:
1
E:1
Header
FP-Tree
Minimum Support =
3
TID Items
1 A B
2 B C D
3 A C D E
4 A D E
5 A B C
6 A B C D
7 B C
8 A B C
9 A B D
10 B C E
Item Support
B 8
A 7
C 7
D 5
E 3

More Related Content

What's hot

Association rule mining
Association rule miningAssociation rule mining
Association rule miningUtkarsh Sharma
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule miningDeepa Jeya
 
Lecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsLecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsPier Luca Lanzi
 
Decision Tree - ID3
Decision Tree - ID3Decision Tree - ID3
Decision Tree - ID3Xueping Peng
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsVarad Meru
 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationKnoldus Inc.
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Subrata Kumer Paul
 
Supervised Machine Learning
Supervised Machine LearningSupervised Machine Learning
Supervised Machine LearningAnkit Rai
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means ClusteringAnna Fensel
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Salah Amean
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket AnalysisMahendra Gupta
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
Apriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningApriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningWan Aezwani Wab
 

What's hot (20)

Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Association rules
Association rulesAssociation rules
Association rules
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
 
Lecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsLecture 04 Association Rules Basics
Lecture 04 Association Rules Basics
 
Decision Tree - ID3
Decision Tree - ID3Decision Tree - ID3
Decision Tree - ID3
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its Applications
 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
 
Decision tree
Decision treeDecision tree
Decision tree
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Supervised Machine Learning
Supervised Machine LearningSupervised Machine Learning
Supervised Machine Learning
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Clustering
ClusteringClustering
Clustering
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
kmean clustering
kmean clusteringkmean clustering
kmean clustering
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Decision tree
Decision treeDecision tree
Decision tree
 
Apriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningApriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule Mining
 

Similar to MODULE 5 _ Mining frequent patterns and associations.pptx

Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmhktripathy
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_RulesFEG
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdfWailaBaba
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdfJyoti Yadav
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data MiningKamal Acharya
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data miningSulman Ahmed
 
BIM Data Mining Unit4 by Tekendra Nath Yogi
 BIM Data Mining Unit4 by Tekendra Nath Yogi BIM Data Mining Unit4 by Tekendra Nath Yogi
BIM Data Mining Unit4 by Tekendra Nath YogiTekendra Nath Yogi
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14rahulmath80
 
Mining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association RulesMining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association RulesRashmi Bhat
 
big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptxAmenahAbbood
 
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...Smarten Augmented Analytics
 
Data mining techniques unit III
Data mining techniques unit IIIData mining techniques unit III
Data mining techniques unit IIImalathieswaran29
 
Association Rule Mining
Association Rule MiningAssociation Rule Mining
Association Rule MiningPALLAB DAS
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket AnalysisSandeep Prasad
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptraju980973
 
Data Mining Concepts 15061
Data Mining Concepts 15061Data Mining Concepts 15061
Data Mining Concepts 15061badirh
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Conceptsdataminers.ir
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 

Similar to MODULE 5 _ Mining frequent patterns and associations.pptx (20)

apriori.pptx
apriori.pptxapriori.pptx
apriori.pptx
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdf
 
Unit 4_ML.pptx
Unit 4_ML.pptxUnit 4_ML.pptx
Unit 4_ML.pptx
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
BIM Data Mining Unit4 by Tekendra Nath Yogi
 BIM Data Mining Unit4 by Tekendra Nath Yogi BIM Data Mining Unit4 by Tekendra Nath Yogi
BIM Data Mining Unit4 by Tekendra Nath Yogi
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14
 
Mining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association RulesMining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association Rules
 
big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptx
 
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
 
Data mining techniques unit III
Data mining techniques unit IIIData mining techniques unit III
Data mining techniques unit III
 
Association Rule Mining
Association Rule MiningAssociation Rule Mining
Association Rule Mining
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.ppt
 
Data Mining Concepts 15061
Data Mining Concepts 15061Data Mining Concepts 15061
Data Mining Concepts 15061
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 

More from nikshaikh786

Module 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptxModule 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptxnikshaikh786
 
Module 1_ Introduction.pptx
Module 1_ Introduction.pptxModule 1_ Introduction.pptx
Module 1_ Introduction.pptxnikshaikh786
 
Module 1_ Introduction to Mobile Computing.pptx
Module 1_  Introduction to Mobile Computing.pptxModule 1_  Introduction to Mobile Computing.pptx
Module 1_ Introduction to Mobile Computing.pptxnikshaikh786
 
Module 2_ GSM Mobile services.pptx
Module 2_  GSM Mobile services.pptxModule 2_  GSM Mobile services.pptx
Module 2_ GSM Mobile services.pptxnikshaikh786
 
Module 2_ Cyber offenses & Cybercrime.pptx
Module 2_ Cyber offenses & Cybercrime.pptxModule 2_ Cyber offenses & Cybercrime.pptx
Module 2_ Cyber offenses & Cybercrime.pptxnikshaikh786
 
Module 1- Introduction to Cybercrime.pptx
Module 1- Introduction to Cybercrime.pptxModule 1- Introduction to Cybercrime.pptx
Module 1- Introduction to Cybercrime.pptxnikshaikh786
 
MODULE 5- EDA.pptx
MODULE 5- EDA.pptxMODULE 5- EDA.pptx
MODULE 5- EDA.pptxnikshaikh786
 
MODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxMODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxnikshaikh786
 
Module 3 - Time Series.pptx
Module 3 - Time Series.pptxModule 3 - Time Series.pptx
Module 3 - Time Series.pptxnikshaikh786
 
Module 2_ Regression Models..pptx
Module 2_ Regression Models..pptxModule 2_ Regression Models..pptx
Module 2_ Regression Models..pptxnikshaikh786
 
MODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxMODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxnikshaikh786
 
MAD&PWA VIVA QUESTIONS.pdf
MAD&PWA VIVA QUESTIONS.pdfMAD&PWA VIVA QUESTIONS.pdf
MAD&PWA VIVA QUESTIONS.pdfnikshaikh786
 
VIVA QUESTIONS FOR DEVOPS.pdf
VIVA QUESTIONS FOR DEVOPS.pdfVIVA QUESTIONS FOR DEVOPS.pdf
VIVA QUESTIONS FOR DEVOPS.pdfnikshaikh786
 
Mad&pwa practical no. 1
Mad&pwa practical no. 1Mad&pwa practical no. 1
Mad&pwa practical no. 1nikshaikh786
 

More from nikshaikh786 (20)

Module 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptxModule 2_ Divide and Conquer Approach.pptx
Module 2_ Divide and Conquer Approach.pptx
 
Module 1_ Introduction.pptx
Module 1_ Introduction.pptxModule 1_ Introduction.pptx
Module 1_ Introduction.pptx
 
Module 1_ Introduction to Mobile Computing.pptx
Module 1_  Introduction to Mobile Computing.pptxModule 1_  Introduction to Mobile Computing.pptx
Module 1_ Introduction to Mobile Computing.pptx
 
Module 2_ GSM Mobile services.pptx
Module 2_  GSM Mobile services.pptxModule 2_  GSM Mobile services.pptx
Module 2_ GSM Mobile services.pptx
 
TCS MODULE 6.pdf
TCS MODULE 6.pdfTCS MODULE 6.pdf
TCS MODULE 6.pdf
 
Module 2_ Cyber offenses & Cybercrime.pptx
Module 2_ Cyber offenses & Cybercrime.pptxModule 2_ Cyber offenses & Cybercrime.pptx
Module 2_ Cyber offenses & Cybercrime.pptx
 
Module 1- Introduction to Cybercrime.pptx
Module 1- Introduction to Cybercrime.pptxModule 1- Introduction to Cybercrime.pptx
Module 1- Introduction to Cybercrime.pptx
 
MODULE 5- EDA.pptx
MODULE 5- EDA.pptxMODULE 5- EDA.pptx
MODULE 5- EDA.pptx
 
MODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxMODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptx
 
Module 3 - Time Series.pptx
Module 3 - Time Series.pptxModule 3 - Time Series.pptx
Module 3 - Time Series.pptx
 
Module 2_ Regression Models..pptx
Module 2_ Regression Models..pptxModule 2_ Regression Models..pptx
Module 2_ Regression Models..pptx
 
MODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxMODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptx
 
IOE MODULE 6.pptx
IOE MODULE 6.pptxIOE MODULE 6.pptx
IOE MODULE 6.pptx
 
MAD&PWA VIVA QUESTIONS.pdf
MAD&PWA VIVA QUESTIONS.pdfMAD&PWA VIVA QUESTIONS.pdf
MAD&PWA VIVA QUESTIONS.pdf
 
VIVA QUESTIONS FOR DEVOPS.pdf
VIVA QUESTIONS FOR DEVOPS.pdfVIVA QUESTIONS FOR DEVOPS.pdf
VIVA QUESTIONS FOR DEVOPS.pdf
 
IOE MODULE 5.pptx
IOE MODULE 5.pptxIOE MODULE 5.pptx
IOE MODULE 5.pptx
 
Mad&pwa practical no. 1
Mad&pwa practical no. 1Mad&pwa practical no. 1
Mad&pwa practical no. 1
 
Ioe module 4
Ioe module 4Ioe module 4
Ioe module 4
 
Ioe module 3
Ioe module 3Ioe module 3
Ioe module 3
 
Ioe module 2
Ioe module 2Ioe module 2
Ioe module 2
 

Recently uploaded

HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 

Recently uploaded (20)

HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 

MODULE 5 _ Mining frequent patterns and associations.pptx

  • 1. Mining frequent patterns and associations
  • 2. What is Concept Description? • As we know that data mining functionalities can be classified into two categories: 1. Descriptive 2. Predictive ▪ Descriptive • This task presents the general properties of data stored in a database. • The descriptive tasks are used to find out patterns in data. • E.g.: Cluster, Trends, etc. ▪ Predictive • These tasks predict the value of one attribute on the basis of values of other attributes. • E.g.: Festival Customer/Product Sell prediction at store
  • 3. What is Concept Description? Cont.. • Descriptive data mining describes the data set in a concise and summative manner and presents interesting general properties of the data. • Predictive data mining analyzes the data in order to construct one or a set of models, and attempts to predict the behavior of new data sets. • Database is usually storing the large amounts of data in great detail. However users often like to view sets of summarized data in concise, descriptive terms. • A concept usually refers to a collection of concise or summarized data such as grade wise students, products selling on festival etc.
  • 4. What is Concept Description? Cont.. • Concept description generates descriptions for characterization and comparison of the data, it is also called class description. • Characterization provides a concise and brief summarization of the data. • While concept or class comparison (also known as discrimination) provides discriminations (inequity) comparing two or more collections of data. • Example • Given the ABC Company database, for example, examining individual customer transactions. • Sales managers may prefer to view the data generalized to higher levels, such as summarized by customer groups according to geographic regions, frequency of purchases per group and customer income.
  • 5. The Problem When we go grocery shopping, we often have a standard list of things to buy. Each shopper has a distinctive list, depending on one’s needs and preferences. A housewife might buy healthy ingredients for a family dinner, while a bachelor might buy beer and chips. Understanding these buying patterns can help to increase sales in several ways. If there is a pair of items, X and Y, that are frequently bought together: Both X and Y can be placed on the same shelf, so that buyers of one item would be prompted to buy the other. Promotional discounts could be applied to just one out of the two items. Advertisements on X could be targeted at buyers who purchase Y. X and Y could be combined into a new product, such as having Y in flavors of X.
  • 6. While we may know that certain items are frequently bought together, the question is, how do we uncover these associations? Besides increasing sales profits, association rules can also be used in other fields. In medical diagnosis for instance, understanding which symptoms tend to co- morbid can help to improve patient care and medicine prescription. Association rules analysis is a technique to uncover how items are associated to each other. There are three common ways to measure association.
  • 7.
  • 8.
  • 9.
  • 10. Market Basket Analysis • Market Basket Analysis is a modelling technique to find frequent itemset. • It is based on, if you buy a certain group of items, you are more (or less) likely to buy another group of items. • For example, if you are in a store and you buy a car then you are more likely to buy insurance at the same time than somebody who don't buy insurance also. • The set of items that a customer buys it referred as an itemset. • Market basket analysis seeks to find relationships between purchases (Items). • E.g. IF {Car, Accessories} THEN {Insurance} {Car, Accessories} 🡪 {Insurance} The probability that a customer will buy car without an accessories is referred as the support for rule. The conditional probability that a customer will purchase Insurance is referred to as the confidence.
  • 11. Association Rule Mining • Given a set of transactions, we need rules that will predict the occurrence of an item based on the occurrences of other items in the transaction. • Market-Basket transactions TI D Items 1 Bread, Milk 2 Bread, Chocolate, Pepsi, Eggs 3 Milk, Chocolate, Pepsi, Coke 4 Bread, Milk, Chocolate, Pepsi 5 Bread, Milk, Chocolate, Coke Example of Association Rules {Chocolate} → {Pepsi}, {Milk, Bread} → {Eggs, Coke}, {Pepsi, Bread} → {Milk}
  • 12. Association Rule Mining Cont.. Itemset • A collection of one or more items o E.g. : {Milk, Bread, Chocolate} • k-itemset An itemset that contains k items Support count (σ) • Frequency of occurrence of an itemset o E.g. σ({Milk, Bread, Chocolate}) = 2 Support • Fraction of transactions that contain an itemset o E.g. s({Milk, Bread, Chocolate}) = 2/5 Frequent Itemset • An itemset whose support is greater than or equal to a minimum support threshold • Lift(l) – The lift of the rule X=>Y is the confidence of the rule divided by the expected confidence, assuming that the itemsets X and Y are independent of each other.The expected confidence is the confidence divided by the frequency of {Y}. TI D Items 1 Bread, Milk 2 Bread, Chocolate, Pepsi, Eggs 3 Milk, Chocolate, Pepsi, Coke 4 Bread, Milk, Chocolate, Pepsi 5 Bread, Milk, Chocolate, Coke
  • 13. Association Rule Mining Cont.. • Association Rule • An implication expression of the form X → Y, where X and Y are item sets • E.g.: {Milk, Chocolate} → {Pepsi} • Rule Evaluation • Support (s) • Fraction of transactions that contain both X and Y • Confidence (c) • Measures how often items in Y appear in transactions that contain X TI D Items 1 Bread, Milk 2 Bread, Chocolate, Pepsi, Eggs 3 Milk, Chocolate, Pepsi, Coke 4 Bread, Milk, Chocolate, Pepsi 5 Bread, Milk, Chocolate, Coke Example: Find support & confidence for {Milk, Chocolate} ⇒ Pepsi
  • 14. Association Rule Mining Cont.. TI D Items 1 Bread, Milk 2 Bread, Chocolate, Pepsi, Eggs 3 Milk, Chocolate, Pepsi, Coke 4 Bread, Milk, Chocolate, Pepsi 5 Bread, Milk, Chocolate, Coke Calculate Support & Confidence Support (s) : 0.4 1. {Milk, Chocolate} → {Pepsi} c = 0.67 2. {Milk, Pepsi} → {Chocolate} c = 1.0 3. {Chocolate, Pepsi} → {Milk} c = 0.67 4. {Pepsi} → {Milk, Chocolate} c = 0.67 5. {Chocolate} → {Milk, Pepsi} c = 0.5 6. {Milk} → {Chocolate, Pepsi} c = 0.5 A common strategy adopted by many association rule mining algorithms is to decompose the problem into two major subtasks: 1. Frequent Itemset Generation • The objective is to find all the item-sets that satisfy the minimum support threshold. • These item sets are called frequent item sets. 2. Rule Generation • The objective is to extract all the high-confidence rules from the frequent item sets found in the previous step. • These rules are called strong rules.
  • 15. Apriori Algorithm - Example Scan D C 1 L 1 C 2 Scan D C 2 L 2 Minimum Support = 2 ItemS et Min. Sup {1} 2 {2} 3 {3} 3 {4} 1 {5} 3 ItemS et Min. Sup {1} 2 {2} 3 {3} 3 {5} 3 ItemS et {1 2} {1 3} {1 5} {2 3} {2 5} {3 5} ItemS et Min. Sup {1 2} 1 {1 3} 2 {1 5} 1 {2 3} 2 {2 5} 3 {3 5} 2 ItemS et Min. Sup {1 3} 2 {2 3} 2 {2 5} 3 {3 5} 2 Scan D TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 ItemS et Min. Sup {1 2 3} 1 {1 2 5} 1 {1 3 5} 1 {2 3 5} 2
  • 16. Apriori Algorithm - Example Cont.. Rules Generation Minimum Support = 2 Association Rule Support Confidenc e Confidence (%) 2 ^ 3 🡪 5 2 2/2 = 1 100 % 3 ^ 5 🡪 2 2 2/2 = 1 100 % 2 ^ 5 🡪 3 2 2/3 = 0.66 66% 2 = 3 ^ 5 2 2/3 = 0.66 66% 3 = 2 ^ 5 2 2/3 = 0.66 66% 5 = 2 ^ 3 2 2/3 = 0.66 66%
  • 17. Apriori Algorithm • Purpose: The Apriori Algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules. • Key Concepts: • Frequent Itemsets: • The sets of item which has minimum support (denoted by Li for ith-Itemset). • Apriori Property: • Any subset of frequent itemset must be frequent. • Join Operation: • To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 itself.
  • 18. Apriori Algorithm Cont.. Find the frequent itemsets • The sets of items that have minimum support and a subset of a frequent itemset must also be a frequent itemset (Apriori Property). • E.g. if {AB} is a frequent itemset, both {A} and {B} should be a frequent itemset. • Use the frequent item sets to generate association rules. The Apriori Algorithm : Pseudo code • Join Step: Ck is generated by joining Lk-1with itself • Prune Step: Any (k-1) itemset that is not frequent cannot be a subset of a frequent k-itemset Ck: Candidate itemset of size k Lk: Frequent itemset of size k L1= {frequent items}; for (k = 1; Lk != ∅; k++) do begin Ck+1 = candidates generated from Lk; for each transaction t in database do Increment the count of all candidates in Ck+1 That are contained in t Lk+1 = candidates in Ck+1 with min_support end return ∪k Lk;
  • 19. Apriori Algorithm Steps Step 1: • Start with itemsets containing just a single item (Individual items). Step 2: • Determine the support for itemsets. • Keep the itemsets that meet your minimum support threshold and remove itemsets that do not support minimum support. Step 3: • Using the itemsets you have kept from Step 1, generate all the possible itemset combinations. Step 4: • Repeat steps 1 & 2 until there are no more new itemsets.
  • 20. Apriori Algorithm (Try Yourself!) A database has 4 transactions. Let Min_sup = 50% and Min_conf = 75% Sr. Association Rule Support Confidence Confidence (%) Rule 1 Butter^Milk 🡪 Bread 2 2/2 = 1 100% Rule 2 Milk^Bread 🡪 Butter 2 2/2 = 1 100% Rule 3 Butter^Bread 🡪 Milk 2 2/3 = 0.66 66% Rule 4 Butter🡪Milk^Bread 2 2/3 = 0.66 66% Rule 5 Milk🡪Butter^Bread 2 2/3 = 0.66 66% Rule 6 Bread🡪Butter^Milk 2 2/3 = 0.66 66% TID Items 1000 Cheese, Milk, Cookies 2000 Butter, Milk, Bread 3000 Cheese, Butter, Milk, Bread 4000 Butter, Bread Frequent Itemset Sup Butter,Milk,Bread 2 Min_sup = 50% How to convert support in integer? Given % X Total Records 100 So here, 50 X 4 100 = 2
  • 21. FP-Growth Algorithm The FP-Growth Algorithm is proposed by Han. It is an efficient and scalable method for mining the complete set of frequent patterns. Using prefix-tree structure for storing information about frequent patterns named frequent-pattern tree (FP-tree). Once an FP-tree has been constructed, it uses a recursive divide-and- conquer approach to mine the frequent item sets.
  • 22. Arranged Order A : 8 C : 8 E : 8 G : 5 B : 2 D : 2 F : 2 Remaining all O,I,J,L,P,M & N is with min_sup = 1 FP-Growth Algorithm - Example A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G Step:1 Freq. 1- Itemsets. Min_Sup ≥ 2 Step:2 Transactions with items sorted based on frequencies, and ignoring the infrequent items. FP-Tree Generation Building the FP-Tree ✔ Scan data to determine the support count of each item. ✔ Infrequent items are discarded, while the frequent items are sorted in decreasing support counts. ✔ Make a second pass over the data to construct the FP-tree. ✔ As the transactions are read, before being processed, their items are sorted according to the above order. TID Items 1 A B C E F O 2 A C G 3 E I 4 A C D E G 5 A C E G L 6 E J 7 A B C E F P 8 A C D 9 A C E G M 10 A C E G N Minimum Support = 2
  • 23. FP-Tree after reading 1st transaction A:8 C:8 E:8 G:5 B:2 D:2 F:2 A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G nul l A:1 C:1 E:1 B:1 F:1 Header
  • 24. Prof. Naimish R Vadodariya 24 #3160714 (DM) ⬥ Unit 3 – Concept Description, Mining Frequent Patterns, Associations & Correlations FP-Tree after reading 2nd transaction A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G G:1 A:8 C:8 E:8 G:5 B:2 D:2 F:2 nul l A:2 C:2 E:1 B:1 F:1 Header
  • 25. FP-Tree after reading 3rd transaction A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G G:1 A:8 C:8 E:8 G:5 B:2 D:2 F:2 nul l A:2 C:2 E:1 B:1 F:1 Header E:1
  • 26. FP-Tree after reading 4th transaction A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G G:1 A:8 C:8 E:8 G:5 B:2 D:2 F:2 nul l A:3 C:3 E:2 B:1 F:1 Header E:1 G:1 D:1
  • 27. FP-Tree after reading 5th transaction A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G G:1 A:8 C:8 E:8 G:5 B:2 D:2 F:2 nul l A:4 C:4 E:3 B:1 F:1 Header E:1 G:2 D:1
  • 28. FP-Tree after reading 6th transaction A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G G:1 A:8 C:8 E:8 G:5 B:2 D:2 F:2 nul l A:4 C:4 E:3 B:1 F:1 Header E:2 G:2 D:1
  • 29. FP-Tree after reading 7th transaction A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G G:1 A:8 C:8 E:8 G:5 B:2 D:2 F:2 nul l A:5 C:5 E:4 B:2 F:2 Header E:2 G:2 D:1
  • 30. FP-Tree after reading 8th transaction A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G G:1 A:8 C:8 E:8 G:5 B:2 D:2 F:2 nul l A:6 C:6 E:4 B:2 F:2 Header E:2 G:2 D:1 D:1
  • 31. FP-Tree after reading 9th transaction A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G G:1 A:8 C:8 E:8 G:5 B:2 D:2 F:2 nul l A:7 C:7 E:5 B:2 F:2 Header E:2 G:3 D:1 D:1
  • 32. FP-Tree after reading 10th transaction A C E B F A C G E A C E G D A C E G E A C E B F A C D A C E G A C E G G:1 A:8 C:8 E:8 G:5 B:2 D:2 F:2 nul l A:8 C:8 E:6 B:2 F:2 Header E:2 G:4 D:1 D:1
  • 33. Apriori Algorithm – Example Min-support = 60% and Confidence = 80% TID Items 100 1,3,4,6 200 2,3,5,7 300 1,2,3,5,8 400 2,5,9,10 500 1,4 Itemse t Support 2,5 3 Rule Support Confidence Conf. (%) 2 🡪 5 3 3/3 = 1 100% 5 🡪 2 3 3/3 = 1 100% TID Items 1000 A,B,C 2000 A,C 3000 A,D 5000 B,E,F Itemse t Support A,C 2 Rule Support Confidence Conf. (%) A 🡪 C 2 2/3 = 0.66 66% C 🡪 A 2 2/2 = 1 100% Min-support = 50%
  • 34. FP-Growth Algorithm – Example B: 8 A: 5 nul l C: 3 D: 1 A: 2 C: 1 D: 1 E:1 D: 1 E:1 C: 3 D: 1 D: 1 E:1 Header FP-Tree Minimum Support = 3 TID Items 1 A B 2 B C D 3 A C D E 4 A D E 5 A B C 6 A B C D 7 B C 8 A B C 9 A B D 10 B C E Item Support B 8 A 7 C 7 D 5 E 3