Association Rules and
Frequent Pattern
Analysis
Dr. Iqbal H. Sarker
Dept of CSE, CUET
Research LAB Web:
Sarker DataLAB
(http://sarkerdatalab.com/)
Machine Learning Slide 1
Iqbal H. Sarker
Today’s Agenda
• Introduction to Association Rules
• Motivation with Examples
• Algorithms
• How does it work?
• Real-life Application Areas
• Summary
Introduction to AR
• The idea comes from market basket analysis (MBA)
◼ Let’s go shopping! Three customers (Customer1, Customer2, Customer3) fill their baskets:
➢ Milk, eggs, sugar, bread
➢ Eggs, sugar
➢ Milk, eggs, cereal, bread
◼ What do my customers buy? Which products are bought together?
◼ Aim: Find associations and correlations between the different items that customers place in their shopping baskets
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases.
Real-Life Applications
Used in many recommender systems
Introduction to AR
• Formalizing the problem a little
◼ Items: I = {i1, i2, …, im} is the set of all items
◼ Transaction database T: a set of transactions T = {t1, t2, …, tn}
◼ Each transaction contains a set of items drawn from I
◼ An itemset is any collection of items from I
• General aim:
◼ Find frequent or interesting patterns, associations, correlations, or causal structures among sets of items or elements in databases or other information repositories.
◼ Express these relationships as association rules
➢ X → Y
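The formalization above can be sketched in Python, assuming each transaction is stored as a frozenset of item names. The basket contents are taken from the shopping example; the names `transactions`, `items`, and `contains` are illustrative and not part of the slides.

```python
# Transaction database T: one frozenset of items per transaction.
transactions = [
    frozenset({"milk", "eggs", "sugar", "bread"}),
    frozenset({"eggs", "sugar"}),
    frozenset({"milk", "eggs", "cereal", "bread"}),
]

# I: the set of all items that occur anywhere in the database.
items = frozenset().union(*transactions)


def contains(transaction, itemset):
    """Return True if every item of `itemset` appears in `transaction`."""
    return frozenset(itemset) <= transaction


# Positions of the transactions that contain the itemset {milk, bread}.
matching = [i for i, t in enumerate(transactions) if contains(t, {"milk", "bread"})]

print(sorted(items))
print(matching)
```

Using sets makes the "transaction contains itemset" test a plain subset check, which is the only primitive that the support and confidence measures need.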
What’s an Interesting Rule?
• An association rule is an implication between two itemsets
◼ X → Y

TID   Items
T1    bread, jelly, peanut-butter
T2    bread, peanut-butter
T3    bread, milk, peanut-butter
T4    beer, bread
T5    beer, milk

• There are many measures of interest. The two most used are:
◼ Support (s)
➢ The frequency of the rule, i.e., the fraction of transactions that contain both X and Y
$s = \dfrac{\sigma(X \cup Y)}{\text{No. of transactions}}$
◼ Confidence (c)
➢ The strength of the association, i.e., how often the items in Y appear in transactions that contain X
$c = \dfrac{\sigma(X \cup Y)}{\sigma(X)}$
Here $\sigma(Z)$ is the number of transactions that contain every item of the itemset Z.
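As a worked check of the two measures, the sketch below computes support and confidence for the rule bread → peanut-butter on the five transactions T1 to T5. The function names are illustrative, and `confidence` assumes the antecedent occurs in at least one transaction.

```python
# The five transactions from the table above, keyed by TID.
transactions = {
    "T1": {"bread", "jelly", "peanut-butter"},
    "T2": {"bread", "peanut-butter"},
    "T3": {"bread", "milk", "peanut-butter"},
    "T4": {"beer", "bread"},
    "T5": {"beer", "milk"},
}


def support_count(itemset):
    """sigma(itemset): number of transactions containing every item."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


def support(x, y):
    """Fraction of all transactions that contain both X and Y."""
    return support_count(set(x) | set(y)) / len(transactions)


def confidence(x, y):
    """Fraction of the transactions containing X that also contain Y."""
    return support_count(set(x) | set(y)) / support_count(x)


s = support({"bread"}, {"peanut-butter"})         # 3 of 5 transactions
c = confidence({"bread"}, {"peanut-butter"})      # 3 of the 4 bread transactions
c_rev = confidence({"peanut-butter"}, {"bread"})  # 3 of the 3 peanut-butter transactions
print(s, c, c_rev)
```

Support is symmetric in X and Y, while confidence is not: the reversed rule peanut-butter → bread has a higher confidence than bread → peanut-butter on this data.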
Mining Association Rules: an Example
Min. support 50%
Min. confidence 50%

Transaction-id   Items bought
10               A, B, C
20               A, C
30               A, D
40               B, E, F

Frequent pattern   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

For rule A → C:
support = support({A} ∪ {C}) = 50%
confidence = support({A} ∪ {C}) / support({A}) = 66.7%
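The frequent-pattern table can be reproduced by brute force, which is feasible here only because the example has six items. This is a sketch for checking the numbers, not an efficient miner; the variable names are illustrative.

```python
from itertools import combinations

# The four transactions of the example (ids 10, 20, 30, 40).
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
min_support = 0.5


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


items = sorted(set().union(*transactions))

# Enumerate every non-empty itemset and keep those that reach min_support.
frequent = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        s = support(candidate)
        if s >= min_support:
            frequent[candidate] = s

conf_a_c = support(("A", "C")) / support(("A",))  # rule A -> C
conf_c_a = support(("A", "C")) / support(("C",))  # rule C -> A
print(frequent)
print(round(conf_a_c, 3), conf_c_a)
```

Both A → C and C → A pass the 50% confidence threshold, but C → A is the stronger rule because every transaction with C also contains A.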
The Apriori Algorithm: Basics
• The name, Apriori, is based on the fact that the algorithm uses prior knowledge of frequent itemset properties
• It consists of two steps
1. Generate all frequent itemsets whose support ≥ minsup
2. Use frequent itemsets to generate association rules
• So, let’s pay attention to the first step
Apriori
Itemset lattice over the items A to E, one level per itemset size:
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
• Given n items, we have 2^n possible itemsets.
◼ Do we have to generate them all?
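The size of the lattice can be confirmed with a few lines of Python. The sketch enumerates every itemset over the five items A to E, level by level, and counts them; the empty string stands for the null itemset.

```python
from itertools import combinations
from math import comb

items = ["A", "B", "C", "D", "E"]

# One list per lattice level: all itemsets of size 0, 1, ..., n.
levels = [["".join(c) for c in combinations(items, k)] for k in range(len(items) + 1)]

sizes = [len(level) for level in levels]
total = sum(sizes)

print(sizes)  # number of itemsets at each level
print(total)  # equals 2 ** len(items)
```

The count doubles with every additional item, which is why the exhaustive enumeration is not an option for realistic item catalogs.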
Apriori
• Let’s avoid expanding the whole lattice
• Key idea:
◼ Use the Apriori property: every subset of a frequent itemset is also a frequent itemset
• Therefore, the algorithm iteratively does the following:
◼ Create candidate itemsets
◼ Only continue exploring those whose support ≥ minsup
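The Apriori property follows from the fact that adding an item to an itemset can never increase its support count. The sketch below checks this exhaustively on the four-transaction example from the earlier slide; the names are illustrative.

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


items = sorted(set().union(*transactions))

# Look for any itemset whose support exceeds that of one of its
# immediate subsets; the Apriori property says there is none.
violations = []
for size in range(2, len(items) + 1):
    for itemset in combinations(items, size):
        for subset in combinations(itemset, size - 1):
            if support_count(subset) < support_count(itemset):
                violations.append((subset, itemset))

print(violations)
```

Read in the other direction, the same fact gives the pruning rule: once an itemset falls below minsup, every superset of it must fall below minsup too.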
Apriori: Pseudo-code
Join step: C_k is generated by joining L_{k-1} with itself
Prune step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
Pseudo-code:
C_k: candidate itemsets of size k
L_k: frequent itemsets of size k
L_1 = {frequent items};
for (k = 1; L_k != ∅; k++) do begin
    C_{k+1} = candidates generated from L_k;
    for each transaction t in database do
        increment the count of all candidates in C_{k+1} that are contained in t
    L_{k+1} = candidates in C_{k+1} with min_support
end
return ∪_k L_k;
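The pseudo-code can be turned into a short Python function. This is a minimal sketch under stated assumptions: the function name `apriori` and the `min_count` parameter (an absolute support count) are illustrative, and the join step unions any two frequent k-itemsets that differ in one item, which is a simplification of the prefix-based join used in tuned implementations.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 1
    while current:
        # Join step: union pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune step: drop candidates that have an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
        }
        # One pass over the database to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Usage on the earlier four-transaction example (50% of 4 transactions = 2).
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
result = apriori(db, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops as soon as a level produces no frequent itemsets, which mirrors the `L_k != ∅` condition in the pseudo-code.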
Illustration of the Apriori principle
[Figure: the itemset lattice over the items A to E, from null up to ABCDE. One itemset is found to be infrequent, and all of its supersets are pruned as infrequent.]
Another Example
[Figure: the same lattice over the items A to E with an infrequent itemset marked.]
Example of Apriori Run
(Itemsets with sup below 2 are dropped at each level.)

Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan: C1
Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3

L1
Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

C2 (generated from L1)
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

2nd scan: C2 with counts
Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2

L2
Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

C3 (generated from L2)
Itemset
{B, C, E}

3rd scan: L3
Itemset     sup
{B, C, E}   2
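The last step of the run, from L2 to C3 and L3, shows why only {B, C, E} is counted in the third scan. The sketch below replays the join and prune steps on the TDB data. The minimum support count of 2 is an assumption inferred from the tables (the itemsets with sup 1 are the ones dropped), and the union-based join is a simplification.

```python
from itertools import combinations

# Database TDB from the example, keyed by Tid.
tdb = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
min_count = 2  # assumed: inferred from which itemsets the tables drop


def count(itemset):
    """Number of transactions in TDB that contain every item of `itemset`."""
    return sum(1 for items in tdb.values() if set(itemset) <= items)


# L2 as given in the example: {A,C}, {B,C}, {B,E}, {C,E}.
L2 = [frozenset(p) for p in ("AC", "BC", "BE", "CE")]

# Join step: every 3-itemset that is the union of two members of L2.
joined = {a | b for a in L2 for b in L2 if len(a | b) == 3}

# Prune step: keep a candidate only if all of its 2-subsets are in L2.
C3 = {c for c in joined if all(frozenset(s) in L2 for s in combinations(c, 2))}

# 3rd scan: count the remaining candidates.
L3 = {c: count(c) for c in C3 if count(c) >= min_count}

print(sorted("".join(sorted(c)) for c in joined))
print(sorted("".join(sorted(c)) for c in C3))
print({"".join(sorted(c)): n for c, n in L3.items()})
```

The join proposes ABC, ACE, and BCE, but ABC needs {A, B} and ACE needs {A, E}, neither of which is in L2, so both are pruned before the database is scanned.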
Apriori
• Remember that Apriori consists of two steps
1. Generate all frequent itemsets whose support ≥ minsup
2. Use frequent itemsets to generate association rules
• We have completed step 1, so we have all frequent itemsets
• So, let’s pay attention to the second step
Rule Generation in Apriori
• Given a frequent itemset L
◼ Find all non-empty proper subsets F of L such that the association rule F → (L − F) satisfies the minimum confidence
◼ Create the rule F → (L − F)
• If L = {A, B, C}
◼ The candidate rules are: AB → C, AC → B, BC → A, A → BC, B → AC, C → AB
◼ In general, there are 2^k − 2 candidate rules, where k is the length of the itemset L
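Rule generation from a single frequent itemset can be sketched as follows, using the TDB data and the frequent itemset {B, C, E} from the Apriori run. The function names and the 0.7 confidence threshold are assumptions chosen for illustration.

```python
from itertools import combinations

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in tdb if set(itemset) <= t)


def candidate_rules(L):
    """Yield (F, L - F) for every non-empty proper subset F of L."""
    L = frozenset(L)
    for size in range(1, len(L)):
        for antecedent in combinations(sorted(L), size):
            F = frozenset(antecedent)
            yield F, L - F


def strong_rules(L, min_conf):
    """Return the candidate rules of L whose confidence reaches min_conf."""
    out = []
    for F, rest in candidate_rules(L):
        conf = count(L) / count(F)
        if conf >= min_conf:
            out.append((F, rest, conf))
    return out


candidates = list(candidate_rules("BCE"))
rules = strong_rules("BCE", min_conf=0.7)  # 0.7 is an assumed threshold

print(len(candidates))  # 2**3 - 2
for F, rest, conf in rules:
    print(sorted(F), "->", sorted(rest), round(conf, 2))
```

No extra database scan is strictly needed at this stage in a full implementation, because every subset of a frequent itemset is itself frequent and its support was already counted in step 1; the sketch recounts only to stay self-contained.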
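Example of Efficient Rule Generation
Lattice of the rules generated from the frequent itemset ABCD, one level per consequent size:
ABCD → {}
ABC → D, ABD → C, ACD → B, BCD → A
AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB
A → BCD, B → ACD, C → ABD, D → ABC
[Figure: one rule is marked as low confidence. Every rule below it whose consequent contains that rule's consequent is pruned, because moving more items from the antecedent to the consequent cannot raise the confidence.]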
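The pruning idea can be sketched by growing consequents level by level and extending only the consequents of rules that passed the confidence threshold. The data is the TDB from the Apriori run; the function name, the 0.7 threshold, and the simplified merge of passing consequents are assumptions made for illustration.

```python
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in tdb if set(itemset) <= t)


def generate_rules(L, min_conf):
    """Return (rules, evaluated): the strong rules of L and how many
    candidate rules had their confidence computed."""
    L = frozenset(L)
    rules, evaluated = [], 0
    consequents = {frozenset([i]) for i in L}  # start with 1-item consequents
    size = 1
    while consequents and size < len(L):
        passed = set()
        for Y in consequents:
            evaluated += 1
            conf = count(L) / count(L - Y)
            if conf >= min_conf:
                rules.append((L - Y, Y, conf))
                passed.add(Y)
        # Only consequents of passing rules are merged into larger ones;
        # a failed consequent is never extended.
        size += 1
        consequents = {a | b for a in passed for b in passed if len(a | b) == size}
    return rules, evaluated


rules, evaluated = generate_rules("BCE", min_conf=0.7)
print(evaluated)  # fewer than the 6 candidate rules of a 3-itemset
for X, Y, conf in rules:
    print(sorted(X), "->", sorted(Y), round(conf, 2))
```

On this itemset the rule BE → C fails, so the rules with C in a larger consequent (B → CE and E → BC) are never evaluated.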
Relevant Algorithms
1. Apriori
2. FP-Growth
3. ECLAT
4. ABC-RuleMiner (Sarker et al., Elsevier)
[ABC-RuleMiner: User behavioral rule-based machine learning method for
context-aware intelligent services, Journal of Network and Computer
Applications, Elsevier, 2020]
5. Others…
Possible Application Areas
➢ Market Basket Analysis
➢ Context-Aware Intelligent Systems
➢ Medical Diagnosis
➢ Mobile Applications
➢ Smart Cities
➢ Cybersecurity
➢ Protein Sequences
➢ Web Usage
➢ Census Data
➢ And so on
Questions?
Thank you!
Sarker DataLAB
(http://sarkerdatalab.com/)
Email: iqbal.sarker.cse@gmail.com