Outline
1. Data Mining (DM) ~ KDD [Definition]
2. DM Technique
-> Association rules [support & confidence]
3. Example
(4. Apriori Algorithm)
1. Data Mining ~ KDD [Definition]
- "Data mining (DM), also called Knowledge Discovery in Databases (KDD), is the process
of automatically searching large volumes of
data for patterns using specific DM
techniques."

- [more formal definition] KDD ~ "the non-trivial
extraction of implicit, previously unknown
and potentially useful knowledge from data"
1. Data Mining ~ KDD [Definition]
Data Mining techniques
• Information Visualization
• k-nearest neighbor
• decision trees
• neural networks
• association rules
• …
2. Association rules
Support
Every association rule has a support and a confidence.
“The support is the percentage of transactions that demonstrate the rule.”

Example: Database with transactions ( customer_# : item_a1, item_a2, … )

1: 1, 3, 5.
2: 1, 8, 14, 17, 12.
3: 4, 6, 8, 12, 9, 104.
4: 2, 1, 8.

support {8,12} = 2 (or 50%: 2 of 4 customers)
support {1, 5} = 1 (or 25%: 1 of 4 customers)
support {1} = 3 (or 75%: 3 of 4 customers)
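The support values above can be checked with a few lines of Python. This is only a minimal sketch: the names `transactions`, `support_count` and `support` are illustrative and do not come from the slides.

```python
# Transactions from the slide: customer number -> set of item ids.
transactions = {
    1: {1, 3, 5},
    2: {1, 8, 14, 17, 12},
    3: {4, 6, 8, 12, 9, 104},
    4: {2, 1, 8},
}

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions.values() if items <= t)

def support(itemset, transactions):
    """Support as a fraction of all transactions."""
    return support_count(itemset, transactions) / len(transactions)

for itemset in ({8, 12}, {1, 5}, {1}):
    print(sorted(itemset),
          support_count(itemset, transactions),
          support(itemset, transactions))
```

The subset test `items <= t` is what "the transaction demonstrates the itemset" means here: every item of the itemset must be present.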
2. Association rules
Support

An itemset is called frequent if its support is greater than
or equal to an agreed-upon minimum value, the support
threshold.
Adding to the previous example:
if the threshold is 50%,
then the itemsets {8,12} and {1} are frequent (as are {8}, {12} and {1,8}), while {1,5} is not.
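As a rough illustration of the threshold test, the sketch below enumerates every itemset that occurs in the four-transaction example and keeps those with support of at least 50%. It is a brute-force check, not the Apriori algorithm, and the function name `frequent_itemsets` is an assumption made for this example.

```python
from itertools import combinations

# The four transactions of the previous example.
transactions = [
    {1, 3, 5},
    {1, 8, 14, 17, 12},
    {4, 6, 8, 12, 9, 104},
    {1, 2, 8},
]

def frequent_itemsets(transactions, min_support):
    """Brute force: test every itemset that occurs in some transaction."""
    n = len(transactions)
    candidates = set()
    for t in transactions:
        for size in range(1, len(t) + 1):
            candidates.update(frozenset(c) for c in combinations(sorted(t), size))
    result = {}
    for c in candidates:
        s = sum(1 for t in transactions if c <= t) / n
        if s >= min_support:
            result[c] = s
    return result

frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Enumerating all subsets grows exponentially with transaction length, which is why it only suits toy data like this.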
2. Association rules
Confidence
Every association rule has a support and a confidence.
An association rule is of the form:

X => Y

• X => Y: if someone buys X, they also buy Y

The confidence is the conditional probability that, given X
present in a transaction, Y will also be present.
Confidence measure, by definition:
Confidence(X=>Y) = support(X,Y) / support(X),
where support(X,Y) is the support of the itemset containing both X and Y.
2. Association rules
Confidence

We should only consider rules derived from
itemsets with high support, and that also have
high confidence.
“A rule with low confidence is not meaningful.”
Rules don’t explain anything, they just point out
hard facts in data volumes.
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )

 1: 3, 5, 8.
 2: 2, 6, 8.
 3: 1, 4, 7, 10.
 4: 3, 8, 10.
 5: 2, 5, 8.
 6: 1, 5, 6.
 7: 4, 5, 6, 8.
 8: 2, 3, 4.
 9: 1, 5, 7, 8.
10: 3, 8, 9, 10.

Conf( {5} => {8} ) ?
supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,
then conf( {5} => {8} ) = 4/5 = 0.8, or 80%
3. Example
Example: same transaction database as on the previous slide.

Conf( {5} => {8} ) ? 80%. Done.
Conf( {8} => {5} ) ?
supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,
then conf( {8} => {5} ) = 4/7 ≈ 0.57, or 57%
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )

Conf ( {5} => {8} ) ? 80% Done.
Conf ( {8} => {5} ) ? 57% Done.
Rule ( {5} => {8} ) is more meaningful than
Rule ( {8} => {5} )
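The two confidence values can be reproduced with a short Python sketch over the ten-transaction example. The helper names `supp` and `conf` mirror the slide notation but are otherwise an assumption of this example.

```python
# The ten transactions of the example database.
transactions = [
    {3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
    {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10},
]

def supp(itemset):
    """Support count: transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def conf(x, y):
    """Confidence of the rule x => y, i.e. supp(x and y together) / supp(x)."""
    return supp(set(x) | set(y)) / supp(x)

print(supp({5}), supp({8}), supp({5, 8}))
print(conf({5}, {8}))   # rule {5} => {8}
print(conf({8}, {5}))   # rule {8} => {5}
```

Confidence is not symmetric: both rules share the numerator supp({5,8}), but the denominators supp({5}) and supp({8}) differ.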
3. Example
Example: same transaction database as before.

Conf( {9} => {3} ) ?
supp({9}) = 1, supp({3}) = 4, supp({3,9}) = 1,
then conf( {9} => {3} ) = 1/1 = 1.0, or 100%. OK?
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )

Conf( {9} => {3} ) = 100%. Done.
Notice: High Confidence, Low Support.
-> Rule ( {9} => {3} ) not meaningful
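One way to turn this observation into a check is to require both a minimum support and a minimum confidence before accepting a rule. The sketch below does that for the example database. The threshold values 0.3 and 0.7 are arbitrary assumptions chosen for illustration, and `rule_stats` and `is_meaningful` are hypothetical helper names.

```python
# The ten transactions of the example database.
transactions = [
    {3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
    {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10},
]

def rule_stats(x, y, transactions):
    """Return (support, confidence) of the rule x => y as fractions."""
    n = len(transactions)
    both = sum(1 for t in transactions if (x | y) <= t)
    x_count = sum(1 for t in transactions if x <= t)
    if x_count == 0:
        return 0.0, 0.0  # x never occurs, so the rule has no evidence
    return both / n, both / x_count

def is_meaningful(x, y, transactions, min_support, min_confidence):
    """Accept a rule only if it clears both thresholds."""
    s, c = rule_stats(x, y, transactions)
    return s >= min_support and c >= min_confidence

# Assumed thresholds, not taken from the slides.
MIN_SUPPORT, MIN_CONFIDENCE = 0.3, 0.7

for x, y in (({9}, {3}), ({5}, {8})):
    print(sorted(x), "=>", sorted(y),
          rule_stats(x, y, transactions),
          is_meaningful(x, y, transactions, MIN_SUPPORT, MIN_CONFIDENCE))
```

With these assumed thresholds, {9} => {3} is rejected on support despite its 100% confidence, while {5} => {8} passes both tests.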
Apriori Algorithm
• In computer science and data mining, Apriori is
a classic algorithm for learning association rules.
• Apriori is designed to operate on databases
containing transactions (for example, collections
of items bought by customers, or details of
website visits).
• The algorithm attempts to find itemsets that are
common to at least a minimum number C (the
cutoff, or support threshold) of the transactions.

Definition (contd.)
• Apriori uses a "bottom up" approach, where
frequent subsets are extended one item at a
time (a step known as candidate generation), and
groups of candidates are tested against the
data.
• The algorithm terminates when no further
successful extensions are found.
• Apriori uses breadth-first search and a hash
tree structure to count candidate item sets
efficiently.
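The candidate generation step can be sketched in Python as a join followed by a prune. The prune relies on the standard Apriori property that every subset of a frequent itemset must itself be frequent. The function name `generate_candidates` is an assumption, and this simple version uses plain sets rather than the hash tree mentioned above.

```python
from itertools import combinations

def generate_candidates(frequent_k, k):
    """Extend frequent k-itemsets to candidate (k+1)-itemsets.

    Join:  union pairs of frequent k-itemsets whose union has k+1 items.
    Prune: drop any candidate that has an infrequent k-subset.
    """
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != k + 1:
                continue
            if all(frozenset(s) in frequent_k for s in combinations(union, k)):
                candidates.add(union)
    return candidates

# Frequent 2-itemsets taken from the worked example later in the deck.
L2 = {frozenset({1, 3}), frozenset({2, 3}), frozenset({2, 5}), frozenset({3, 5})}
print(generate_candidates(L2, 2))
```

In this run {1,2,3} and {1,3,5} are produced by the join but pruned, because {1,2} and {1,5} are not frequent. Only {2,3,5} survives to be counted against the data.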
Steps to Perform Apriori Algorithm
Apriori Algorithm Examples
Problem Decomposition
Transaction ID   Items Bought
1                Shoes, Shirt, Jacket
2                Shoes, Jacket
3                Shoes, Jeans
4                Shirt, Sweatshirt

If the minimum support is 50%, then {Shoes, Jacket} is the only 2-itemset that satisfies the minimum support.

Frequent Itemset   Support
{Shoes}            75%
{Shirt}            50%
{Jacket}           50%
{Shoes, Jacket}    50%

If the minimum confidence is 50%, then the only two rules generated from this 2-itemset, both with confidence greater than 50%, are:
Shoes ⇒ Jacket   Support=50%, Confidence=67%
Jacket ⇒ Shoes   Support=50%, Confidence=100%
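Rule generation from a frequent itemset can be sketched as follows: split the itemset into every antecedent/consequent pair and keep the splits whose confidence clears the minimum. The names `support` and `rules_from` are illustrative only.

```python
from itertools import combinations

# The four transactions of the table above.
transactions = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]

def support(itemset):
    """Support as a fraction of all transactions."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rules_from(itemset, min_confidence):
    """List (antecedent, consequent, support, confidence) for each valid split."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            x = frozenset(antecedent)
            y = itemset - x
            confidence = support(itemset) / support(x)
            if confidence >= min_confidence:
                rules.append((x, y, support(itemset), confidence))
    return rules

for x, y, s, c in rules_from({"Shoes", "Jacket"}, min_confidence=0.5):
    print(sorted(x), "=>", sorted(y), f"support={s:.0%}", f"confidence={c:.0%}")
```

Both rules share the itemset support of 50%. They differ only in the denominator: support({Shoes}) is 75%, while support({Jacket}) is 50%.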

The Apriori Algorithm — Example
Min support = 50% (2 of 4 transactions)

Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D -> C1
itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2 (candidates from L1)
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D -> C2 with counts
itemset   sup
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2
itemset   sup
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3 (candidates from L2)
itemset
{2 3 5}

Scan D -> L3
itemset   sup
{2 3 5}   2
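The level-wise search in this example can be reproduced with a compact Python sketch. It is one possible reading of the procedure, written under the assumption that candidates are built by joining frequent itemsets and pruning those with an infrequent subset. It is not the pseudo code from the original slides, and the names `apriori` and `count` are illustrative.

```python
from itertools import combinations

# Database D from the example: TID -> items.
D = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}

def apriori(transactions, min_count):
    """Return [L1, L2, ...]; each level maps a frequent itemset to its count."""
    def count(candidates):
        # One scan of the data per level: count and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    level = count({frozenset([i]) for t in transactions for i in t})
    levels, k = [], 1
    while level:
        levels.append(level)
        prev = set(level)
        # Join frequent k-itemsets, then prune by the subset property.
        candidates = {
            a | b for a in prev for b in prev
            if len(a | b) == k + 1
            and all(frozenset(s) in prev for s in combinations(a | b, k))
        }
        level = count(candidates)
        k += 1
    return levels

levels = apriori(list(D.values()), min_count=2)  # 50% of 4 transactions
for k, level in enumerate(levels, start=1):
    print(f"L{k}:", {tuple(sorted(s)): n for s, n in level.items()})
```

The loop stops once a level comes back empty, which matches the termination rule "no further successful extensions are found".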
Pseudo Code for Apriori Algorithm
Apriori Advantages/Disadvantages
• Advantages
– Uses large itemset property
– Easily parallelized
– Easy to implement

• Disadvantages
– Assumes transaction database is memory resident.
– Requires many database scans.
Summary
• Association rules are a widely applied data mining approach.
• Association rules are derived from frequent itemsets.
• The Apriori algorithm is an efficient algorithm for finding all frequent itemsets.
• The Apriori algorithm implements level-wise search using the frequent itemset property.
• The Apriori algorithm can be further optimized.
• There are many measures for association rules.
