SlideShare a Scribd company logo
1 of 21
Outline
1. Data Mining (DM) ~ KDD [Definition]
2. DM Technique
-> Association rules [support & confidence]
3. Example
(4. Apriori Algorithm)
1. Data Mining ~ KDD [Definition]
- "Data mining (DM), also called KnowledgeDiscovery in Databases (KDD), is the process
of automatically searching large volumes of
data for patterns using specific DM
technique."

- [more formal definition] KDD ~ "the non-trivial
extraction of implicit, previously unknown
and potentially useful knowledge from data"
1. Data Mining ~ KDD [Definition]
Data Mining techniques
•
•
•
•
•
•

Information Visualization
k-nearest neighbor
decision trees
neural networks
association rules
…
2. Association rules
Support
Every association rule has a support and a confidence.
“The support is the percentage of transactions that demonstrate the rule.”

Example: Database with transactions ( customer_# : item_a1, item_a2,
…)
1:
2:
3:
4:

1, 3, 5.
1, 8, 14, 17, 12.
4, 6, 8, 12, 9, 104.
2, 1, 8.

support {8,12} = 2 (,or 50% ~ 2 of 4 customers)
support {1, 5} = 1 (,or 25% ~ 1 of 4 customers )
support {1} = 3 (,or 75% ~ 3 of 4 customers)
2. Association rules
Support

An itemset is called frequent if its support is equal or
greater than an agreed upon minimal value – the support
threshold
add to previous example:
if threshold 50%
then itemsets {8,12} and {1} called frequent
2. Association rules
Confidence
Every association rule has a support and a confidence.
An association rule is of the form:

X => Y

• X => Y: if someone buys X, he also buys Y

The confidence is the conditional probability that, given X
present in a transition , Y will also be present.
Confidence measure, by definition:
Confidence(X=>Y) equals support(X,Y) / support(X)
2. Association rules
Confidence

We should only consider rules derived from
itemsets with high support, and that also have
high confidence.
“A rule with low confidence is not meaningful.”
Rules don’t explain anything, they just point out
hard facts in data volumes.
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )

1:
2:
3:
4:
5:
6:
7:
8:
9:
10:

3, 5, 8.
2, 6, 8.
1, 4, 7, 10.
3, 8, 10.
2, 5, 8.
1, 5, 6.
4, 5, 6, 8.
2, 3, 4.
1, 5, 7, 8.
3, 8, 9, 10.

Conf ( {5} => {8} ) ?
supp({5}) = 5
, supp({8}) = 7 , supp({5,8}) = 4,
then conf( {5} => {8} ) = 4/5 = 0.8 or 80%
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )

1:
2:
3:
4:
5:
6:
7:
8:
9:
10:

3, 5, 8.
2, 6, 8.
1, 4, 7, 10.
3, 8, 10.
2, 5, 8.
1, 5, 6.
4, 5, 6, 8.
2, 3, 4.
1, 5, 7, 8.
3, 8, 9, 10.

Conf ( {5} => {8} ) ? 80% Done. Conf ( {8} => {5} ) ?
supp({5}) = 5
, supp({8}) = 7 , supp({5,8}) = 4,
then conf( {8} => {5} ) = 4/7 = 0.57 or 57%
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )

Conf ( {5} => {8} ) ? 80% Done.
Conf ( {8} => {5} ) ? 57% Done.
Rule ( {5} => {8} ) more meaningful then
Rule ( {8} => {5} )
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )

1:
2:
3:
4:
5:
6:
7:
8:
9:
10:

3, 5, 8.
2, 6, 8.
1, 4, 7, 10.
3, 8, 10.
2, 5, 8.
1, 5, 6.
4, 5, 6, 8.
2, 3, 4.
1, 5, 7, 8.
3, 8, 9, 10.

Conf ( {9} => {3} ) ?
supp({9}) = 1
, supp({3}) = 1 , supp({3,9}) = 1,
then conf( {9} => {3} ) = 1/1 = 1.0 or 100%. OK?
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )

Conf( {9} => {3} ) = 100%. Done.
Notice: High Confidence, Low Support.
-> Rule ( {9} => {3} ) not meaningful
Apriori Algorithm
• In computer science and data mining, Apriori is
a classic algorithm for learning association rules.
• Apriori is designed to operate on databases
containing transactions (for example, collections
of items bought by customers, or details of a
website frequentation).
• The algorithm attempts to find subsets which are
common to at least a minimum number C (the
cutoff, or confidence threshold) of the itemsets.

13
Definition (contd.)
• Apriori uses a "bottom up" approach, where
frequent subsets are extended one item at a
time (a step known as candidate generation, and
groups of candidates are tested against the
data.
• The algorithm terminates when no further
successful extensions are found.
• Apriori uses breadth-first search and a hash
tree structure to count candidate item sets
efficiently.
14
15
Steps to Perform Apriori
Algorithm

16
Apriori Algorithm Examples
Problem Decomposition
Transaction ID Items Bought
1
Shoes, Shirt, Jacket
2
Shoes,Jacket
3
Shoes, Jeans
4
Shirt, Sweatshirt

If the minimum support is 50%, then {Shoes, Jacket} is the only 2itemset that satisfies the minimum support.
Frequent Itemset
{Shoes}
{Shirt}
{Jacket}
{Shoes, Jacket}

Support
75%
50%
50%
50%

If the minimum confidence is 50%, then the only two rules generated from this 2itemset, that have confidence greater than 50%, are:
Shoes ⇒ Jacket Support=50%, Confidence=66%
Jacket ⇒ Shoes Support=50%, Confidence=100%

17
The Apriori Algorithm — Example
Min support =50%

Database D
TID
100
200
300
400

itemset sup.
C1
{1}
2
{2}
3
Scan D
{3}
3
{4}
1
{5}
3

Items
134
235
1235
25

L2 itemset sup

C2 itemset sup

2
2
3
2

{1
{1
{1
{2
{2
{3

C3 itemset
{2 3 5}

Scan D

{1 3}
{2 3}
{2 5}
{3 5}

2}
3}
5}
3}
5}
5}

1
2
1
2
3
2

L1 itemset sup.
{1}
{2}
{3}
{5}

2
3
3
3

C2 itemset
{1 2}
Scan D

L3 itemset sup
{2 3 5} 2

{1
{1
{2
{2
{3

3}
5}
3}
5}
5}

18
Pseudo Code for Apriori
Algorithm

19
Apriori
Advantages/Disadvantages
• Advantages
– Uses large itemset property
– Easily parallelized
– Easy to implement

• Disadvantages
– Assumes transaction database is memory
resident.
– Requires many database scans.

20
Summary
•
•
•
•
•
•

Association Rules form an very applied data mining
approach.
Association Rules are derived from frequent itemsets.
The Apriori algorithm is an efficient algorithm for
finding all frequent itemsets.
The Apriori algorithm implements level-wise search
using frequent item property.
The Apriori algorithm can be additionally optimized.
There are many measures for association rules.

21

More Related Content

What's hot

1.10.association mining 2
1.10.association mining 21.10.association mining 2
1.10.association mining 2Krish_ver2
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14rahulmath80
 
Cs583 association-rules
Cs583 association-rulesCs583 association-rules
Cs583 association-rulesGautam Thakur
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysishktripathy
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rYanchang Zhao
 
Introduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIntroduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIJSRD
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmdeepti92pawar
 
Associations1
Associations1Associations1
Associations1mancnilu
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsJustin Cletus
 
Understanding Association Rule Mining
Understanding Association Rule MiningUnderstanding Association Rule Mining
Understanding Association Rule MiningMohit Rajput
 
Mining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactionalMining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactionalramya marichamy
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithmGangadhar S
 
DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesPier Luca Lanzi
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysisguest0edcaf
 
Lecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsLecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsPier Luca Lanzi
 

What's hot (20)

Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
1.10.association mining 2
1.10.association mining 21.10.association mining 2
1.10.association mining 2
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Cs583 association-rules
Cs583 association-rulesCs583 association-rules
Cs583 association-rules
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
 
Introduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIntroduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its Methods
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithm
 
Associations1
Associations1Associations1
Associations1
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Understanding Association Rule Mining
Understanding Association Rule MiningUnderstanding Association Rule Mining
Understanding Association Rule Mining
 
Mining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactionalMining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactional
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association Rules
 
IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES
IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULESIMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES
IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysis
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Lecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsLecture 04 Association Rules Basics
Lecture 04 Association Rules Basics
 
07 fp advanced
07 fp advanced07 fp advanced
07 fp advanced
 

Similar to Rmining

Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.pptneelamoberoi1030
 
Data Mining Concepts 15061
Data Mining Concepts 15061Data Mining Concepts 15061
Data Mining Concepts 15061badirh
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Conceptsdataminers.ir
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
IRJET- Effecient Support Itemset Mining using Parallel Map Reducing
IRJET-  	  Effecient Support Itemset Mining using Parallel Map ReducingIRJET-  	  Effecient Support Itemset Mining using Parallel Map Reducing
IRJET- Effecient Support Itemset Mining using Parallel Map ReducingIRJET Journal
 
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset CreationTop Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset Creationcscpconf
 
Association Rule Mining in Data Mining.pptx
Association Rule Mining in Data Mining.pptxAssociation Rule Mining in Data Mining.pptx
Association Rule Mining in Data Mining.pptxlahiruherath654
 
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
An Improved Frequent Itemset Generation Algorithm Based On Correspondence An Improved Frequent Itemset Generation Algorithm Based On Correspondence
An Improved Frequent Itemset Generation Algorithm Based On Correspondence cscpconf
 
Data Mining Association Analysis Basic Concepts a
Data Mining Association Analysis Basic Concepts aData Mining Association Analysis Basic Concepts a
Data Mining Association Analysis Basic Concepts aOllieShoresna
 
Association Rule Mining
Association Rule MiningAssociation Rule Mining
Association Rule MiningPALLAB DAS
 
Chapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxChapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxssuser957b41
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Subrata Kumer Paul
 
Cluster2
Cluster2Cluster2
Cluster2work
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Salah Amean
 

Similar to Rmining (20)

Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Data Mining Concepts 15061
Data Mining Concepts 15061Data Mining Concepts 15061
Data Mining Concepts 15061
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
IRJET- Effecient Support Itemset Mining using Parallel Map Reducing
IRJET-  	  Effecient Support Itemset Mining using Parallel Map ReducingIRJET-  	  Effecient Support Itemset Mining using Parallel Map Reducing
IRJET- Effecient Support Itemset Mining using Parallel Map Reducing
 
Unit 3.pptx
Unit 3.pptxUnit 3.pptx
Unit 3.pptx
 
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset CreationTop Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
 
Association Rule Mining in Data Mining.pptx
Association Rule Mining in Data Mining.pptxAssociation Rule Mining in Data Mining.pptx
Association Rule Mining in Data Mining.pptx
 
B0950814
B0950814B0950814
B0950814
 
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
An Improved Frequent Itemset Generation Algorithm Based On Correspondence An Improved Frequent Itemset Generation Algorithm Based On Correspondence
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
 
Data Mining Association Analysis Basic Concepts a
Data Mining Association Analysis Basic Concepts aData Mining Association Analysis Basic Concepts a
Data Mining Association Analysis Basic Concepts a
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
Association Rule Mining
Association Rule MiningAssociation Rule Mining
Association Rule Mining
 
Chapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxChapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptx
 
06 fp basic
06 fp basic06 fp basic
06 fp basic
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
 
Cluster2
Cluster2Cluster2
Cluster2
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 
6asso
6asso6asso
6asso
 

Recently uploaded

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 

Recently uploaded (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 

Rmining

  • 1. Outline 1. Data Mining (DM) ~ KDD [Definition] 2. DM Technique -> Association rules [support & confidence] 3. Example (4. Apriori Algorithm)
  • 2. 1. Data Mining ~ KDD [Definition] - "Data mining (DM), also called KnowledgeDiscovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns using specific DM technique." - [more formal definition] KDD ~ "the non-trivial extraction of implicit, previously unknown and potentially useful knowledge from data"
  • 3. 1. Data Mining ~ KDD [Definition] Data Mining techniques • • • • • • Information Visualization k-nearest neighbor decision trees neural networks association rules …
  • 4. 2. Association rules Support Every association rule has a support and a confidence. “The support is the percentage of transactions that demonstrate the rule.” Example: Database with transactions ( customer_# : item_a1, item_a2, …) 1: 2: 3: 4: 1, 3, 5. 1, 8, 14, 17, 12. 4, 6, 8, 12, 9, 104. 2, 1, 8. support {8,12} = 2 (,or 50% ~ 2 of 4 customers) support {1, 5} = 1 (,or 25% ~ 1 of 4 customers ) support {1} = 3 (,or 75% ~ 3 of 4 customers)
  • 5. 2. Association rules Support An itemset is called frequent if its support is equal or greater than an agreed upon minimal value – the support threshold add to previous example: if threshold 50% then itemsets {8,12} and {1} called frequent
  • 6. 2. Association rules Confidence Every association rule has a support and a confidence. An association rule is of the form: X => Y • X => Y: if someone buys X, he also buys Y The confidence is the conditional probability that, given X present in a transition , Y will also be present. Confidence measure, by definition: Confidence(X=>Y) equals support(X,Y) / support(X)
  • 7. 2. Association rules Confidence We should only consider rules derived from itemsets with high support, and that also have high confidence. “A rule with low confidence is not meaningful.” Rules don’t explain anything, they just point out hard facts in data volumes.
  • 8. 3. Example Example: Database with transactions ( customer_# : item_a1, item_a2, … ) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 3, 5, 8. 2, 6, 8. 1, 4, 7, 10. 3, 8, 10. 2, 5, 8. 1, 5, 6. 4, 5, 6, 8. 2, 3, 4. 1, 5, 7, 8. 3, 8, 9, 10. Conf ( {5} => {8} ) ? supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4, then conf( {5} => {8} ) = 4/5 = 0.8 or 80%
  • 9. 3. Example Example: Database with transactions ( customer_# : item_a1, item_a2, … ) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 3, 5, 8. 2, 6, 8. 1, 4, 7, 10. 3, 8, 10. 2, 5, 8. 1, 5, 6. 4, 5, 6, 8. 2, 3, 4. 1, 5, 7, 8. 3, 8, 9, 10. Conf ( {5} => {8} ) ? 80% Done. Conf ( {8} => {5} ) ? supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4, then conf( {8} => {5} ) = 4/7 = 0.57 or 57%
  • 10. 3. Example Example: Database with transactions ( customer_# : item_a1, item_a2, … ) Conf ( {5} => {8} ) ? 80% Done. Conf ( {8} => {5} ) ? 57% Done. Rule ( {5} => {8} ) more meaningful then Rule ( {8} => {5} )
  • 11. 3. Example Example: Database with transactions ( customer_# : item_a1, item_a2, … ) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 3, 5, 8. 2, 6, 8. 1, 4, 7, 10. 3, 8, 10. 2, 5, 8. 1, 5, 6. 4, 5, 6, 8. 2, 3, 4. 1, 5, 7, 8. 3, 8, 9, 10. Conf ( {9} => {3} ) ? supp({9}) = 1 , supp({3}) = 1 , supp({3,9}) = 1, then conf( {9} => {3} ) = 1/1 = 1.0 or 100%. OK?
  • 12. 3. Example Example: Database with transactions ( customer_# : item_a1, item_a2, … ) Conf( {9} => {3} ) = 100%. Done. Notice: High Confidence, Low Support. -> Rule ( {9} => {3} ) not meaningful
  • 13. Apriori Algorithm • In computer science and data mining, Apriori is a classic algorithm for learning association rules. • Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation). • The algorithm attempts to find subsets which are common to at least a minimum number C (the cutoff, or confidence threshold) of the itemsets. 13
  • 14. Definition (contd.) • Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation, and groups of candidates are tested against the data. • The algorithm terminates when no further successful extensions are found. • Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently. 14
  • 15. 15
  • 16. Steps to Perform Apriori Algorithm 16
  • 17. Apriori Algorithm Examples Problem Decomposition Transaction ID Items Bought 1 Shoes, Shirt, Jacket 2 Shoes,Jacket 3 Shoes, Jeans 4 Shirt, Sweatshirt If the minimum support is 50%, then {Shoes, Jacket} is the only 2itemset that satisfies the minimum support. Frequent Itemset {Shoes} {Shirt} {Jacket} {Shoes, Jacket} Support 75% 50% 50% 50% If the minimum confidence is 50%, then the only two rules generated from this 2itemset, that have confidence greater than 50%, are: Shoes ⇒ Jacket Support=50%, Confidence=66% Jacket ⇒ Shoes Support=50%, Confidence=100% 17
  • 18. The Apriori Algorithm — Example Min support =50% Database D TID 100 200 300 400 itemset sup. C1 {1} 2 {2} 3 Scan D {3} 3 {4} 1 {5} 3 Items 134 235 1235 25 L2 itemset sup C2 itemset sup 2 2 3 2 {1 {1 {1 {2 {2 {3 C3 itemset {2 3 5} Scan D {1 3} {2 3} {2 5} {3 5} 2} 3} 5} 3} 5} 5} 1 2 1 2 3 2 L1 itemset sup. {1} {2} {3} {5} 2 3 3 3 C2 itemset {1 2} Scan D L3 itemset sup {2 3 5} 2 {1 {1 {2 {2 {3 3} 5} 3} 5} 5} 18
  • 19. Pseudo Code for Apriori Algorithm 19
  • 20. Apriori Advantages/Disadvantages • Advantages – Uses large itemset property – Easily parallelized – Easy to implement • Disadvantages – Assumes transaction database is memory resident. – Requires many database scans. 20
  • 21. Summary • • • • • • Association Rules form an very applied data mining approach. Association Rules are derived from frequent itemsets. The Apriori algorithm is an efficient algorithm for finding all frequent itemsets. The Apriori algorithm implements level-wise search using frequent item property. The Apriori algorithm can be additionally optimized. There are many measures for association rules. 21