SlideShare a Scribd company logo
1 of 39
BAS 250
Lesson 4: Association Rules
 Explain the concept of association rules
 Develop an association rules data mining model
 Understand the Apriori algorithm
 Interpret output generated by model
 Effectively employ the CRISP-DM method to your
assignment
This Week’s Learning Objectives
 Association rules are a data mining methodology that
seeks to find frequent connections between attributes
in a data set
o An example of this could be a shopping basket analysis
where marketers and vendors try to find which products
are most frequently purchased together
Association Rules
 Study of “what goes with what”
o “Customers who bought X also bought Y”
o What symptoms go with what diagnosis
 Transaction-based or event-based
 Also called “market basket analysis” and “affinity analysis”
 Originated with study of customer transactions databases
to determine associations among items purchased
What are Association Rules?
 So what kind of items are we talking about? There are many
applications of association:
o Product recommendation – like Amazon’s “customers who bought that,
also bought this”
o Medical diagnosis – like with diabetes
o Content optimization – magazine websites or blogs or even menu
design for a restaurant
o DNA genome analysis – patterns in cellular data
o Fraud detection in finance
Association Rules
Used in many recommender systems
Screenshot, Amazon.com
 Key Point:
 Association rules use “if/then” statements to
uncover relationships between seemingly
unrelated data.
 “If a customer buys a dozen eggs, then he is 80%
more likely to also purchase milk.”
Association Rules
 “IF” part = antecedent
 “THEN” part = consequent
 “Item set” = the items (e.g., products) comprising the antecedent or
consequent
 Antecedent and consequent are disjoint when there are no items in
common
Terms
 Although association rule operators require
binominal data types, it’s helpful to evaluate
the average (avg) and standard deviation for
each attribute.
Association Rules
Association Rules
 Standard deviations are measurements of how
dispersed or varied the values in an attribute are
o A good rule of thumb- any value that is smaller than two
standard deviations below the mean or two standard
deviations above the mean is a statistical outlier
 For example, if Average = 36.731 and Standard Deviation =
10.647, acceptable range of values should be 15.437 to 58.025
 Not a hard-and-fast rule
Association Rules
 RapidMiner uses binominal instead of binomial
o Binomial means one of two numbers (usually 0 and 1),
meaning the basic underlying data type is still numeric
o Binominal means one of two values which may be numeric
or character based
Association Rules
 An example of a data type transformation of all attributes in
RapidMiner
Association Rules
 Frequency Pattern analysis is handy for many
kinds of data mining and is a necessary
component of association rule mining
o We use this to determine whether any of the
patterns in the data occur often enough to be
considered rules
Association Rules
Results of an FP-Growth operator in RapidMiner
Association Rules
 Two main factors that dictate whether or not frequency
patterns get translated into association rules:
 Confidence percent - how confident we are that when one
attribute is flagged as true, the associated attribute will also
be flagged as true
 Support percent - the number of times that the rule did
occur, divided by the number of observations in the data set
Confidence & Support Percent
 Out of 10 shopping baskets…
 Cookies were purchased in 4 baskets
 Milk was purchased in 7 baskets
 Cookies  3 out of 4 instances where cookies were purchased milk
was also in those baskets
 Confidence % = How often are we when 1 attribute = True that an associated
attribute = True
 Therefore, we have a 75% confidence in the association rule cookies g milk
Confidence Percent - Example
 Out of 10 shopping baskets…
 Milk was purchased in 7 baskets
 Cookies were purchased in 4 baskets
 Milk  3 out of 7 instances where milk was purchased that cookies
were also in those baskets
 Confidence % = How often are we when 1 attribute = True that an associated
attribute = True
 Therefore, we have a 43% confidence in the association rule milk g cookies
Confidence Percent - Example
 Premise (or antecedent) g Conclusion (or consequent)
 Low Confidence % = only happens by chance and can
be used to eliminate uninteresting rules
 When evaluating associations between three or more
attributes, the confidence percentages are calculated
based on the two attributes being found with the third
Confidence Percent (cont.)
 An example of association rules found with
50% confidence threshold in RapidMiner
Confidence Percent (cont.)
 If cookies and milk were found together 3 out of
the 10 shopping baskets, support percentage is
calculated as 30% (3/10 = .3)
o There is no reciprocal for support percentages since
this metric is simply the number of times the
association occurred over the number of times it could
have occurred in the data set
Support Percent
 The higher the Support %, the higher the likelihood for Y to be present in
purchases of X
 Estimate of conditional probability of Y given X.
o Remember: “If a customer buys a dozen eggs, then he is 80% more likely to
also purchase milk.”
 There is no reciprocal for support percentages since this metric is simply
the number of times the association occurred over the number of times it
could have occurred in the data set
Support Percent
 Minimum Confidence = 50%
 Minimum Support = 20%
 However, for large data sets, your minimum support
could be much lower and still be valid.
 Hint: Your homework will contain a large data set.
Typical Ranges
 6 colors within 10 Transactions
Tiny Example: Phone Faceplates
Transaction Faceplate Colors Purchased
1 Red White Green
2 White Orange
3 White Blue
4 Red White Orange
5 Red Blue
6 White Blue
7 White Orange
8 Red White Blue Green
9 Red White Blue
10 yellow
 For example: Transaction 1 supports several
rules, such as
o “If red, then white” (“If a red faceplate is purchased,
then so is a white one”)
o “If white, then red”
o “If red and white, then green”
o + several more
Many Rules are Possible
 Ideally, we want to create all possible combinations of items
 Problem: computation time grows exponentially as # items
increases
 Solution: consider only “frequent item sets”
 Criterion for frequent: Support %
Frequent Item Sets
 Support = # (or percent) of transactions that
include both the antecedent and the consequent
 Example: support for the item set {red, white} is 4
out of 10 transactions, or 40%
Support
Apriori Algorithm
 Apriori Principle:
 “If an item set is frequent, then all of its
subsets MUST also be frequent.”
Apriori Algorithm
 The strategy of trimming the exponential search
space based on the Support measure is known
as “Support-based Pruning”
 This is the foundational component of the Apriori
Algorithm.
Apriori Algorithm
 For k products…
 User sets a minimum support criterion
 Next, generate list of one-item sets that meet the support criterion
 Use the list of one-item sets to generate list of two-item sets that
meet the support criterion
 Use list of two-item sets to generate list of three-item sets
 Continue up through k-item sets
Generating Frequent Item Sets
 Confidence: the % of antecedent transactions that also have the
consequent item set
 Lift = confidence/(benchmark confidence)
 Benchmark confidence = transactions with consequent as % of all
transactions
 Lift > 1 indicates a rule that is useful in finding consequent items sets (i.e.,
more useful than just selecting transactions randomly)
Measures of Performance
 Generate all rules that meet specified support
& confidence
o Find frequent item sets (those with sufficient
support – see above)
o From these item sets, generate rules with
sufficient confidence
Process of Rule Selection
 {red, white} > {green} with confidence = 2/4 = 50%
o [(support {red, white, green})/(support {red, white})]
 {red, green} > {white} with confidence = 2/2 = 100%
o [(support {red, white, green})/(support {red, green})]
 Plus 4 more with confidence of 100%, 33%, 29% & 100%
 If confidence criterion is 70%, report only rules 2, 3 and 6
Example: Rules from {red, white,
green}
 Lift ratio shows how effective the rule is in finding consequents
(useful if finding particular consequents is important)
 Confidence shows the rate at which consequents will be found
(useful in learning costs of promotion)
 Support measures overall impact
Interpretation
 Random data can generate apparently interesting
association rules
 The more rules you produce, the greater this danger
 Rules based on large numbers of records are less
subject to this danger
Caution: The Role of Chance
 Association rules (or affinity analysis, or market basket analysis) produce
rules on associations between items from a database of transactions
 Widely used in recommender systems
 Most popular method is Apriori algorithm
 To reduce computation, we consider only “frequent” item sets (=support)
 Performance is measured by confidence and lift
 Can produce a profusion of rules; review is required to identify useful rules
and to reduce redundancy
Summary
 Explain the concept of association rules
 Develop an association rules data mining model
 Understand the Apriori algorithm
 Interpret output generated by model
 Effectively employ the CRISP-DM method to your
assignment
Summary - Learning Objectives
“This workforce solution was funded by a grant awarded by the U.S. Department of Labor’s
Employment and Training Administration. The solution was created by the grantee and does not
necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor
makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such
information, including any information on linked sites and including, but not limited to, accuracy of the
information or its completeness, timeliness, usefulness, adequacy, continued availability, or ownership.”
Except where otherwise stated, this work by Wake Technical Community College Building Capacity in
Business Analytics, a Department of Labor, TAACCCT funded project, is licensed under the Creative
Commons Attribution 4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/
Copyright Information

More Related Content

What's hot

How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?Ganes Kesari
 
cfa in mplus
cfa in mplus cfa in mplus
cfa in mplus SadhakNK
 
Chris Stuccio - Data science - Conversion Hotel 2015
Chris Stuccio - Data science - Conversion Hotel 2015Chris Stuccio - Data science - Conversion Hotel 2015
Chris Stuccio - Data science - Conversion Hotel 2015Webanalisten .nl
 
Data science course in mysore
Data science course in mysoreData science course in mysore
Data science course in mysoreTejaspathiLV
 
Basic statistical & pharmaceutical statistical applications
Basic statistical & pharmaceutical statistical applicationsBasic statistical & pharmaceutical statistical applications
Basic statistical & pharmaceutical statistical applicationsYogitaKolekar1
 

What's hot (7)

Parameter
ParameterParameter
Parameter
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?
 
cfa in mplus
cfa in mplus cfa in mplus
cfa in mplus
 
Chris Stuccio - Data science - Conversion Hotel 2015
Chris Stuccio - Data science - Conversion Hotel 2015Chris Stuccio - Data science - Conversion Hotel 2015
Chris Stuccio - Data science - Conversion Hotel 2015
 
Data science course in mysore
Data science course in mysoreData science course in mysore
Data science course in mysore
 
Basic statistical & pharmaceutical statistical applications
Basic statistical & pharmaceutical statistical applicationsBasic statistical & pharmaceutical statistical applications
Basic statistical & pharmaceutical statistical applications
 
Pearson Chi-Square
Pearson Chi-SquarePearson Chi-Square
Pearson Chi-Square
 

Viewers also liked (9)

Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 
doc_2055
doc_2055doc_2055
doc_2055
 
BAS 250 Lecture 2
BAS 250 Lecture 2BAS 250 Lecture 2
BAS 250 Lecture 2
 
Dwh lecture slides-week15
Dwh lecture slides-week15Dwh lecture slides-week15
Dwh lecture slides-week15
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Data mining
Data miningData mining
Data mining
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 

Similar to BAS 250 Lecture 4

big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptxAmenahAbbood
 
What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)Kumar P
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithmhina firdaus
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesDerek Kane
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_RulesFEG
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligenceGeo Marian
 
IRJET- Minning Frequent Patterns,Associations and Correlations
IRJET-  	  Minning Frequent Patterns,Associations and CorrelationsIRJET-  	  Minning Frequent Patterns,Associations and Correlations
IRJET- Minning Frequent Patterns,Associations and CorrelationsIRJET Journal
 
Cluster2
Cluster2Cluster2
Cluster2work
 
Association Rule Mining with Apriori Algorithm.pptx
Association Rule Mining with Apriori Algorithm.pptxAssociation Rule Mining with Apriori Algorithm.pptx
Association Rule Mining with Apriori Algorithm.pptxAnjumaaraAnsari
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14rahulmath80
 
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...Smarten Augmented Analytics
 
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...IOSR Journals
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdfJyoti Yadav
 
Store segmentation progresso
Store segmentation progressoStore segmentation progresso
Store segmentation progressoveesingh
 
Association Mining
Association Mining Association Mining
Association Mining Edureka!
 
Market Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopMarket Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopVarunSahdev2
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxnikshaikh786
 

Similar to BAS 250 Lecture 4 (20)

big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptx
 
What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation Engines
 
Unit 4_ML.pptx
Unit 4_ML.pptxUnit 4_ML.pptx
Unit 4_ML.pptx
 
Data Mining
Data Mining Data Mining
Data Mining
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
IRJET- Minning Frequent Patterns,Associations and Correlations
IRJET-  	  Minning Frequent Patterns,Associations and CorrelationsIRJET-  	  Minning Frequent Patterns,Associations and Correlations
IRJET- Minning Frequent Patterns,Associations and Correlations
 
Cluster2
Cluster2Cluster2
Cluster2
 
Association Rule Mining with Apriori Algorithm.pptx
Association Rule Mining with Apriori Algorithm.pptxAssociation Rule Mining with Apriori Algorithm.pptx
Association Rule Mining with Apriori Algorithm.pptx
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14
 
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining...
 
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
 
Assignment #3 10.19.14
Assignment #3 10.19.14Assignment #3 10.19.14
Assignment #3 10.19.14
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
 
Store segmentation progresso
Store segmentation progressoStore segmentation progresso
Store segmentation progresso
 
Association Mining
Association Mining Association Mining
Association Mining
 
Market Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopMarket Basket Analysis of bakery Shop
Market Basket Analysis of bakery Shop
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
 

More from Wake Tech BAS

BAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 LectureBAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 LectureWake Tech BAS
 
BAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 LectureBAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 LectureWake Tech BAS
 
BAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 LectureBAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 LectureWake Tech BAS
 
BAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 LectureBAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 LectureWake Tech BAS
 
BAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureBAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureWake Tech BAS
 
BAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureBAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureWake Tech BAS
 
BAS 150 Lesson 2 Lecture
BAS 150 Lesson 2 Lecture BAS 150 Lesson 2 Lecture
BAS 150 Lesson 2 Lecture Wake Tech BAS
 
BAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 LectureBAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 LectureWake Tech BAS
 

More from Wake Tech BAS (12)

BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
 
BAS 250 Lecture 5
BAS 250 Lecture 5BAS 250 Lecture 5
BAS 250 Lecture 5
 
BAS 250 Lecture 3
BAS 250 Lecture 3BAS 250 Lecture 3
BAS 250 Lecture 3
 
BAS 250 Lecture 1
BAS 250 Lecture 1BAS 250 Lecture 1
BAS 250 Lecture 1
 
BAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 LectureBAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 Lecture
 
BAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 LectureBAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 Lecture
 
BAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 LectureBAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 Lecture
 
BAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 LectureBAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 Lecture
 
BAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureBAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 Lecture
 
BAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureBAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 Lecture
 
BAS 150 Lesson 2 Lecture
BAS 150 Lesson 2 Lecture BAS 150 Lesson 2 Lecture
BAS 150 Lesson 2 Lecture
 
BAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 LectureBAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 Lecture
 

Recently uploaded

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 

Recently uploaded (20)

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 

BAS 250 Lecture 4

  • 1. BAS 250 Lesson 4: Association Rules
  • 2.  Explain the concept of association rules  Develop an association rules data mining model  Understand the Apriori algorithm  Interpret output generated by model  Effectively employ the CRISP-DM method to your assignment This Week’s Learning Objectives
  • 3.  Association rules are a data mining methodology that seeks to find frequent connections between attributes in a data set o An example of this could be a shopping basket analysis where marketers and vendors try to find which products are most frequently purchased together Association Rules
  • 4.  Study of “what goes with what” o “Customers who bought X also bought Y” o What symptoms go with what diagnosis  Transaction-based or event-based  Also called “market basket analysis” and “affinity analysis”  Originated with study of customer transactions databases to determine associations among items purchased What are Association Rules?
  • 5.  So what kind of items are we talking about? There are many applications of association: o Product recommendation – like Amazon’s “customers who bought that, also bought this” o Medical diagnosis – like with diabetes o Content optimization – magazine websites or blogs or even menu design for a restaurant o DNA genome analysis – patterns in cellular data o Fraud detection in finance Association Rules
  • 6. Used in many recommender systems Screenshot, Amazon.com
  • 7.  Key Point:  Association rules use “if/then” statements to uncover relationships between seemingly unrelated data.  “If a customer buys a dozen eggs, then he is 80% more likely to also purchase milk.” Association Rules
  • 8.  “IF” part = antecedent  “THEN” part = consequent  “Item set” = the items (e.g., products) comprising the antecedent or consequent  Antecedent and consequent are disjoint when there are no items in common Terms
  • 9.  Although association rule operators require binominal data types, it’s helpful to evaluate the average (avg) and standard deviation for each attribute. Association Rules
  • 11.  Standard deviations are measurements of how dispersed or varied the values in an attribute are o A good rule of thumb- any value that is smaller than two standard deviations below the mean or two standard deviations above the mean is a statistical outlier  For example, if Average = 36.731 and Standard Deviation = 10.647, acceptable range of values should be 15.437 to 58.025  Not a hard-and-fast rule Association Rules
  • 12.  RapidMiner uses binominal instead of binomial o Binomial means one of two numbers (usually 0 and 1), meaning the basic underlying data type is still numeric o Binominal means one of two values which may be numeric or character based Association Rules
  • 13.  An example of a data type transformation of all attributes in RapidMiner Association Rules
  • 14.  Frequency Pattern analysis is handy for many kinds of data mining and is a necessary component of association rule mining o We use this to determine whether any of the patterns in the data occur often enough to be considered rules Association Rules
  • 15. Results of an FP-Growth operator in RapidMiner Association Rules
  • 16.  Two main factors that dictate whether or not frequency patterns get translated into association rules:  Confidence percent - how confident we are that when one attribute is flagged as true, the associated attribute will also be flagged as true  Support percent - the number of times that the rule did occur, divided by the number of observations in the data set Confidence & Support Percent
  • 17.  Out of 10 shopping baskets…  Cookies were purchased in 4 baskets  Milk was purchased in 7 baskets  Cookies  3 out of 4 instances where cookies were purchased milk was also in those baskets  Confidence % = How often are we when 1 attribute = True that an associated attribute = True  Therefore, we have a 75% confidence in the association rule cookies g milk Confidence Percent - Example
  • 18.  Out of 10 shopping baskets…  Milk was purchased in 7 baskets  Cookies were purchased in 4 baskets  Milk  3 out of 7 instances where milk was purchased that cookies were also in those baskets  Confidence % = How often are we when 1 attribute = True that an associated attribute = True  Therefore, we have a 43% confidence in the association rule milk g cookies Confidence Percent - Example
  • 19.  Premise (or antecedent) g Conclusion (or consequent)  Low Confidence % = only happens by chance and can be used to eliminate uninteresting rules  When evaluating associations between three or more attributes, the confidence percentages are calculated based on the two attributes being found with the third Confidence Percent (cont.)
  • 20.  An example of association rules found with 50% confidence threshold in RapidMiner Confidence Percent (cont.)
  • 21.  If cookies and milk were found together 3 out of the 10 shopping baskets, support percentage is calculated as 30% (3/10 = .3) o There is no reciprocal for support percentages since this metric is simply the number of times the association occurred over the number of times it could have occurred in the data set Support Percent
  • 22.  The higher the Support %, the higher the likelihood for Y to be present in purchases of X  Estimate of conditional probability of Y given X. o Remember: “If a customer buys a dozen eggs, then he is 80% more likely to also purchase milk.”  There is no reciprocal for support percentages since this metric is simply the number of times the association occurred over the number of times it could have occurred in the data set Support Percent
  • 23.  Minimum Confidence = 50%  Minimum Support = 20%  However, for large data sets, your minimum support could be much lower and still be valid.  Hint: Your homework will contain a large data set. Typical Ranges
  • 24.  6 colors within 10 Transactions Tiny Example: Phone Faceplates Transaction Faceplate Colors Purchased 1 Red White Green 2 White Orange 3 White Blue 4 Red White Orange 5 Red Blue 6 White Blue 7 White Orange 8 Red White Blue Green 9 Red White Blue 10 yellow
  • 25.  For example: Transaction 1 supports several rules, such as o “If red, then white” (“If a red faceplate is purchased, then so is a white one”) o “If white, then red” o “If red and white, then green” o + several more Many Rules are Possible
  • 26.  Ideally, we want to create all possible combinations of items  Problem: computation time grows exponentially as # items increases  Solution: consider only “frequent item sets”  Criterion for frequent: Support % Frequent Item Sets
  • 27.  Support = # (or percent) of transactions that include both the antecedent and the consequent  Example: support for the item set {red, white} is 4 out of 10 transactions, or 40% Support
  • 29.  Apriori Principle:  “If an item set is frequent, then all of its subsets MUST also be frequent.” Apriori Algorithm
  • 30.  The strategy of trimming the exponential search space based on the Support measure is known as “Support-based Pruning”  This is the foundational component of the Apriori Algorithm. Apriori Algorithm
  • 31.  For k products…  User sets a minimum support criterion  Next, generate list of one-item sets that meet the support criterion  Use the list of one-item sets to generate list of two-item sets that meet the support criterion  Use list of two-item sets to generate list of three-item sets  Continue up through k-item sets Generating Frequent Item Sets
  • 32.  Confidence: the % of antecedent transactions that also have the consequent item set  Lift = confidence/(benchmark confidence)  Benchmark confidence = transactions with consequent as % of all transactions  Lift > 1 indicates a rule that is useful in finding consequent items sets (i.e., more useful than just selecting transactions randomly) Measures of Performance
  • 33.  Generate all rules that meet specified support & confidence o Find frequent item sets (those with sufficient support – see above) o From these item sets, generate rules with sufficient confidence Process of Rule Selection
  • 34.  {red, white} > {green} with confidence = 2/4 = 50% o [(support {red, white, green})/(support {red, white})]  {red, green} > {white} with confidence = 2/2 = 100% o [(support {red, white, green})/(support {red, green})]  Plus 4 more with confidence of 100%, 33%, 29% & 100%  If confidence criterion is 70%, report only rules 2, 3 and 6 Example: Rules from {red, white, green}
  • 35.  Lift ratio shows how effective the rule is in finding consequents (useful if finding particular consequents is important)  Confidence shows the rate at which consequents will be found (useful in learning costs of promotion)  Support measures overall impact Interpretation
  • 36.  Random data can generate apparently interesting association rules  The more rules you produce, the greater this danger  Rules based on large numbers of records are less subject to this danger Caution: The Role of Chance
  • 37.  Association rules (or affinity analysis, or market basket analysis) produce rules on associations between items from a database of transactions  Widely used in recommender systems  Most popular method is Apriori algorithm  To reduce computation, we consider only “frequent” item sets (=support)  Performance is measured by confidence and lift  Can produce a profusion of rules; review is required to identify useful rules and to reduce redundancy Summary
  • 38.  Explain the concept of association rules  Develop an association rules data mining model  Understand the Apriori algorithm  Interpret output generated by model  Effectively employ the CRISP-DM method to your assignment Summary - Learning Objectives
  • 39. “This workforce solution was funded by a grant awarded by the U.S. Department of Labor’s Employment and Training Administration. The solution was created by the grantee and does not necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such information, including any information on linked sites and including, but not limited to, accuracy of the information or its completeness, timeliness, usefulness, adequacy, continued availability, or ownership.” Except where otherwise stated, this work by Wake Technical Community College Building Capacity in Business Analytics, a Department of Labor, TAACCCT funded project, is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ Copyright Information