Association Rule Mining
Understanding Association Rule Mining Concepts
Association Rule Mining
Association rule mining is a procedure for finding frequent patterns, correlations, associations, or causal structures in datasets held in various kinds of databases, such as relational databases, transactional databases, and other data repositories.
Simply put: when this, then also this.
Association Rule Mining
Used to identify -
● Frequent Patterns
● Correlations
● Associations
● Causal Structures
Where these are applied → movie recommendations, grocery item placement, product recommendations, etc.
Algorithm - Apriori - Metrics
The following three metrics are generally used:
Support: The percentage of transactions that contain all of the items in an itemset.
● The higher the support, the more frequently the itemset occurs.
● Rules with high support are preferred since they are likely to apply to a large number of future transactions.
Confidence: The probability that a transaction that contains the items on the left-hand side of the rule also contains the item on the right-hand side.
● The higher the confidence, the greater the likelihood that the item on the right-hand side will be purchased or, in other words, the greater the return rate we can expect for a given rule.
Lift: The probability of all of the items in a rule occurring together, divided by the product of the probabilities of the left-hand-side and right-hand-side items occurring as if there were no association between them.
● Overall, lift summarizes the strength of association between the products on the left-hand and right-hand sides of the rule; the larger the lift, the stronger the link between the two products.
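The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the toy transactions and the function names `support`, `confidence`, and `lift` are assumptions made for the example.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions (not taken from the slides).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "eggs"}),
]


def support(itemset: Iterable[str], transactions=TRANSACTIONS) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str], transactions=TRANSACTIONS) -> float:
    """P(rhs | lhs): support of lhs and rhs together divided by support of lhs."""
    both = frozenset(lhs) | frozenset(rhs)
    return support(both, transactions) / support(lhs, transactions)


def lift(lhs: Iterable[str], rhs: Iterable[str], transactions=TRANSACTIONS) -> float:
    """Confidence divided by the support of rhs; 1 means no association."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


if __name__ == "__main__":
    print("support(bread, milk)  =", support({"bread", "milk"}))
    print("confidence(bread=>milk) =", confidence({"bread"}, {"milk"}))
    print("lift(bread=>milk)       =", lift({"bread"}, {"milk"}))
```

In this toy data bread and milk each appear in 4 of 5 transactions and together in 3, so the lift comes out slightly below 1: the pair co-occurs a little less often than independence would predict.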
Apriori - Support
Apriori - Confidence
Apriori - Lift
Algorithm
Step 1: Set a minimum and maximum support and confidence.
Step 2: Take all the subsets in the transactions that have higher support than the minimum support.
Step 3: Take all the rules of these subsets that have higher confidence than the minimum confidence.
Step 4: Generate other rule assessment measures for the rules.
Step 5: Sort the rules by using an appropriate filter.
Cons → Slow algorithm: it is a bottom-up approach that builds combinations from all available items and computes the related statistics for each.
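Steps 1 to 3 can be sketched as a level-wise search. The code below is a minimal, unoptimized illustration under stated assumptions: the function names `apriori` and `rules`, the toy transactions, and the thresholds are hypothetical, and only minimum thresholds are applied.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: relative support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if supp(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = supp(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if supp(c) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules meeting the confidence threshold."""
    found = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[lhs]  # subsets of a frequent itemset are frequent
                if conf >= min_confidence:
                    found.append((lhs, itemset - lhs, conf))
    return found


# Hypothetical toy data and thresholds.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
frequent_itemsets = apriori(transactions, min_support=0.4)
strong_rules = rules(frequent_itemsets, min_confidence=0.7)
```

The repeated support counting inside the loop is what makes the bottom-up approach slow on large item sets; the prune step only limits how many candidates reach that counting.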
Another Example
Other Rule Assessment Measures
● Added Value
● All-confidence
● Causal Confidence
● Causal Support
● Certainty Factor
● Chi-Squared
● Cross-Support Ratio
● Collective Strength
● Confidence
● Conviction
● Cosine
● Coverage
● Descriptive Confirmed Confidence
● Difference of Confidence
● Example & Counter-Example Rate
● Fisher's Exact Test
● Gini Index
● Hyper-Confidence
● Hyper-Lift
● Imbalance Ratio
● Improvement
● Jaccard Coefficient
● J-Measure
● Kappa
● Klosgen
● Kulczynski
● Goodman-Kruskal Lambda
● Laplace Corrected Confidence
● Least Contradiction
● Lerman Similarity
● Leverage
● Lift
● MaxConf
● Mutual Information
● Odds Ratio
● Phi Correlation Coefficient
● Ralambondrainy Measure
● Relative Linkage Disequilibrium
● Relative Support
● Rule Power Factor
● Sebag-Schoenauer Measure
● Support
● Varying Rates Liaison
● Yule's Q
● Yule's Y
Support, Relative Support
Support:
● Support of a rule is defined as the number of transactions that contain both X and Y.
● Used as a measure of significance of a rule.
Symmetric Measure
Range: [0, INF)
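Formula: $\mathrm{supp}(X \Rightarrow Y) = |\{t \in T : X \cup Y \subseteq t\}|$, where $T$ is the set of transactions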
Relative Support:
● Relative Support is the fraction of transactions that contain both X and Y.
● ⇒ Empirical Joint Probability of the items comprising the rule.
● Used as a measure of significance of a rule.
Symmetric Measure
Range: [0, 1]
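Formula: $\mathrm{rsupp}(X \Rightarrow Y) = \dfrac{|\{t \in T : X \cup Y \subseteq t\}|}{|T|} = P(X \wedge Y)$, where $T$ is the set of transactions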
Confidence (a.k.a. Strength)
Confidence:
● Confidence of a rule is the conditional probability that a transaction contains the consequent Y given that it contains the antecedent X.
● The problem with confidence is that it is sensitive to the frequency of the consequent Y in the database.
● Because of the way confidence is calculated, consequents with higher support automatically produce higher confidence values, even if there is no association between the items.
Asymmetric Measure
Range: [0, 1]
Formula: $\mathrm{conf}(X \Rightarrow Y) = \dfrac{P(X \wedge Y)}{P(X)} = P(Y \mid X)$
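The sensitivity of confidence to a frequent consequent can be seen in a small worked example. The counts below are hypothetical and chosen so that X and Y are exactly independent.

```python
# Hypothetical counts: 100 transactions, X in 20, Y in 90, both in 18.
# 18/100 equals (20/100) * (90/100), so X and Y are statistically independent.
n, n_x, n_y, n_xy = 100, 20, 90, 18

conf_x_to_y = n_xy / n_x      # P(Y | X)
supp_y = n_y / n              # P(Y), relative support of the consequent
lift = conf_x_to_y / supp_y   # 1.0 means independence

print(f"confidence(X => Y)  = {conf_x_to_y:.2f}")
print(f"relative support(Y) = {supp_y:.2f}")
print(f"lift(X => Y)        = {lift:.2f}")
```

Confidence is 0.90 here only because Y is in 90% of all transactions; the lift of 1.00 shows that X tells us nothing about Y.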
Lift (a.k.a. Interest)
Lift:
● Lift is defined as the ratio of the observed joint probability of X and Y to the expected joint probability if they were statistically independent.
● Lift is susceptible to noise in small databases.
● Because of the way lift is calculated, rare itemsets with low counts (low probability) that by chance occur together a few times (or only once) will produce enormous lift values.
Symmetric Measure
Range: [0, INF) (1 means independence)
Formula: $\mathrm{lift}(X \Rightarrow Y) = \dfrac{P(X \wedge Y)}{P(X)\,P(Y)}$
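A short sketch of the noise problem, using hypothetical counts: one pair of frequent items and one pair of items that each occur exactly once, in the same transaction. The helper name `lift_from_counts` is an assumption for the example.

```python
def lift_from_counts(n: int, n_x: int, n_y: int, n_xy: int) -> float:
    """Lift = P(X and Y) / (P(X) * P(Y)), computed from raw counts."""
    return (n_xy * n) / (n_x * n_y)


# Frequent items: X in 500, Y in 400, both in 300 of 1000 transactions.
common = lift_from_counts(n=1000, n_x=500, n_y=400, n_xy=300)

# Rare items: X and Y each occur once, in the same single transaction.
rare = lift_from_counts(n=1000, n_x=1, n_y=1, n_xy=1)

print("lift for the frequent pair:", common)
print("lift for the rare pair:    ", rare)
```

The rare pair gets a lift equal to the number of transactions on the strength of a single co-occurrence, which is why lift is usually read together with a support threshold.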
Coverage (a.k.a. antecedent support or LHS support)
Coverage:
● Coverage is defined as the relative support of the antecedent X, i.e. it is the fraction of transactions that contain X.
● ⇒ Empirical Probability of the item X.
● Used as a measure of significance of a rule.
Asymmetric Measure
Range: [0, 1]
Formula: $\mathrm{cover}(X \Rightarrow Y) = P(X)$
Difference of Confidence
Difference of Confidence:
● .
● .
● .
Asymmetric Measure
Range: [-1, 1]
Formula:
Certainty Factor (a.k.a. Loevinger)
Certainty Factor:
● It is a measure of the variation of the probability that Y is in a transaction when only considering transactions with X.
● An increasing CF means a decrease of the probability that Y is not in a transaction that X
is in. Negative CFs have a similar interpretation.
Asymmetric Measure
Range: [-1, 1] (0 means independence)
Formula:
Leverage
Leverage:
● Leverage measures the difference between the observed and expected joint probability of
XY assuming that X and Y are independent.
● Leverage gives an absolute measure of how surprising a rule is and should be used
together with lift.
● Can be interpreted as gap to independence.
Symmetric Measure
Range: [-1, 1] (0 means independence)
Formula: $\mathrm{leverage}(X \Rightarrow Y) = P(X \wedge Y) - P(X)\,P(Y)$
Leverage
Rule A → E may be preferable to the first two because it is simpler and has higher leverage.
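Why leverage is best read together with lift can be sketched with two hypothetical rules, one on frequent items and one on items that occur only once. The counts and function names are assumptions for illustration.

```python
def leverage(n: int, n_x: int, n_y: int, n_xy: int) -> float:
    """Observed joint probability minus the one expected under independence."""
    return n_xy / n - (n_x / n) * (n_y / n)


def lift(n: int, n_x: int, n_y: int, n_xy: int) -> float:
    """Observed joint probability divided by the one expected under independence."""
    return (n_xy * n) / (n_x * n_y)


# Hypothetical counts as (n, n_x, n_y, n_xy).
frequent_rule = (100, 40, 50, 30)
rare_rule = (100, 1, 1, 1)

for name, counts in [("frequent", frequent_rule), ("rare", rare_rule)]:
    print(f"{name}: leverage={leverage(*counts):.4f}, lift={lift(*counts):.1f}")
```

The rare rule has the much larger lift, but its leverage is tiny because it affects a single transaction; the frequent rule has a modest lift and a far larger absolute gap to independence.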
Jaccard Coefficient (a.k.a. Coherence)
Jaccard Coefficient:
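● This coefficient measures the similarity between two sets, here the transactions containing X and the transactions containing Y.
Symmetric Measure
Range: [0, 1]
Formula: $\mathrm{jaccard}(X \Rightarrow Y) = \dfrac{P(X \wedge Y)}{P(X) + P(Y) - P(X \wedge Y)}$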
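As a minimal sketch, the Jaccard coefficient can be computed directly on the sets of transaction IDs that contain X and Y. The IDs below are hypothetical.

```python
def jaccard(tids_x: set, tids_y: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = tids_x | tids_y
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(tids_x & tids_y) / len(union)


# Hypothetical transaction IDs containing X and Y respectively.
tids_x = {1, 2, 3, 4}
tids_y = {3, 4, 5, 6}

print("jaccard =", jaccard(tids_x, tids_y))
```

Two of the six distinct transactions contain both itemsets, so the coefficient is 2/6. It is 0 when X and Y never co-occur and 1 when they always appear together.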
Contingency Table for X and Y
Conviction
Conviction:
● Conviction measures the expected error of the rule, i.e. how often X occurs in a transaction where Y does not.
● Thus it can be said to measure the strength of the rule with respect to the complement of the consequent.
● If the joint probability of X and !Y is less than that expected under independence of X and !Y, then conviction is high, and vice versa.
● It was proposed as an alternative to confidence, which was found not to capture the direction of associations adequately.
Asymmetric Measure
Range: [0, INF) (1 means independence; rules that always hold have INF)
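Formula: $\mathrm{conviction}(X \Rightarrow Y) = \dfrac{P(X)\,P(\lnot Y)}{P(X \wedge \lnot Y)} = \dfrac{1 - P(Y)}{1 - \mathrm{conf}(X \Rightarrow Y)}$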
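A minimal sketch of conviction computed from counts, including the case of a rule that always holds. The function name and the counts are hypothetical.

```python
import math


def conviction(n: int, n_x: int, n_y: int, n_xy: int) -> float:
    """(1 - P(Y)) / (1 - conf(X => Y)); infinite when the rule never fails."""
    if n_xy == n_x:
        return math.inf  # X never occurs without Y
    return (1 - n_y / n) / (1 - n_xy / n_x)


print("associated rule: ", conviction(n=100, n_x=40, n_y=50, n_xy=30))
print("independent rule:", conviction(n=100, n_x=20, n_y=90, n_xy=18))
print("rule always holds:", conviction(n=100, n_x=40, n_y=50, n_xy=40))
```

The independent case evaluates to 1, matching the range note above, and the explicit check avoids a division by zero when confidence is exactly 1.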
Odds Ratio
Odds Ratio:
● It is defined as the odds of finding X in transactions which contain Y divided by the odds
of finding X in transactions which do not contain Y.
● Lift is susceptible to noise in small databases.
● Odds ratios greater than 1 imply higher odds of Y occurring in the presence of X than in the presence of its complement !X, whereas odds ratios smaller than 1 imply higher odds of Y occurring with !X.
Symmetric Measure
Range: [0, INF) (1 means independence)
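Formula: $\mathrm{OR}(X \Rightarrow Y) = \dfrac{n_{XY}\; n_{\lnot X \lnot Y}}{n_{X \lnot Y}\; n_{\lnot X Y}}$, where each $n$ is the count of transactions in the corresponding cell of the contingency table for X and Y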
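The odds ratio can be sketched from the four cells of the contingency table for X and Y. The function name, the counts, and the handling of empty cells are assumptions for illustration.

```python
import math


def odds_ratio(n: int, n_x: int, n_y: int, n_xy: int) -> float:
    """Odds ratio from the 2x2 contingency table of X and Y."""
    n11 = n_xy                    # X and Y
    n10 = n_x - n_xy              # X without Y
    n01 = n_y - n_xy              # Y without X
    n00 = n - n_x - n_y + n_xy    # neither X nor Y
    numerator, denominator = n11 * n00, n10 * n01
    if denominator == 0:
        # Assumption: report INF for an empty off-diagonal cell, NaN for 0/0.
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


print("associated: ", odds_ratio(n=100, n_x=40, n_y=50, n_xy=30))
print("independent:", odds_ratio(n=100, n_x=20, n_y=90, n_xy=18))
```

For the first rule the cells are 30, 10, 20 and 40, giving (30 × 40) / (10 × 20); the independent rule gives exactly 1.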
Mining the patterns to Develop Rules
Filter Used
(CASE
WHEN Itemset only present on CRITICALCLASS Side THEN (FLOAT(CriticalClass_oddsRatio) - 0)
WHEN Itemset present on BOTH Side THEN (FLOAT(CriticalClass_oddsRatio) - FLOAT(Gen_oddsRatio))
WHEN Itemset only present on GENERAL Side THEN (0 - FLOAT(Gen_oddsRatio)) END) AS Diff_CriticalClassGen_OddsRatio,
Diff_CriticalClassGen_Conviction,
Diff_CriticalClassGen_Supp,
Diff_CriticalClassGen_Certainty,
*
FROM {
| #Handling INFINITY value
|FROM
| Table
|WHERE
| #viewing entries ONLY present on GEN side OR viewing entries ONLY present on CriticalClass side
}
ORDER BY
#rule_rhs desc,
#rule_lhs desc,
Diff_CriticalClassGen_OddsRatio DESC,
Diff_CriticalClassGen_Conviction DESC,
Diff_CriticalClassGen_Supp DESC,
#Diff_CriticalClassGen_Certainty desc,
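The idea behind the filter can be sketched in Python: take each rule's odds ratio on the critical-class side and on the general side, treat a missing side as 0, and sort by the difference. This is an illustration only, not the query used in the slides; the function name, the dictionary inputs, and the `cap` used to handle infinite values are all assumptions.

```python
import math


def odds_ratio_differences(critical: dict, general: dict, cap: float = 1e6):
    """Return (rule, critical minus general odds ratio), largest difference first."""

    def finite(value: float) -> float:
        # Assumption: cap infinite odds ratios so that inf - inf never yields NaN.
        return min(value, cap)

    rows = []
    for rule in set(critical) | set(general):
        # A rule present on only one side contributes 0 for the other side.
        diff = finite(critical.get(rule, 0.0)) - finite(general.get(rule, 0.0))
        rows.append((rule, diff))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical odds ratios keyed by a rule label.
critical_side = {"a => z": 5.0, "b => z": 2.0, "d => z": math.inf}
general_side = {"b => z": 3.0, "c => z": 4.0}

ranked = odds_ratio_differences(critical_side, general_side)
for rule, diff in ranked:
    print(rule, diff)
```

Rules at the top of the ranking are much more strongly associated in the critical class than in general, which is what the ORDER BY on the difference columns is selecting for.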
Mining Pattern
Step 1: Run the query.
Step 2: Using creativity and intuition, select an item.
Step 3: Modify the query so that it returns pairs containing the selected item, then again use creativity and intuition to select an item.
Using the discovered pair to extend the pattern
Step a: Use the discovered pair as the LHS and run the query on the table with increased rule length.
Step b: Using creativity and intuition, select the next item.
Using the discovered pair for further analysis
Step a: Use the existing pair to get the raw data and analyze it.
Step b: Use the existing pair to get the derived-parameter data and analyze it (also check for an existing critical class signature + location).
Step c: If the discovered pair is indeed adequate and finds some critical class, use this signature.
- Testing for FP
- If adequate, use it for blocking
Rule Developed
Limitation and Further Work
Issues and Fine tuning
● Issues caused by data inconsistency in streaming data
● Modifying the data preprocessing for the itemset
● Version on Derived Parameters
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 

Understanding Association Rule Mining

  • 3. Association Rule Mining Association rule mining is a procedure for finding frequent patterns, correlations, associations, or causal structures in datasets held in various kinds of databases, such as relational databases, transactional databases, and other data repositories. Simply put: when this, then also this.
  • 4. Association Rule Mining Used to identify: ● Frequent Patterns ● Correlations ● Associations ● Causal Structures. Typical applications → movie recommendations, grocery item placement, product recommendations, etc.
  • 5. Algorithm - Apriori - Metrics The following three metrics are generally used: Support: The percentage of transactions that contain all of the items in an item set. ● The higher the support, the more frequently the item set occurs. ● Rules with high support are preferred since they are likely to apply to a large number of future transactions. Confidence: The probability that a transaction that contains the items on the left-hand side of the rule also contains the item on the right-hand side. ● The higher the confidence, the greater the likelihood that the item on the right-hand side will be purchased or, in other words, the greater the return rate we can expect for a given rule. Lift: The probability of all of the items in a rule occurring together, divided by the product of the probabilities of the left-hand side and the right-hand side occurring, i.e. the joint probability expected if there were no association between them. ● Overall, lift summarizes the strength of association between the products on the left- and right-hand sides of the rule; the larger the lift, the stronger the link between the two products.
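The three metrics above can be computed directly from a list of transactions. The following is a minimal sketch; the toy `transactions` data and the function names are made up for illustration and are not taken from the slides.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support of the whole rule over support of lhs."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """Observed joint support divided by the support expected under independence."""
    joint = support(set(lhs) | set(rhs), transactions)
    return joint / (support(lhs, transactions) * support(rhs, transactions))

# Rule {bread} -> {milk}: support is 3/5, confidence is about 0.75,
# and lift is slightly below 1 (about 0.94).
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(lift({"bread"}, {"milk"}, transactions))
```

In this toy data bread and milk are each so common that their co-occurrence is slightly below what independence would predict, so the rule has high confidence but a lift just under 1.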
  • 15. Algorithm Step 1: Set a minimum & maximum Support and Confidence. Step 2: Take all the subsets in the transactions that have higher support than the minimum support. Step 3: Take all the rules of these subsets that have higher confidence than the minimum confidence. Step 4: Generate other rule assessment measures for the rules. Step 5: Sort the rules using an appropriate filter. Cons → Slow algorithm: because it is a bottom-up approach, it forms pairs from all available items and computes the related statistics for each.
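Steps 1 to 3 can be sketched as a level-wise search. This is a simplified illustration rather than a tuned implementation: the names `apriori` and `generate_rules` and the toy data are hypothetical, only minimum thresholds are used (no maximum), and the comparisons use `>=` instead of a strict "higher than".

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: relative support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those that meet the threshold.
        level = {}
        for cand in current:
            supp = sum(cand <= t for t in transactions) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def generate_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = supp / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules

# Hypothetical toy data.
transactions = [
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
    {"milk"}, {"bread", "milk"},
]
frequent = apriori(transactions, min_support=0.4)
found = generate_rules(frequent, min_confidence=0.7)
for lhs, rhs, supp, conf in found:
    print(sorted(lhs), "->", sorted(rhs), round(supp, 2), round(conf, 2))
```

The prune step is what keeps the search tractable: a candidate is counted only if all of its smaller subsets were already frequent. The cost still grows quickly with the number of items, which matches the "slow algorithm" caveat above.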
  • 17. Other Rule Assessment Measures ● Added Value ● All-confidence ● Causal Confidence ● Causal Support ● Certainty Factor ● Chi-Squared ● Cross-Support Ratio ● Collective Strength ● Confidence ● Conviction ● Cosine ● Coverage ● Descriptive Confirmed Confidence ● Difference of Confidence ● Example & Counter-Example Rate ● Fisher's Exact Test ● Gini Index ● Hyper-Confidence ● Hyper-Lift ● Imbalance Ratio ● Improvement ● Jaccard Coefficient ● J-Measure ● Kappa ● Klosgen ● Kulczynski ● Goodman-Kruskal Lambda ● Laplace Corrected Confidence ● Least Contradiction ● Lerman Similarity ● Leverage ● Lift ● MaxConf ● Mutual Information ● Odds Ratio ● Phi Correlation Coefficient ● Ralambondrainy Measure ● Relative Linkage Disequilibrium ● Relative Support ● Rule Power Factor ● Sebag-Schoenauer Measure ● Support ● Varying Rates Liaison ● Yule's Q ● Yule's Y
  • 18. Support, Relative Support Support: ● Support of a rule is defined as the number of transactions that contain both X and Y. ● Used as a measure of the significance of a rule. Symmetric Measure Range: [0, INF) Formula: supp(X → Y) = count(XY) Relative Support: ● Relative Support is the fraction of transactions that contain both X and Y. ● ⇒ Empirical joint probability of the items comprising the rule. ● Used as a measure of the significance of a rule. Symmetric Measure Range: [0, 1] Formula: rsupp(X → Y) = count(XY) / N = P(XY), where N is the total number of transactions
  • 20. Confidence (a.k.a. Strength) Confidence: ● Confidence of a rule is the conditional probability that a transaction contains the consequent Y given that it contains the antecedent X. ● The problem with confidence is that it is sensitive to the frequency of the consequent Y in the database. ● Because of the way confidence is calculated, consequents with higher support automatically produce higher confidence values even if there is no association between the items. Asymmetric Measure Range: [0, 1] Formula: conf(X → Y) = P(XY) / P(X) = P(Y | X)
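The sensitivity to the consequent's frequency is easy to reproduce with made-up counts. In this sketch (all numbers hypothetical) Y appears in 90% of transactions and is statistically independent of X, yet the rule X → Y still reaches 90% confidence.

```python
def confidence(n_xy, n_x):
    """conf(X -> Y) from counts: transactions with X and Y over transactions with X."""
    return n_xy / n_x

def lift(n_xy, n_x, n_y, n):
    """lift(X -> Y) from counts; equals 1 when X and Y are independent."""
    return (n_xy * n) / (n_x * n_y)

# Hypothetical counts: 100 transactions, X in 20, Y in 90, both in 18.
n, n_x, n_y, n_xy = 100, 20, 90, 18

print(confidence(n_xy, n_x))    # high, only because Y is frequent
print(lift(n_xy, n_x, n_y, n))  # independence: no real association
```

This is why confidence is usually read together with a measure that corrects for the consequent's base rate, such as lift.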
  • 22. Lift (a.k.a. Interest) Lift: ● Lift is defined as the ratio of the observed joint probability of X and Y to the expected joint probability if they were statistically independent. ● Lift is susceptible to noise in small databases. ● Because of the way lift is calculated, rare itemsets with low counts (low probability) that by chance occur together a few times (or only once) will produce enormous lift values. Symmetric Measure Range: [0, INF) (1 means independence) Formula: lift(X → Y) = P(XY) / (P(X) · P(Y))
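A small numeric sketch of the noise problem, using made-up counts: two items that each occur once in 10,000 transactions, and happen to occur together, get a far larger lift than a well-supported pair.

```python
def lift(n_xy, n_x, n_y, n):
    """lift(X -> Y) = P(XY) / (P(X) * P(Y)), computed from raw counts."""
    return (n_xy * n) / (n_x * n_y)

n = 10_000  # hypothetical number of transactions

# Two rare items that co-occur exactly once.
rare = lift(n_xy=1, n_x=1, n_y=1, n=n)

# Two common items with a solid, well-supported association.
common = lift(n_xy=1000, n_x=2000, n_y=2500, n=n)

print(rare, common)
```

A minimum-support or minimum-count threshold is a common way to keep such single-occurrence rules out of the ranking.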
  • 24. Coverage (a.k.a. antecedent support or LHS support) Coverage: ● Coverage is defined as the relative support of the antecedent X, i.e. it is the fraction of transactions that contain X. ● ⇒ Empirical probability of the item X. ● Used as a measure of the significance of a rule. Asymmetric Measure Range: [0, 1] Formula: cover(X → Y) = P(X)
  • 25. Difference of Confidence Difference of Confidence: ● . ● . ● . Asymmetric Measure Range: [-1, 1] Formula:
  • 26. Certainty Factor (a.k.a. Loevinger) Certainty Factor: ● It is a measure of the variation of the probability that Y is in a transaction when only considering transactions with X. ● An increasing CF means a decrease of the probability that Y is not in a transaction that X is in. Negative CFs have a similar interpretation. Asymmetric Measure Range: [-1, 1] (0 means independence) Formula:
  • 27. Leverage Leverage: ● Leverage measures the difference between the observed and the expected joint probability of XY, assuming that X and Y are independent. ● Leverage gives an absolute measure of how surprising a rule is and should be used together with lift. ● Can be interpreted as the gap to independence. Symmetric Measure Range: [-1, 1] (0 means independence) Formula: leverage(X → Y) = P(XY) - P(X) · P(Y)
  • 28. Leverage Rule A→ E may be preferable over the first two because it is simpler and has higher leverage
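Because leverage is an absolute difference rather than a ratio, it stays small for rules built on very few transactions. The sketch below uses hypothetical counts: a pair of rare items that co-occur once (whose lift would be 10,000) gets a leverage close to 0, while a well-supported pair gets a clearly larger value.

```python
def leverage(n_xy, n_x, n_y, n):
    """leverage(X -> Y) = P(XY) - P(X) * P(Y), computed from raw counts."""
    return n_xy / n - (n_x / n) * (n_y / n)

n = 10_000  # hypothetical number of transactions

# Rare pair co-occurring once: huge lift, but almost no leverage.
rare = leverage(n_xy=1, n_x=1, n_y=1, n=n)

# Well-supported pair: modest lift (2.0), but much larger leverage.
common = leverage(n_xy=1000, n_x=2000, n_y=2500, n=n)

print(rare, common)
```

Ranking by leverage therefore favors rules that affect many transactions, which is one reason to read it alongside lift.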
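  • 29. Jaccard Coefficient (a.k.a. Coherence) Jaccard Coefficient: ● This coefficient measures the similarity between two sets, here the set of transactions containing X and the set of transactions containing Y. Symmetric Measure Range: [0, 1] (0 means X and Y never occur together, 1 means they always occur together) Formula: jaccard(X → Y) = P(XY) / (P(X) + P(Y) - P(XY))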
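A minimal sketch of the Jaccard coefficient for a rule, assuming the standard set-similarity definition applied to the transactions containing X and those containing Y. The function name and the counts are illustrative only.

```python
def jaccard(n_xy, n_x, n_y):
    """Transactions with both X and Y over transactions with X or Y."""
    union = n_x + n_y - n_xy
    return n_xy / union if union else 0.0

print(jaccard(n_xy=3, n_x=4, n_y=4))  # partial overlap
print(jaccard(n_xy=0, n_x=4, n_y=4))  # X and Y never occur together
print(jaccard(n_xy=4, n_x=4, n_y=4))  # X and Y always occur together
```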
  • 32. Conviction Conviction: ● Conviction measures the expected error of the rule, i.e. how often X occurs in a transaction where Y does not. ● Thus it can be said to measure the strength of the rule with respect to the complement of the consequent. ● If the joint probability of X!Y is less than that expected under independence of X and !Y, then conviction is high, and vice versa. ● An alternative to confidence, which was found not to capture the direction of associations adequately. Asymmetric Measure Range: [0, INF) (1 means independence; rules that always hold have INF) Formula: conv(X → Y) = P(X) · P(!Y) / P(X!Y) = (1 - P(Y)) / (1 - conf(X → Y))
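A short sketch of conviction computed from counts, assuming the standard definition P(X)·P(!Y) / P(X!Y). The counts are hypothetical, and returning infinity for a rule with no counter-examples follows the range stated above.

```python
import math

def conviction(n_xy, n_x, n_y, n):
    """conv(X -> Y) = P(X) * P(!Y) / P(X and !Y), computed from raw counts."""
    n_x_not_y = n_x - n_xy  # transactions where the rule fails
    if n_x_not_y == 0:
        return math.inf     # the rule always holds
    return (n_x * (n - n_y)) / (n * n_x_not_y)

# Hypothetical counts over 100 transactions.
print(conviction(n_xy=18, n_x=20, n_y=90, n=100))  # X and Y independent
print(conviction(n_xy=19, n_x=20, n_y=50, n=100))  # rule fails once in 20
print(conviction(n_xy=20, n_x=20, n_y=50, n=100))  # rule never fails
```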
  • 34. Odds Ratio Odds Ratio: ● It is defined as the odds of finding X in transactions which contain Y divided by the odds of finding X in transactions which do not contain Y. ● Lift is susceptible to noise in small databases. ● Odds ratios greater than 1 imply higher odds of Y occurring in the presence of X as opposed to its complement !X, whereas odds ratios smaller than 1 imply higher odds of Y occurring with !X. Symmetric Measure Range: [0, INF) (1 means independence) Formula: OR(X → Y) = (P(XY) · P(!X!Y)) / (P(X!Y) · P(!XY))
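The odds ratio can be read off the 2x2 contingency table of X and Y. The following is a sketch assuming the standard cross-product form; the handling of zero cells (infinity, or NaN for the undefined 0/0 case) is an assumption made here, not something stated in the slides.

```python
import math

def odds_ratio(n_xy, n_x, n_y, n):
    """Cross-product ratio of the 2x2 table built from raw counts."""
    a = n_xy                  # X and Y
    b = n_x - n_xy            # X without Y
    c = n_y - n_xy            # Y without X
    d = n - n_x - n_y + n_xy  # neither X nor Y
    if b * c == 0:
        # Assumed convention: infinite odds ratio, or NaN when 0/0.
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)

# Hypothetical counts over 100 transactions.
print(odds_ratio(n_xy=18, n_x=20, n_y=90, n=100))  # independent items
print(odds_ratio(n_xy=40, n_x=50, n_y=50, n=100))  # strong positive association
print(odds_ratio(n_xy=20, n_x=20, n_y=50, n=100))  # X never occurs without Y
```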
  • 36. Mining the patterns to Develop Rules
  • 37. Filter Used (CASE WHEN Itemset only present on CriticalClass Side THEN (FLOAT(CriticalClass_oddsRatio) - 0) WHEN Itemset present on BOTH Side THEN (FLOAT(CriticalClass_oddsRatio) - FLOAT(Gen_oddsRatio)) WHEN Itemset only present on GENERAL Side THEN (0 - FLOAT(Gen_oddsRatio)) END) AS Diff_CriticalClassGen_OddsRatio, Diff_CriticalClassGen_Conviction, Diff_CriticalClassGen_Supp, Diff_CriticalClassGen_Certainty, * FROM { | #Handling INFINITY value |FROM | Table |WHERE | #viewing entries ONLY present on GEN side OR viewing entries ONLY present on CriticalClass side } ORDER BY #rule_rhs desc, #rule_lhs desc, Diff_CriticalClassGen_OddsRatio DESC, Diff_CriticalClassGen_Conviction DESC, Diff_CriticalClassGen_Supp DESC, #Diff_CriticalClassGen_Certainty desc,
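The ranking idea behind this filter can be sketched in a few lines: take each measure's difference between the critical-class rules and the general rules, treat a rule missing from one side as 0, and sort by the difference in descending order. Everything specific below is an assumption: the slide does not say how infinite values are handled, so the cap `CAP`, the field names, and the sample rows are all hypothetical.

```python
import math

CAP = 1e6  # hypothetical finite stand-in for infinite measure values (assumption)

def finite(value):
    """Map a missing measure to 0 and an infinite one to +/-CAP (assumptions)."""
    if value is None:
        return 0.0
    value = float(value)
    return math.copysign(CAP, value) if math.isinf(value) else value

def diff(critical, general):
    """Critical-class measure minus general measure, with missing sides as 0."""
    return finite(critical) - finite(general)

# Hypothetical rules with odds ratios from the two rule sets (None = absent).
rules = [
    {"rule": "A -> Y", "critical_or": 4.0, "general_or": 1.5},
    {"rule": "B -> Y", "critical_or": math.inf, "general_or": None},
    {"rule": "C -> Y", "critical_or": None, "general_or": 2.0},
]

ranked = sorted(
    rules,
    key=lambda r: diff(r["critical_or"], r["general_or"]),
    reverse=True,
)
print([r["rule"] for r in ranked])
```

Rules that are much stronger in the critical class than in general traffic rise to the top, which is where candidate signatures would be picked from.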
  • 38. Mining Pattern Step 1: Run the query. Step 2: Using creativity and intuition, select an item. Step 3: Modify the query so that it returns pairs containing the selected item, then again use intuition to select an item. Using the discovered pair to extend the pattern further Step a: Use the discovered pair as the LHS and run the query on the table with increased rule length. Step b: Using creativity and intuition, select the next item. Using the discovered pair for further analysis Step a: Use the existing pair to get the raw data and analyze it. Step b: Use the existing pair to get the derived-parameter data and analyze it (also check for an existing critical class signature + location). Step c: If the discovered pair is indeed adequate and finds some critical class, use this signature. - Test for FP - If adequate, use it for blocking Rule Developed
  • 39. Mining the patterns to Develop Rules Limitation and Further Work
  • 40. Issues and Fine Tuning ● Issues caused by data inconsistency in streaming data ● Modifying data preprocessing for the itemset ● Version on Derived Parameters