Association Analysis to
Correlation Analysis
Pattern Evaluation
• Association rule algorithms tend to produce too many rules
‒ many of them are uninteresting or redundant
‒ Redundant if {A,B,C} → {D} and {A,B} → {D} have the same
support & confidence
• Interestingness measures can be used to prune/rank the derived
patterns
• In the original formulation of association rules, support & confidence
are the only measures used
• Application of Interestingness Measure
• Computing Interestingness Measure
• Given a rule X → Y, the information needed to compute rule interestingness
can be obtained from a contingency table
Contingency table for X → Y
        Y      ¬Y
X       f11    f10    f1+
¬X      f01    f00    f0+
        f+1    f+0    |T|
f11: support of X and Y
f10: support of X and ¬Y
f01: support of ¬X and Y
f00: support of ¬X and ¬Y
Used to define various measures
‒ support, confidence, lift, Gini,
J-measure, etc.
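As a rough illustration, the four contingency-table counts are enough to compute support and confidence for X → Y. This is only a sketch: the class name `Contingency`, its method names, and the example counts are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """2x2 contingency table for a rule X -> Y (hypothetical helper)."""

    f11: int  # transactions containing X and Y
    f10: int  # transactions containing X but not Y
    f01: int  # transactions containing Y but not X
    f00: int  # transactions containing neither X nor Y

    @property
    def total(self) -> int:
        # |T|: total number of transactions
        return self.f11 + self.f10 + self.f01 + self.f00

    def support(self) -> float:
        # support(X -> Y) = f11 / |T|
        return self.f11 / self.total

    def confidence(self) -> float:
        # confidence(X -> Y) = f11 / f1+
        return self.f11 / (self.f11 + self.f10)


# Made-up counts, used only to exercise the helper.
table = Contingency(f11=30, f10=20, f01=10, f00=40)
print(f"support={table.support():.2f}, confidence={table.confidence():.2f}")
```

Keeping the raw counts rather than probabilities makes it easy to add further measures later without re-reading the transactions.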
• Drawback of Confidence
        Coffee   ¬Coffee
Tea     15       5         20
¬Tea    75       5         80
        90       10        100
Association Rule: Tea → Coffee
Confidence = P(Coffee|Tea) = 15/20 = 0.75
but P(Coffee) = 90/100 = 0.9
⇒ Although confidence is high, the rule is misleading
⇒ P(Coffee|¬Tea) = 75/80 = 0.9375
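The comparison behind this slide can be reproduced with plain arithmetic. The sketch below uses the counts from the Tea/Coffee table; the variable names are illustrative assumptions.

```python
# Counts taken from the Tea/Coffee table.
tea_coffee, tea_no_coffee = 15, 5
no_tea_coffee, no_tea_no_coffee = 75, 5
total = tea_coffee + tea_no_coffee + no_tea_coffee + no_tea_no_coffee

conf_tea = tea_coffee / (tea_coffee + tea_no_coffee)              # P(Coffee | Tea)
conf_no_tea = no_tea_coffee / (no_tea_coffee + no_tea_no_coffee)  # P(Coffee | not Tea)
p_coffee = (tea_coffee + no_tea_coffee) / total                   # P(Coffee)

# The rule looks strong in isolation, yet tea drinkers are less likely
# than the average customer to drink coffee.
rule_is_misleading = conf_tea < p_coffee

print(f"P(Coffee|Tea)     = {conf_tea:.4f}")
print(f"P(Coffee|not Tea) = {conf_no_tea:.4f}")
print(f"P(Coffee)         = {p_coffee:.4f}")
print(f"misleading rule?    {rule_is_misleading}")
```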
Correlation Concepts
• Two item sets A and B are independent (the occurrence of A is
independent of the occurrence of item set B) iff
P(A ∪ B) = P(A) × P(B)
where P(A ∪ B) is the probability that a transaction contains every item of
A ∪ B, i.e. that A and B occur together
• Otherwise A and B are dependent and correlated
• The correlation between A and B is
given by the formula:
Corr(A,B) = P(A ∪ B) / (P(A) × P(B))
Correlation Concepts [Cont.]
• corr(A,B) > 1 means that A and B are positively correlated,
i.e. the occurrence of one makes the occurrence of the other more likely.
• corr(A,B) < 1 means that the occurrence of A is negatively correlated
with (or discourages) the occurrence of B.
• corr(A,B) = 1 means that A and B are independent and there is no
correlation between them.
• Statistical Independence
• Population of 1000 students
• 600 students know how to swim (S)
• 700 students know how to bike (B)
• 420 students know how to swim and bike (S,B)
• P(S ∧ B) = 420/1000 = 0.42
• P(S) × P(B) = 0.6 × 0.7 = 0.42
• P(S ∧ B) = P(S) × P(B) => Statistical independence
• P(S ∧ B) > P(S) × P(B) => Positively correlated
• P(S ∧ B) < P(S) × P(B) => Negatively correlated
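The three cases can be checked mechanically. The following is a minimal sketch using the swim/bike numbers from this slide; the function name `dependence` and the floating-point tolerance are assumptions added for illustration.

```python
import math


def dependence(p_joint: float, p_a: float, p_b: float) -> str:
    """Compare P(A and B) with P(A) * P(B), allowing for float rounding."""
    expected = p_a * p_b
    if math.isclose(p_joint, expected, rel_tol=1e-9):
        return "independent"
    if p_joint > expected:
        return "positively correlated"
    return "negatively correlated"


n_students = 1000
p_swim = 600 / n_students
p_bike = 700 / n_students
p_both = 420 / n_students

print(dependence(p_both, p_swim, p_bike))  # independent
```

An exact `==` comparison would be fragile here because 0.6 × 0.7 is not exactly representable in binary floating point, hence the tolerance.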
Association & Correlation
• The correlation formula can be rewritten as
Corr(A,B) = P(B|A) / P(B)
• We already know that
• Support(A ⇒ B) = P(A ∪ B)
• Confidence(A ⇒ B) = P(B|A)
• That means that Confidence(A ⇒ B) = corr(A,B) × P(B)
• So correlation, support and confidence are all different, but the correlation
provides extra information about the association rule (A ⇒ B).
• We say that the correlation
corr(A,B) provides the LIFT of the association rule (A ⇒ B),
• i.e. A is said to increase (or LIFT) the likelihood of B by a factor equal to
the value of corr(A,B).
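The rewriting step follows from the product rule. Spelled out as a short derivation, using the slides' convention that P(A ∪ B) is the probability that A and B occur together:

```latex
\begin{align*}
\mathrm{corr}(A,B)
  &= \frac{P(A \cup B)}{P(A)\,P(B)} \\
  &= \frac{P(A)\,P(B \mid A)}{P(A)\,P(B)}
     && \text{since } P(A \cup B) = P(A)\,P(B \mid A) \\
  &= \frac{P(B \mid A)}{P(B)}
   = \frac{\mathrm{Confidence}(A \Rightarrow B)}{P(B)}
\end{align*}
```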
Statistical-based Measures
• Measures that take into account statistical dependence
$$\mathrm{Lift} = \frac{P(Y \mid X)}{P(Y)}$$
$$\mathrm{Interest} = \frac{P(X,Y)}{P(X)\,P(Y)}$$
$$PS = P(X,Y) - P(X)\,P(Y)$$
$$\phi\text{-coefficient} = \frac{P(X,Y) - P(X)\,P(Y)}{\sqrt{P(X)\,[1 - P(X)]\,P(Y)\,[1 - P(Y)]}}$$
P(A and B) = P(A) × P(B|A)
P(B|A) = P(A and B) / P(A)
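A possible way to express these four measures in code is sketched below. The function names and argument order are assumptions chosen for illustration; each function takes the joint probability P(X,Y) and the marginals P(X) and P(Y).

```python
import math


def lift(p_xy: float, p_x: float, p_y: float) -> float:
    # Lift = P(Y|X) / P(Y), with P(Y|X) = P(X,Y) / P(X)
    return (p_xy / p_x) / p_y


def interest(p_xy: float, p_x: float, p_y: float) -> float:
    # Interest = P(X,Y) / (P(X) P(Y)); algebraically equal to lift
    return p_xy / (p_x * p_y)


def ps(p_xy: float, p_x: float, p_y: float) -> float:
    # PS = P(X,Y) - P(X) P(Y); zero under independence
    return p_xy - p_x * p_y


def phi_coefficient(p_xy: float, p_x: float, p_y: float) -> float:
    # phi = PS normalised by the standard deviations of the two binary variables
    return ps(p_xy, p_x, p_y) / math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))


# Swim/bike example from the independence slide: P(S,B)=0.42, P(S)=0.6, P(B)=0.7
args = (0.42, 0.6, 0.7)
print(lift(*args), interest(*args), ps(*args), phi_coefficient(*args))
```

For independent events lift and interest come out at (approximately) 1, while PS and the φ-coefficient come out at (approximately) 0.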
Interestingness Measure: Correlations (Lift)
• play basketball → eat cereal [40%, 66.7%] is misleading
• The overall % of students eating cereal is 75% > 66.7%.
• play basketball → not eat cereal [20%, 33.3%] is more accurate, although with lower
support and confidence
• Measure of dependent/correlated events: lift
$$\mathrm{lift}(A,B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$
              Basketball   Not basketball   Sum (row)
Cereal        2000         1750             3750
Not cereal    1000         250              1250
Sum (col.)    3000         2000             5000
$$\mathrm{lift}(B, C) = \frac{2000/5000}{(3000/5000) \times (3750/5000)} = 0.89$$
$$\mathrm{lift}(B, \neg C) = \frac{1000/5000}{(3000/5000) \times (1250/5000)} = 1.33$$
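The two lift values can be recomputed directly from the table counts. This is a small sketch; the helper name `lift_from_counts` and the variable names are illustrative assumptions.

```python
def lift_from_counts(n_ab: int, n_a: int, n_b: int, n_total: int) -> float:
    """lift(A, B) = P(A and B) / (P(A) * P(B)), computed from raw counts."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


# Counts from the basketball/cereal table.
n_total = 5000
n_basketball = 3000
n_cereal, n_not_cereal = 3750, 1250
n_basketball_cereal, n_basketball_not_cereal = 2000, 1000

lift_b_c = lift_from_counts(n_basketball_cereal, n_basketball, n_cereal, n_total)
lift_b_not_c = lift_from_counts(n_basketball_not_cereal, n_basketball, n_not_cereal, n_total)

print(f"lift(B, C)     = {lift_b_c:.2f}")      # 0.89
print(f"lift(B, not C) = {lift_b_not_c:.2f}")  # 1.33
```

A lift below 1 for (basketball, cereal) and above 1 for (basketball, not cereal) matches the slide's point that the first rule is misleading despite its higher support and confidence.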
Example: Lift/Interest
Association Rule: Tea → Coffee
        Coffee   ¬Coffee
Tea     15       5         20
¬Tea    75       5         80
        90       10        100
Confidence = P(Coffee|Tea) = 0.75
but P(Coffee) = 0.9
⇒ Lift = 0.75/0.9 = 0.8333 (< 1, therefore Tea and Coffee are negatively associated)
Example: φ-Coefficient
• The φ-coefficient is analogous to the correlation coefficient for continuous
variables
        Y     ¬Y
X       60    10    70
¬X      10    20    30
        70    30    100
$$\phi = \frac{0.6 - 0.7 \times 0.7}{\sqrt{0.7 \times 0.3 \times 0.7 \times 0.3}} = 0.5238$$
        Y     ¬Y
X       20    10    30
¬X      10    60    70
        30    70    100
$$\phi = \frac{0.2 - 0.3 \times 0.3}{\sqrt{0.7 \times 0.3 \times 0.7 \times 0.3}} = 0.5238$$
⇒ The φ-coefficient is the same for both tables
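To double-check that both tables give the same φ-coefficient, the computation can be sketched as follows. The function name `phi_from_counts` is an assumption; its arguments follow the f11, f10, f01, f00 layout of the contingency table.

```python
import math


def phi_from_counts(f11: int, f10: int, f01: int, f00: int) -> float:
    """phi-coefficient of a 2x2 contingency table given as raw counts."""
    n = f11 + f10 + f01 + f00
    p_xy = f11 / n          # P(X, Y)
    p_x = (f11 + f10) / n   # P(X)
    p_y = (f11 + f01) / n   # P(Y)
    return (p_xy - p_x * p_y) / math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))


phi_first = phi_from_counts(60, 10, 10, 20)   # first table
phi_second = phi_from_counts(20, 10, 10, 60)  # second table

print(f"{phi_first:.4f} {phi_second:.4f}")  # 0.5238 0.5238
```

The second table is the first with the roles of presence and absence swapped, which is why a measure that treats both symmetrically cannot tell them apart.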
There are lots of measures
proposed in the literature
Some measures are good for
certain applications, but not for
others
What criteria should we use to
determine whether a measure is
good or bad?
What about Apriori-style
support based pruning? How
does it affect these measures?
Summary
• Definition (Support): The support of an itemset I is defined as the fraction of the transactions
in the database T = {T1 . . . Tn} that contain I as a subset.
support(A ⇒ B) = P(A ∪ B)
Relative support: the itemset support defined in the above equation, a fraction of the transactions, is sometimes referred to as relative support.
Absolute support: the occurrence frequency, i.e. the number of transactions that contain the itemset, is called the absolute support.
(If the relative support of an itemset I satisfies a prespecified minimum support threshold (i.e., the absolute
support of I satisfies the corresponding minimum support count threshold), then I is a frequent itemset.)
• Definition (Frequent Itemset Mining): Given a set of transactions T = {T1 . . . Tn}, where each
transaction Ti is a subset of items from U, determine all itemsets I that occur as a subset of at
least a predefined fraction minsup of the transactions in T.
• Definition (Maximal Frequent Itemsets): A frequent itemset is maximal at a given minimum
support level minsup, if it is frequent, and no superset of it is frequent.
The Frequent Pattern Mining Model
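The definitions of support, frequent itemsets, and maximal frequent itemsets can be illustrated with a brute-force enumeration. This is a sketch for tiny inputs only, not an efficient miner; the toy transactions, the `minsup` value, and all function names are assumptions made up for the example.

```python
from itertools import combinations

# Hypothetical toy database T = {T1 ... T5}.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]
minsup = 0.5


def support(itemset, transactions):
    """Relative support: fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, minsup):
    """Enumerate every non-empty itemset and keep those with support >= minsup."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= minsup:
                frequent[frozenset(combo)] = s
    return frequent


def maximal_itemsets(frequent):
    """A frequent itemset is maximal if no proper superset of it is frequent."""
    return [i for i in frequent if not any(i < j for j in frequent)]


frequent = frequent_itemsets(transactions, minsup)
for itemset in maximal_itemsets(frequent):
    print(sorted(itemset), frequent[itemset])
```

With these toy transactions, {butter, milk} and the full three-item set fall below `minsup`, so the maximal frequent itemsets are {bread, milk} and {bread, butter}.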
• Property (Support Monotonicity Property): The support of every
subset J of I is at least equal to the support of itemset I.
sup(J) ≥ sup(I) ∀J ⊆ I
• Property (Downward Closure Property): Every subset of a frequent
itemset is also frequent.
The Frequent Pattern Mining Model
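Both properties can be checked on a small example, and downward closure is what makes Apriori-style pruning possible: a candidate with any infrequent subset can be discarded without counting it. The sketch below assumes a made-up toy database; `has_infrequent_subset` is a hypothetical helper name.

```python
from itertools import combinations

# Hypothetical toy database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]
minsup = 0.5


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-subset of the k-item candidate is not frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_smaller
               for sub in combinations(candidate, k - 1))


itemset = frozenset({"bread", "butter", "milk"})
sup_i = support(itemset, transactions)

# Support monotonicity: sup(J) >= sup(I) for every proper subset J of I.
monotone = all(
    support(frozenset(sub), transactions) >= sup_i
    for k in range(1, len(itemset))
    for sub in combinations(itemset, k)
)

# Downward closure used for pruning: {butter, milk} is infrequent,
# so the 3-item candidate can be rejected without counting it.
frequent_pairs = {
    frozenset(pair)
    for pair in combinations(sorted(itemset), 2)
    if support(frozenset(pair), transactions) >= minsup
}
pruned = has_infrequent_subset(itemset, frequent_pairs)

print(monotone, pruned, sup_i)
```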
Summary
• Definition (Confidence): Let X and Y be two sets of items. The confidence
conf(X ⇒ Y) of the rule X ⇒ Y is the conditional probability of X ∪ Y occurring in a
transaction, given that the transaction contains X. Therefore, the confidence
conf(X ⇒ Y) is defined as follows:
conf(X ⇒ Y) = sup(X ∪ Y) / sup(X)
• Definition (Association Rules) Let X and Y be two sets of items. Then, the rule
X⇒Y is said to be an association rule at a minimum support of minsup and
minimum confidence of minconf, if it satisfies both the following criteria:
1. The support of the itemset X ∪ Y is at least minsup.
2. The confidence of the rule X ⇒ Y is at least minconf.
• Property 4.3.1 (Confidence Monotonicity) Let X1, X2, and I be itemsets such that
X1 ⊂ X2 ⊂ I. Then the confidence of X2 ⇒ I − X2 is at least that of X1 ⇒ I − X1.
conf(X2 ⇒ I − X2) ≥ conf(X1 ⇒ I − X1)
Association Rule Generation Framework
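The confidence definition, the two rule criteria, and the confidence monotonicity property can be tried out on a toy database. This is an illustrative sketch: the transactions, thresholds, and function names are assumptions rather than anything prescribed by the slides.

```python
# Hypothetical toy database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """conf(X => Y) = sup(X u Y) / sup(X)."""
    return support(x | y, transactions) / support(x, transactions)


def is_association_rule(x, y, transactions, minsup, minconf):
    """Both criteria: sup(X u Y) >= minsup and conf(X => Y) >= minconf."""
    return (support(x | y, transactions) >= minsup
            and confidence(x, y, transactions) >= minconf)


# Confidence monotonicity with X1 subset of X2 subset of I.
i = frozenset({"bread", "butter", "milk"})
x1 = frozenset({"bread"})
x2 = frozenset({"bread", "butter"})
conf_x1 = confidence(x1, i - x1, transactions)  # bread => butter, milk
conf_x2 = confidence(x2, i - x2, transactions)  # bread, butter => milk

print(conf_x1, conf_x2, conf_x2 >= conf_x1)
print(is_association_rule(frozenset({"bread"}), frozenset({"butter"}),
                          transactions, minsup=0.5, minconf=0.7))
```

Moving items from the consequent into the antecedent keeps the numerator sup(I) fixed and can only shrink the denominator, which is why the confidence cannot decrease.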
Introduction to DATA MINING, Vipin Kumar, P N Tan, Michael Steinbach
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis

More Related Content

What's hot

Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagationKrish_ver2
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceMaryamRehman6
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsJustin Cletus
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data miningkavitha muneeshwaran
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olapSalah Amean
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learningHaris Jamil
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression TreesHemant Chetwani
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysisKrish_ver2
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptxmaha797959
 

What's hot (20)

Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagation
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data mining
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
 
Bayes Belief Networks
Bayes Belief NetworksBayes Belief Networks
Bayes Belief Networks
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptx
 

Similar to Lect7 Association analysis to correlation analysis

STAT 206 - Chapter 4 ( 4.1-4.2 Basic Probability).pptx
STAT 206 - Chapter 4 ( 4.1-4.2 Basic Probability).pptxSTAT 206 - Chapter 4 ( 4.1-4.2 Basic Probability).pptx
STAT 206 - Chapter 4 ( 4.1-4.2 Basic Probability).pptxPranshuyadav16
 
1.11.association mining 3
1.11.association mining 31.11.association mining 3
1.11.association mining 3Krish_ver2
 
Lesson07_new
Lesson07_newLesson07_new
Lesson07_newshengvn
 
Introduction tocausalinference april02_2020
Introduction tocausalinference april02_2020Introduction tocausalinference april02_2020
Introduction tocausalinference april02_2020Viswanath Gangavaram
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)마이캠퍼스
 
Quantitative Analysis Homework Help
Quantitative Analysis Homework HelpQuantitative Analysis Homework Help
Quantitative Analysis Homework HelpExcel Homework Help
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Maninda Edirisooriya
 
Correlation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxCorrelation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxkrunal soni
 
Statistik 1 11 15 edited_chi square
Statistik 1 11 15 edited_chi squareStatistik 1 11 15 edited_chi square
Statistik 1 11 15 edited_chi squareSelvin Hadi
 
MEM and SEM in the GME framework: Modelling Perception and Satisfaction - Car...
MEM and SEM in the GME framework: Modelling Perception and Satisfaction - Car...MEM and SEM in the GME framework: Modelling Perception and Satisfaction - Car...
MEM and SEM in the GME framework: Modelling Perception and Satisfaction - Car...SYRTO Project
 
Assumptions of OLS.pptx
Assumptions of OLS.pptxAssumptions of OLS.pptx
Assumptions of OLS.pptxEzhildev
 
An introduction to the Multivariable analysis.ppt
An introduction to the Multivariable analysis.pptAn introduction to the Multivariable analysis.ppt
An introduction to the Multivariable analysis.pptvigia41
 
Lec1.regression
Lec1.regressionLec1.regression
Lec1.regressionAftab Alam
 

Similar to Lect7 Association analysis to correlation analysis (20)

Basic of Hypothesis Testing TEKU QM
Basic of Hypothesis Testing TEKU QMBasic of Hypothesis Testing TEKU QM
Basic of Hypothesis Testing TEKU QM
 
01_SLR_final (1).pptx
01_SLR_final (1).pptx01_SLR_final (1).pptx
01_SLR_final (1).pptx
 
STAT 206 - Chapter 4 ( 4.1-4.2 Basic Probability).pptx
STAT 206 - Chapter 4 ( 4.1-4.2 Basic Probability).pptxSTAT 206 - Chapter 4 ( 4.1-4.2 Basic Probability).pptx
STAT 206 - Chapter 4 ( 4.1-4.2 Basic Probability).pptx
 
1.11.association mining 3
1.11.association mining 31.11.association mining 3
1.11.association mining 3
 
Lesson07_new
Lesson07_newLesson07_new
Lesson07_new
 
Introduction tocausalinference april02_2020
Introduction tocausalinference april02_2020Introduction tocausalinference april02_2020
Introduction tocausalinference april02_2020
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
 
Lecture 4
Lecture 4Lecture 4
Lecture 4
 
Quantitative Analysis Homework Help
Quantitative Analysis Homework HelpQuantitative Analysis Homework Help
Quantitative Analysis Homework Help
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
 
Correlation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxCorrelation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptx
 
statics in research
statics in researchstatics in research
statics in research
 
Machine learning mathematicals.pdf
Machine learning mathematicals.pdfMachine learning mathematicals.pdf
Machine learning mathematicals.pdf
 
Dbm630 lecture05
Dbm630 lecture05Dbm630 lecture05
Dbm630 lecture05
 
Statistik 1 11 15 edited_chi square
Statistik 1 11 15 edited_chi squareStatistik 1 11 15 edited_chi square
Statistik 1 11 15 edited_chi square
 
MEM and SEM in the GME framework: Modelling Perception and Satisfaction - Car...
MEM and SEM in the GME framework: Modelling Perception and Satisfaction - Car...MEM and SEM in the GME framework: Modelling Perception and Satisfaction - Car...
MEM and SEM in the GME framework: Modelling Perception and Satisfaction - Car...
 
Lecture 1.pdf
Lecture 1.pdfLecture 1.pdf
Lecture 1.pdf
 
Assumptions of OLS.pptx
Assumptions of OLS.pptxAssumptions of OLS.pptx
Assumptions of OLS.pptx
 
An introduction to the Multivariable analysis.ppt
An introduction to the Multivariable analysis.pptAn introduction to the Multivariable analysis.ppt
An introduction to the Multivariable analysis.ppt
 
Lec1.regression
Lec1.regressionLec1.regression
Lec1.regression
 

More from hktripathy

Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematicshktripathy
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your datahktripathy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 
Lecture7.1 data sampling
Lecture7.1 data samplingLecture7.1 data sampling
Lecture7.1 data samplinghktripathy
 
Lecture5 virtualization
Lecture5 virtualizationLecture5 virtualization
Lecture5 virtualizationhktripathy
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
Lecture3 business intelligence
Lecture3 business intelligenceLecture3 business intelligence
Lecture3 business intelligencehktripathy
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cyclehktripathy
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision treehktripathy
 
Lect8 Classification & prediction
Lect8 Classification & predictionLect8 Classification & prediction
Lect8 Classification & predictionhktripathy
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmhktripathy
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysishktripathy
 
Lect4 principal component analysis-I
Lect4 principal component analysis-ILect4 principal component analysis-I
Lect4 principal component analysis-Ihktripathy
 
Lect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data MiningLect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data Mininghktripathy
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your datahktripathy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 

More from hktripathy (18)

Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematics
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Lecture7.1 data sampling
Lecture7.1 data samplingLecture7.1 data sampling
Lecture7.1 data sampling
 
Lecture5 virtualization
Lecture5 virtualizationLecture5 virtualization
Lecture5 virtualization
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Lecture3 business intelligence
Lecture3 business intelligenceLecture3 business intelligence
Lecture3 business intelligence
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cycle
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision tree
 
Lect8 Classification & prediction
Lect8 Classification & predictionLect8 Classification & prediction
Lect8 Classification & prediction
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysis
 
Lect4 principal component analysis-I
Lect4 principal component analysis-ILect4 principal component analysis-I
Lect4 principal component analysis-I
 
Lect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data MiningLect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data Mining
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 

Recently uploaded

Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 

Recently uploaded (20)

Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 

Lect7 Association analysis to correlation analysis

  • 2. Pattern Evaluation • Association rule algorithms tend to produce too many rules ‒ many of them are uninteresting or redundant ‒ Redundant if {A,B,C}  {D} and {A,B}  {D} have same support & confidence • Interestingness measures can be used to prune/rank the derived patterns • In the original formulation of association rules, support & confidence are the only measures used
  • 3. • Application of Interestingness Measure
  • 4. • Given a rule X  Y, information needed to compute rule interestingness can be obtained from a contingency table Y Y X f11 f10 f1+ X f01 f00 fo+ f+1 f+0 |T| Contingency table for X  Y f11: support of X and Y f10: support of X and Y f01: support of X and Y f00: support of X and Y Used to define various measures  support, confidence, lift, Gini, J-measure, etc. • Computing Interestingness Measure
  • 5. • Drawback of Confidence
                 Coffee   ¬Coffee
         Tea     15       5         20
         ¬Tea    75       5         80
                 90       10        100
       Association Rule: Tea → Coffee • Confidence = P(Coffee|Tea) = 15/20 = 0.75, but P(Coffee) = 0.9 ⇒ Although confidence is high, the rule is misleading: P(Coffee|¬Tea) = 75/80 = 0.9375, so tea drinkers are less likely to drink coffee than people who do not drink tea
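The three probabilities in this example can be recomputed from the cell counts of the tea/coffee table. The sketch below is illustrative only, and the variable names are assumptions.

```python
# Cell counts from the tea/coffee contingency table.
tea_coffee, tea_no_coffee = 15, 5
no_tea_coffee, no_tea_no_coffee = 75, 5
total = tea_coffee + tea_no_coffee + no_tea_coffee + no_tea_no_coffee

# Confidence of Tea -> Coffee: P(Coffee | Tea).
conf_tea_to_coffee = tea_coffee / (tea_coffee + tea_no_coffee)

# Baseline: P(Coffee), regardless of tea.
p_coffee = (tea_coffee + no_tea_coffee) / total

# Confidence of (not Tea) -> Coffee: P(Coffee | not Tea).
conf_no_tea_to_coffee = no_tea_coffee / (no_tea_coffee + no_tea_no_coffee)

print(conf_tea_to_coffee, p_coffee, conf_no_tea_to_coffee)
```

A confidence of 0.75 looks high in isolation, but it is below the 0.9 baseline, which is why confidence alone can mislead.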
  • 6. Correlation Concepts • Two itemsets A and B are independent (the occurrence of A is independent of the occurrence of itemset B) iff P(A ∧ B) = P(A) × P(B) • Otherwise A and B are dependent and correlated • The correlation between A and B is given by the formula: Corr(A,B) = P(A ∪ B) / (P(A) × P(B)), where P(A ∪ B) is the probability that a transaction contains every item of A ∪ B, i.e. the same quantity as P(A ∧ B)
  • 7. Correlation Concepts [Cont.] • corr(A,B) > 1 means that A and B are positively correlated, i.e. the occurrence of one makes the occurrence of the other more likely. • corr(A,B) < 1 means that the occurrence of A is negatively correlated with (or discourages) the occurrence of B. • corr(A,B) = 1 means that A and B are independent and there is no correlation between them.
  • 8. • Statistical Independence • Population of 1000 students • 600 students know how to swim (S) • 700 students know how to bike (B) • 420 students know how to swim and bike (S,B) • P(S ∧ B) = 420/1000 = 0.42 • P(S) × P(B) = 0.6 × 0.7 = 0.42 • P(S ∧ B) = P(S) × P(B) => Statistical independence • P(S ∧ B) > P(S) × P(B) => Positively correlated • P(S ∧ B) < P(S) × P(B) => Negatively correlated
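The three cases above can be sketched as a small Python check. The function name `dependence` is hypothetical. The comparison is done on integer counts (n_ab × n_total against n_a × n_b), which is equivalent to comparing P(S ∧ B) with P(S) × P(B) and avoids floating-point rounding. Only the first call uses the numbers from the slide; the other two calls change the joint count purely for illustration.

```python
def dependence(n_total, n_a, n_b, n_ab):
    """Classify two events from raw counts.

    Compares P(A and B) with P(A) * P(B) after multiplying both sides
    by n_total**2, so only integers are involved.
    """
    observed = n_ab * n_total
    expected = n_a * n_b
    if observed == expected:
        return "independent"
    if observed > expected:
        return "positively correlated"
    return "negatively correlated"


# Slide example: 1000 students, 600 swim, 700 bike, 420 do both.
print(dependence(1000, 600, 700, 420))  # independent

# Hypothetical variations of the joint count (not from the slides).
print(dependence(1000, 600, 700, 500))  # positively correlated
print(dependence(1000, 600, 700, 300))  # negatively correlated
```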
  • 9. Association & Correlation • The correlation formula can be rewritten as Corr(A,B) = P(B|A) / P(B) • We already know that • Support(A → B) = P(A ∪ B) (the probability that a transaction contains both A and B) • Confidence(A → B) = P(B|A) • That means that Confidence(A → B) = corr(A,B) × P(B) • So correlation, support and confidence are all different measures, and the correlation provides extra information about the association rule (A → B). • We say that the correlation corr(A,B) provides the LIFT of the association rule (A → B), • i.e. A is said to increase (or LIFT) the likelihood of B by the factor of the value returned by the formula for corr(A,B).
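The identity Confidence(A → B) = corr(A,B) × P(B) can be checked numerically. The sketch below uses the counts of the tea/coffee table from the slides (100 transactions, 20 with tea, 90 with coffee, 15 with both); the variable names are assumptions.

```python
# Counts taken from the tea/coffee contingency table in the slides.
n_total, n_tea, n_coffee, n_both = 100, 20, 90, 15

support = n_both / n_total     # P(Tea and Coffee)
confidence = n_both / n_tea    # P(Coffee | Tea)
p_tea = n_tea / n_total
p_coffee = n_coffee / n_total

# Two equivalent ways to obtain corr(Tea, Coffee), i.e. the lift.
lift_from_joint = support / (p_tea * p_coffee)
lift_from_confidence = confidence / p_coffee

print(round(lift_from_joint, 4), round(lift_from_confidence, 4))
```

Both expressions give about 0.8333, which is below 1: knowing that a customer buys tea lowers the likelihood of coffee instead of lifting it.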
  • 10. Statistical-based Measures • Measures that take into account statistical dependence:
       $\text{Lift} = \dfrac{P(Y \mid X)}{P(Y)}$
       $\text{Interest} = \dfrac{P(X,Y)}{P(X)\,P(Y)}$
       $PS = P(X,Y) - P(X)\,P(Y)$
       $\phi\text{-coefficient} = \dfrac{P(X,Y) - P(X)\,P(Y)}{\sqrt{P(X)\,[1 - P(X)]\,P(Y)\,[1 - P(Y)]}}$
       • Conditional probability: P(A and B) = P(A) × P(B|A), hence P(B|A) = P(A and B) / P(A)
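The four measures on this slide can all be computed from the cells of a 2×2 contingency table. This is a minimal sketch, assuming the counts f11, f10, f01, f00 are given and that no margin is zero (otherwise a division by zero occurs). The function name `interestingness` and the example counts are hypothetical.

```python
from math import sqrt


def interestingness(f11, f10, f01, f00):
    """Return lift, interest, PS and phi for the rule X -> Y.

    Assumes 0 < P(X) < 1 and 0 < P(Y) < 1; no zero-margin handling.
    """
    n = f11 + f10 + f01 + f00
    p_xy = f11 / n           # P(X, Y)
    p_x = (f11 + f10) / n    # P(X)
    p_y = (f11 + f01) / n    # P(Y)
    return {
        "lift": (p_xy / p_x) / p_y,       # P(Y|X) / P(Y)
        "interest": p_xy / (p_x * p_y),   # P(X,Y) / (P(X) P(Y))
        "ps": p_xy - p_x * p_y,           # P(X,Y) - P(X) P(Y)
        "phi": (p_xy - p_x * p_y)
        / sqrt(p_x * (1 - p_x) * p_y * (1 - p_y)),
    }


# Hypothetical counts (illustration only): P(X,Y)=0.3, P(X)=0.5, P(Y)=0.4.
m = interestingness(30, 20, 10, 40)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Lift and interest are the same quantity written in two ways, so the first two values always coincide up to rounding.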
  • 11. Interestingness Measure: Correlations (Lift) • play basketball → eat cereal [40%, 66.7%] is misleading • The overall % of students eating cereal is 75% > 66.7%. • play basketball → not eat cereal [20%, 33.3%] is more accurate, although with lower support and confidence • Measure of dependent/correlated events: lift
       $\text{lift}(A,B) = \dfrac{P(A \wedge B)}{P(A)\,P(B)}$
                       Basketball   Not basketball   Sum (row)
         Cereal        2000         1750             3750
         Not cereal    1000         250              1250
         Sum (col.)    3000         2000             5000
       $\text{lift}(B,C) = \dfrac{2000/5000}{(3000/5000) \times (3750/5000)} = 0.89$
       $\text{lift}(B,\neg C) = \dfrac{1000/5000}{(3000/5000) \times (1250/5000)} = 1.33$
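The two lift values can be reproduced from the counts in the basketball/cereal table. A short sketch, with a hypothetical helper named `lift`:

```python
def lift(n_total, n_a, n_b, n_ab):
    """lift(A, B) = P(A and B) / (P(A) * P(B)), computed from counts."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


# Counts from the table: 5000 students, 3000 play basketball,
# 3750 eat cereal, 1250 do not, 2000 do both, 1000 play but do not eat cereal.
lift_cereal = lift(5000, 3000, 3750, 2000)
lift_not_cereal = lift(5000, 3000, 1250, 1000)

print(round(lift_cereal, 2))      # 0.89
print(round(lift_not_cereal, 2))  # 1.33
```

A lift below 1 for "eat cereal" and above 1 for "not eat cereal" matches the slide's point that the first rule is misleading despite its higher support and confidence.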
  • 12. Example: Lift/Interest
                 Coffee   ¬Coffee
         Tea     15       5         20
         ¬Tea    75       5         80
                 90       10        100
       Association Rule: Tea → Coffee • Confidence = P(Coffee|Tea) = 0.75, but P(Coffee) = 0.9 ⇒ Lift = 0.75/0.9 = 0.8333 (< 1, therefore Tea and Coffee are negatively associated)
  • 13. Example: φ-Coefficient • The φ-coefficient is analogous to the correlation coefficient for continuous variables
                Y     ¬Y
         X      60    10    70
         ¬X     10    20    30
                70    30    100
       $\phi = \dfrac{0.6 - 0.7 \times 0.7}{\sqrt{0.7 \times 0.3 \times 0.7 \times 0.3}} = 0.5238$
                Y     ¬Y
         X      20    10    30
         ¬X     10    60    70
                30    70    100
       $\phi = \dfrac{0.2 - 0.3 \times 0.3}{\sqrt{0.7 \times 0.3 \times 0.7 \times 0.3}} = 0.5238$
       The φ-coefficient is the same for both tables
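One way to see why both tables give the same value is to write the φ-coefficient in terms of cell counts. The sketch below uses the count form φ = (f11·f00 − f10·f01) / sqrt(f1+ · f0+ · f+1 · f+0), which is algebraically equivalent to the probability form on the slide but is not how the slide writes it. The function name `phi` is an assumption.

```python
from math import sqrt


def phi(f11, f10, f01, f00):
    """Phi-coefficient from 2x2 cell counts (count form of the formula)."""
    row_x, row_not_x = f11 + f10, f01 + f00
    col_y, col_not_y = f11 + f01, f10 + f00
    return (f11 * f00 - f10 * f01) / sqrt(row_x * row_not_x * col_y * col_not_y)


# First table on the slide: f11=60, f10=10, f01=10, f00=20.
phi_first = phi(60, 10, 10, 20)
# Second table on the slide: f11=20, f10=10, f01=10, f00=60.
phi_second = phi(20, 10, 10, 60)

print(round(phi_first, 4), round(phi_second, 4))  # 0.5238 0.5238
```

The second table is the first with f11 and f00 swapped. The numerator f11·f00 − f10·f01 and the product of the four margins are both unchanged by that swap, so φ cannot tell co-presence from co-absence.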
  • 14. There are lots of measures proposed in the literature • Some measures are good for certain applications, but not for others • What criteria should we use to determine whether a measure is good or bad? • What about Apriori-style support-based pruning? How does it affect these measures?
  • 15. Summary: The Frequent Pattern Mining Model • Definition (Support): The support of an itemset I is defined as the fraction of the transactions in the database T = {T1 . . . Tn} that contain I as a subset. For a rule, support(A → B) = P(A ∧ B), the fraction of transactions that contain both A and B. • Relative support: the itemset support defined above is sometimes referred to as relative support. • Absolute support: the occurrence frequency (the number of transactions that contain the itemset) is called the absolute support. • If the relative support of an itemset I satisfies a prespecified minimum support threshold (i.e., the absolute support of I satisfies the corresponding minimum support count threshold), then I is a frequent itemset. • Definition (Frequent Itemset Mining): Given a set of transactions T = {T1 . . . Tn}, where each transaction Ti is a subset of items from U, determine all itemsets I that occur as a subset of at least a predefined fraction minsup of the transactions in T. • Definition (Maximal Frequent Itemsets): A frequent itemset is maximal at a given minimum support level minsup if it is frequent and no superset of it is frequent.
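The definitions above can be illustrated with a brute-force enumeration. This is a deliberately naive sketch that tests every possible itemset, so it is only suitable for tiny examples and is not an efficient mining algorithm. The function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Return {itemset: relative support} for all itemsets with support >= minsup."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            # Absolute support: number of transactions containing the itemset.
            count = sum(1 for t in transactions if itemset <= t)
            if count / n >= minsup:
                frequent[itemset] = count / n
    return frequent


def maximal_itemsets(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return [i for i in frequent if not any(i < j for j in frequent)]


# Hypothetical toy data (illustration only, not from the slides).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

frequent = frequent_itemsets(transactions, minsup=0.6)
maximal = maximal_itemsets(frequent)
for itemset, sup in frequent.items():
    print(sorted(itemset), sup)
print("maximal:", [sorted(i) for i in maximal])
```

With minsup = 0.6 the four single items and {bread, milk} are frequent. {bread} and {milk} are not maximal because their superset {bread, milk} is also frequent.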
  • 16. The Frequent Pattern Mining Model • Property (Support Monotonicity Property): The support of every subset J of I is at least equal to the support of itemset I: sup(J) ≥ sup(I) ∀J ⊆ I • Property (Downward Closure Property): Every subset of a frequent itemset is also frequent.
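The downward closure property is what makes Apriori-style pruning possible: a candidate of size k can be discarded without counting it if any of its (k−1)-subsets is not frequent. The sketch below shows only that pruning step, under the assumption that the frequent (k−1)-itemsets are already known. The function name and the example itemsets are hypothetical.

```python
from itertools import combinations


def prune_candidates(candidates, frequent_smaller):
    """Keep candidates whose (k-1)-subsets are all frequent.

    By downward closure, a candidate with an infrequent subset
    cannot itself be frequent, so it needs no support counting.
    """
    kept = []
    for candidate in candidates:
        k = len(candidate)
        subsets = (frozenset(s) for s in combinations(candidate, k - 1))
        if all(s in frequent_smaller for s in subsets):
            kept.append(candidate)
    return kept


# Hypothetical frequent 2-itemsets and candidate 3-itemsets.
frequent_2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
candidates_3 = [frozenset("abc"), frozenset("abd")]

kept = prune_candidates(candidates_3, frequent_2)
print([sorted(c) for c in kept])  # [['a', 'b', 'c']]
```

The candidate {a, b, d} is dropped because its subset {a, d} is not among the frequent 2-itemsets.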
  • 17. Summary: Association Rule Generation Framework • Definition (Confidence): Let X and Y be two sets of items. The confidence conf(X ⇒ Y) of the rule X ⇒ Y is the conditional probability of X ∪ Y occurring in a transaction, given that the transaction contains X. Therefore, the confidence conf(X ⇒ Y) is defined as follows: conf(X ⇒ Y) = sup(X ∪ Y) / sup(X) • Definition (Association Rules): Let X and Y be two sets of items. Then, the rule X ⇒ Y is said to be an association rule at a minimum support of minsup and minimum confidence of minconf, if it satisfies both the following criteria: 1. The support of the itemset X ∪ Y is at least minsup. 2. The confidence of the rule X ⇒ Y is at least minconf. • Property 4.3.1 (Confidence Monotonicity): Let X1, X2, and I be itemsets such that X1 ⊂ X2 ⊂ I. Then the confidence of X2 ⇒ I − X2 is at least that of X1 ⇒ I − X1: conf(X2 ⇒ I − X2) ≥ conf(X1 ⇒ I − X1)
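The two criteria in the association rule definition translate into a short rule-generation routine. This is a minimal sketch that splits one itemset I into every antecedent X and consequent I − X and keeps the rules that pass both thresholds. The function names and the toy transactions are hypothetical, and confidence is computed as the ratio of absolute supports, which equals sup(X ∪ Y) / sup(X).

```python
from itertools import combinations


def support_count(transactions, itemset):
    """Absolute support: number of transactions containing the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(transactions, itemset, minsup, minconf):
    """Return (X, Y, support, confidence) for every rule X => Y with X ∪ Y = itemset."""
    itemset = frozenset(itemset)
    n = len(transactions)
    count_i = support_count(transactions, itemset)
    if count_i == 0 or count_i / n < minsup:
        return []  # criterion 1 fails: the itemset is not frequent
    rules = []
    for k in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), k):
            x = frozenset(antecedent)
            confidence = count_i / support_count(transactions, x)
            if confidence >= minconf:  # criterion 2
                rules.append((x, itemset - x, count_i / n, confidence))
    return rules


# Hypothetical toy data (illustration only, not from the slides).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

rules = rules_from_itemset(transactions, {"bread", "milk"}, minsup=0.6, minconf=0.7)
for x, y, sup, conf in rules:
    print(sorted(x), "=>", sorted(y), f"support={sup}, confidence={conf}")
```

On this data both {bread} ⇒ {milk} and {milk} ⇒ {bread} have support 0.6 and confidence 0.75, so both pass. Raising minconf to 0.8 would reject them.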
  • 18. Reference: Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar