SlideShare a Scribd company logo
1 of 13
MINING FREQUENT PATTERNS,
ASSOCIATION AND CORRELATIONS
Data Mining…
Nadar Saraswathi College Of Arts & Science
Submitted by,
Nagapandiyammal N
W H AT I S F R E Q U E N T PAT T E R N A N A LY S I S ?
 Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that
occurs frequently in a data set.
 First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of
frequent itemsets and association rule mining.
 Motivation: Finding inherent regularities in data
What products were often purchased together?— Beer and diapers?!
What are the subsequent purchases after buying a PC?
What kinds of DNA are sensitive to this new drug?
Can we automatically classify web documents?
I M P O RTA N T F R E Q U E N T PAT T E R N M I N I N G
 Discloses an intrinsic and important property of data sets
 Forms the foundation for many essential data mining tasks
Association, correlation, and causality analysis
Sequential, structural (e.g., sub-graph) patterns
Pattern analysis in spatiotemporal, multimedia, time series, and stream data
Classification: associative classification
Cluster analysis: frequent pattern-based clustering
Data warehousing: iceberg cube and cube-gradient
Semantic data compression: fascicles
Broad applications
C L O S E D PAT T E R N S A N D M A X - PAT T E R N S
 A long pattern contains a combinatorial number of sub patterns,
e.g., {a1, …, a100} contains (1001) + (1002) + … + (110000) =2100 – 1
= 1.27*1030 sub-patterns!
 An item set X is closed if X is frequent and there exists no super-pattern
Y ‫כ‬X, with the same support as X
 An item set X is a max-pattern if X is frequent and there exists no
frequent super-pattern Y ‫כ‬X
 Closed pattern is a lossless compression of freq. patterns
Reducing the # of patterns and rules
SC A LA BLE M ETHOD S FOR M IN IN G
FR EQU EN T PATTER N S
 The downward closure property of frequent patterns
Any subset of a frequent item set must be frequent
If {beer, diaper, nuts} is frequent, so is {beer, diaper}
i.e., every transaction having {beer, diaper, nuts} also contains
{beer, diaper}
Scalable mining methods: Three major approaches
 Apriori
 Frequent pattern growth
 Vertical data format approach
A P R I O R I : A C A N D I D AT E G E N E R AT I O N A N D
T E S T A P P R O A C H
 Apriori pruning principle: If there is any item set which is infrequent, its
superset should not be generated/tested.
 Method:
Initially, scan DB once to get frequent 1-itemset
Generate length (k+1) candidate itemsets from length k
frequent itemsets
Test the candidates against DB
Terminate when no frequent or candidate set can be generated
THE APRIORI ALGORITHM
 Pseudo-code:
Ck: Candidate item set of size k
Lk: frequent item set of size k
L1= {frequent items};
for (k = 1; Lk !=Æ; k++) do begin
Ck+1=candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in Ck+1
that are contained in t
Lk+1=candidates in Ck+1 with min support
End
return Èk Lk;
IM PORTA N T D ETA ILS OF A PR IOR I
 How to generate candidates?
Step 1: self-joining Lk
Step 2: pruning
How to count supports of candidates?
Example of Candidate-generation
L3={abc, abd, acd, ace, bcd}
Self-joining: L3*L3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L3
C4={abcd}
CHALLENGES OF FREQUENT
PATTERN MINING
 Challenges
 Multiple scans of transaction database
 Huge number of candidates
 Tedious workload of support counting for candidates
 Improving Apriori: general ideas
 Reduce passes of transaction database scans
 Shrink number of candidates
 Facilitate support counting of candidates
SA M PLE FOR FR EQU EN T PATTER N S
 Select a sample of original database, mine frequent
 patterns within sample using Apriori
 Scan database once to verify frequent itemsets found in sample,
only borders of closure of frequent patterns are checked
 Example: check abcd instead of ab, ac, …, etc.
 Scan database again to find missed frequent patterns
T H E E X A M P L E O F A P R I O R I A L G O R I T H M
SUMMARY
 Frequent pattern mining—an important task in data mining
 Scalable frequent pattern mining methods
Apriori (Candidate generation & test)
Projection-based (FP growth, CLOSET+, ...)
Vertical format approach (CHARM, ...)
 Mining a variety of rules and interesting patterns
 Constraint-based mining
 Mining sequential and structured patterns
 Extensions and applications
THE END

More Related Content

Similar to Data Mining

Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Salah Amean
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsJustin Cletus
 
Mining Frequent Itemsets.ppt
Mining Frequent Itemsets.pptMining Frequent Itemsets.ppt
Mining Frequent Itemsets.pptNBACriteria2SICET
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Subrata Kumer Paul
 
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmFrequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmShivarkarSandip
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatternsKamal Singh Lodhi
 
UNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).pptUNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).pptRaviKiranVarma4
 
Chapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxChapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxssuser957b41
 
Pattern Discovery Using Apriori and Ch-Search Algorithm
 Pattern Discovery Using Apriori and Ch-Search Algorithm Pattern Discovery Using Apriori and Ch-Search Algorithm
Pattern Discovery Using Apriori and Ch-Search Algorithmijceronline
 
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...ijsrd.com
 

Similar to Data Mining (20)

Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
Mining Frequent Itemsets.ppt
Mining Frequent Itemsets.pptMining Frequent Itemsets.ppt
Mining Frequent Itemsets.ppt
 
06 fp basic
06 fp basic06 fp basic
06 fp basic
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
 
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmFrequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
 
05
0505
05
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatterns
 
Unit 3.pptx
Unit 3.pptxUnit 3.pptx
Unit 3.pptx
 
6 module 4
6 module 46 module 4
6 module 4
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
06FPBasic02.pdf
06FPBasic02.pdf06FPBasic02.pdf
06FPBasic02.pdf
 
My6asso
My6assoMy6asso
My6asso
 
UNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).pptUNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).ppt
 
Chapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxChapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptx
 
Pattern Discovery Using Apriori and Ch-Search Algorithm
 Pattern Discovery Using Apriori and Ch-Search Algorithm Pattern Discovery Using Apriori and Ch-Search Algorithm
Pattern Discovery Using Apriori and Ch-Search Algorithm
 
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
 

More from NilaNila16

Basic Block Scheduling
Basic Block SchedulingBasic Block Scheduling
Basic Block SchedulingNilaNila16
 
Affine Array Indexes
Affine Array IndexesAffine Array Indexes
Affine Array IndexesNilaNila16
 
Software Engineering
Software EngineeringSoftware Engineering
Software EngineeringNilaNila16
 
Web Programming
Web ProgrammingWeb Programming
Web ProgrammingNilaNila16
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmNilaNila16
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemNilaNila16
 
Operating system
Operating systemOperating system
Operating systemNilaNila16
 
Linear Block Codes
Linear Block CodesLinear Block Codes
Linear Block CodesNilaNila16
 
Applications of graph theory
                      Applications of graph theory                      Applications of graph theory
Applications of graph theoryNilaNila16
 
Recurrence Relation
Recurrence RelationRecurrence Relation
Recurrence RelationNilaNila16
 
Input/Output Exploring java.io
Input/Output Exploring java.ioInput/Output Exploring java.io
Input/Output Exploring java.ioNilaNila16
 

More from NilaNila16 (14)

Basic Block Scheduling
Basic Block SchedulingBasic Block Scheduling
Basic Block Scheduling
 
Affine Array Indexes
Affine Array IndexesAffine Array Indexes
Affine Array Indexes
 
Software Engineering
Software EngineeringSoftware Engineering
Software Engineering
 
Web Programming
Web ProgrammingWeb Programming
Web Programming
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Operating system
Operating systemOperating system
Operating system
 
RDBMS
RDBMSRDBMS
RDBMS
 
Linear Block Codes
Linear Block CodesLinear Block Codes
Linear Block Codes
 
Applications of graph theory
                      Applications of graph theory                      Applications of graph theory
Applications of graph theory
 
Hasse Diagram
Hasse DiagramHasse Diagram
Hasse Diagram
 
Fuzzy set
Fuzzy set Fuzzy set
Fuzzy set
 
Recurrence Relation
Recurrence RelationRecurrence Relation
Recurrence Relation
 
Input/Output Exploring java.io
Input/Output Exploring java.ioInput/Output Exploring java.io
Input/Output Exploring java.io
 

Recently uploaded

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 

Recently uploaded (20)

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 

Data Mining

  • 1. MINING FREQUENT PATTERNS, ASSOCIATION AND CORRELATIONS Data Mining… Nadar Saraswathi College Of Arts & Science Submitted by, Nagapandiyammal N
  • 2. W H AT I S F R E Q U E N T PAT T E R N A N A LY S I S ?  Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set.  First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining.  Motivation: Finding inherent regularities in data What products were often purchased together?— Beer and diapers?! What are the subsequent purchases after buying a PC? What kinds of DNA are sensitive to this new drug? Can we automatically classify web documents?
  • 3. I M P O RTA N T F R E Q U E N T PAT T E R N M I N I N G  Discloses an intrinsic and important property of data sets  Forms the foundation for many essential data mining tasks Association, correlation, and causality analysis Sequential, structural (e.g., sub-graph) patterns Pattern analysis in spatiotemporal, multimedia, time series, and stream data Classification: associative classification Cluster analysis: frequent pattern-based clustering Data warehousing: iceberg cube and cube-gradient Semantic data compression: fascicles Broad applications
  • 4. C L O S E D PAT T E R N S A N D M A X - PAT T E R N S  A long pattern contains a combinatorial number of sub patterns, e.g., {a1, …, a100} contains (1001) + (1002) + … + (110000) =2100 – 1 = 1.27*1030 sub-patterns!  An item set X is closed if X is frequent and there exists no super-pattern Y ‫כ‬X, with the same support as X  An item set X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ‫כ‬X  Closed pattern is a lossless compression of freq. patterns Reducing the # of patterns and rules
  • 5. SC A LA BLE M ETHOD S FOR M IN IN G FR EQU EN T PATTER N S  The downward closure property of frequent patterns Any subset of a frequent item set must be frequent If {beer, diaper, nuts} is frequent, so is {beer, diaper} i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper} Scalable mining methods: Three major approaches  Apriori  Frequent pattern growth  Vertical data format approach
  • 6. A P R I O R I : A C A N D I D AT E G E N E R AT I O N A N D T E S T A P P R O A C H  Apriori pruning principle: If there is any item set which is infrequent, its superset should not be generated/tested.  Method: Initially, scan DB once to get frequent 1-itemset Generate length (k+1) candidate itemsets from length k frequent itemsets Test the candidates against DB Terminate when no frequent or candidate set can be generated
  • 7. THE APRIORI ALGORITHM  Pseudo-code: Ck: Candidate item set of size k Lk: frequent item set of size k L1= {frequent items}; for (k = 1; Lk !=Æ; k++) do begin Ck+1=candidates generated from Lk; for each transaction t in database do increment the count of all candidates in Ck+1 that are contained in t Lk+1=candidates in Ck+1 with min support End return Èk Lk;
  • 8. IM PORTA N T D ETA ILS OF A PR IOR I  How to generate candidates? Step 1: self-joining Lk Step 2: pruning How to count supports of candidates? Example of Candidate-generation L3={abc, abd, acd, ace, bcd} Self-joining: L3*L3 abcd from abc and abd acde from acd and ace Pruning: acde is removed because ade is not in L3 C4={abcd}
  • 9. CHALLENGES OF FREQUENT PATTERN MINING  Challenges  Multiple scans of transaction database  Huge number of candidates  Tedious workload of support counting for candidates  Improving Apriori: general ideas  Reduce passes of transaction database scans  Shrink number of candidates  Facilitate support counting of candidates
  • 10. SA M PLE FOR FR EQU EN T PATTER N S  Select a sample of original database, mine frequent  patterns within sample using Apriori  Scan database once to verify frequent itemsets found in sample, only borders of closure of frequent patterns are checked  Example: check abcd instead of ab, ac, …, etc.  Scan database again to find missed frequent patterns
  • 11. T H E E X A M P L E O F A P R I O R I A L G O R I T H M
  • 12. SUMMARY  Frequent pattern mining—an important task in data mining  Scalable frequent pattern mining methods Apriori (Candidate generation & test) Projection-based (FP growth, CLOSET+, ...) Vertical format approach (CHARM, ...)  Mining a variety of rules and interesting patterns  Constraint-based mining  Mining sequential and structured patterns  Extensions and applications