Pattern Mining: Getting the
most out of your log data.
Krishna Sridhar
Staff Data Scientist, Dato Inc.
krishna_srd
About Me!
• Background
- Machine Learning (ML) research.
- Ph.D. in Numerical Optimization @Wisconsin
• Now
- Build ML tools for data scientists & developers @Dato.
- Help deploy ML algorithms.
@krishna_srd, @DatoInc
About Us!
45+ and growing fast!
+ =
Questions?
• (Now) I love questions. Feel free to interrupt for questions!
• (Later) Email me at srikris@dato.com.
DAML Talks!
About you?
Creating a model pipeline
Ingest Transform Model Deploy
Unstructured Data
exploration
data
modeling
Data Science Workflow
Ingest Transform Model Deploy
Log Journey
Lots of data
Insights Profits
Log Mining: Pattern Mining
Logs are everywhere!
Machine Learning in Logs
Source: Mining Your Logs - Gaining Insight Through Visualization
Coffee shop
Coffee Shops Menu
Receipts
Coffee Shops Menu
Coffee Store Logs
Frequent Pattern Mining
What sets of items were bought together?
Real Applications
Log Mining: Rule Mining
Can we recommend items?
Rule Mining
Real Applications
Log Mining: Feature Extraction
Feature Extraction
0 1 0 0 0 0 1 1 0
1 1 0 0 1 0 0 0 0
0 0 1 1 1 0
Receipt Space Features in
Menu Space
ML
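A minimal sketch of feature extraction, assuming each feature is a 0/1 indicator of whether a receipt contains a given mined pattern. The receipts, the pattern list, and the helper name `extract_features` are hypothetical and only for illustration; the exact feature layout used in the talk is not recoverable from the slide.

```python
# Hypothetical coffee-shop receipts and mined patterns, for illustration only.
receipts = [
    {"latte", "muffin"},
    {"espresso", "latte", "scone"},
    {"tea"},
]
patterns = [
    frozenset({"latte"}),
    frozenset({"latte", "muffin"}),
    frozenset({"espresso", "scone"}),
]

def extract_features(receipt, patterns):
    """Return one 0/1 indicator per pattern: 1 if the receipt contains it."""
    return [int(p <= receipt) for p in patterns]

# Each receipt becomes a fixed-length vector that an ML model can consume.
features = [extract_features(r, patterns) for r in receipts]
for row in features:
    print(row)
```

Every receipt maps to a vector of the same length regardless of how many items it holds, which is what makes it usable as model input.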
3 Useful Data Mining Tasks
Pattern Mining, Rule Mining, Feature Extraction
Demo
Pattern Mining: Explained
Formulating Pattern Mining
N distinct items → 2^N itemsets
Formulating Pattern Mining
Find the top K most frequent sets of length at least L
that occur at least M times.
- max_patterns
- min_length
- min_support
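The formulation can be sketched as a brute-force search. The function name `top_k_patterns` is hypothetical; only the three parameter names come from the slide. This is a reference baseline under those assumptions, not the toolkit's implementation, and it is exponential in transaction length.

```python
from collections import Counter
from itertools import combinations

def top_k_patterns(transactions, max_patterns, min_length, min_support):
    """Brute force: count every sub-itemset of every transaction."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(max(min_length, 1), len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    # Keep patterns that occur at least min_support times.
    frequent = [(p, c) for p, c in counts.items() if c >= min_support]
    # Most frequent first; ties broken by pattern for a stable order.
    frequent.sort(key=lambda pc: (-pc[1], pc[0]))
    return frequent[:max_patterns]

# Toy database of six transactions over the items A, B, C, D.
transactions = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "D"}, {"B", "C", "D"}, {"B", "C", "D"},
]
result = top_k_patterns(transactions, max_patterns=3, min_length=2, min_support=4)
for pattern, count in result:
    print(pattern, count)
```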
Pattern Mining
N distinct items → 2^N itemsets
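A quick check of the 2^N count, which includes the empty set: each item is either in or out of an itemset. The helper name `all_itemsets` is hypothetical.

```python
from itertools import combinations

def all_itemsets(items):
    """Yield every subset of `items`, including the empty set."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

itemsets = list(all_itemsets({"A", "B", "C", "D"}))
print(len(itemsets))  # 2 ** 4 = 16
```

With only 30 distinct items the count already exceeds a billion, which is why the principles that follow focus on pruning.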
Pattern Mining: Principles
Principle 1: What is frequent?
A pattern is frequent if it occurs at least M times (min_support).
Patterns
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
M = 4
Is the pattern {C, D} frequent?
{C, D} occurs 5 times. Frequent!
Is the pattern {A, D} frequent?
{A, D} occurs 3 times. Not frequent.
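The two checks can be reproduced in a few lines. The helper names `support` and `is_frequent` are hypothetical; the data and the threshold M = 4 are the ones on the slide.

```python
transactions = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "D"}, {"B", "C", "D"}, {"B", "C", "D"},
]

def support(pattern, transactions):
    """Number of transactions that contain every item in `pattern`."""
    return sum(1 for t in transactions if set(pattern) <= t)

def is_frequent(pattern, transactions, min_support):
    """A pattern is frequent if it occurs at least min_support times."""
    return support(pattern, transactions) >= min_support

M = 4
print(support({"C", "D"}, transactions), is_frequent({"C", "D"}, transactions, M))
print(support({"A", "D"}, transactions), is_frequent({"A", "D"}, transactions, M))
```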
Principle 2: Apriori principle
A pattern is frequent only if every subset is frequent.
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
M = 4
{B, C, D} : 4 is frequent, therefore
{C, D} : 5 is frequent.
Why?
{C, D} must occur at least as many times as {B, C, D}.
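A small sketch that checks this on the example data: no pattern ever has lower support than one of its supersets. The helper name `support` is hypothetical.

```python
from itertools import combinations

transactions = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "D"}, {"B", "C", "D"}, {"B", "C", "D"},
]

def support(pattern, transactions):
    """Number of transactions that contain every item in `pattern`."""
    return sum(1 for t in transactions if pattern <= t)

items = sorted(set().union(*transactions))
patterns = [
    frozenset(c)
    for size in range(1, len(items) + 1)
    for c in combinations(items, size)
]

# Collect any (subset, superset) pair where the subset is rarer.
violations = [
    (sub, sup)
    for sub in patterns
    for sup in patterns
    if sub < sup and support(sub, transactions) < support(sup, transactions)
]
print(violations)
print(support(frozenset("BCD"), transactions), support(frozenset("CD"), transactions))
```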
Principle 2: Apriori principle (Contrapositive)
If a pattern is not frequent, then none of its supersets is frequent.
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
M = 4
{A} : 3 is not frequent, therefore
{A, D} : 3 is not frequent.
Why?
{A, D} cannot occur more times than {A}.
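The practical payoff is pruning. A minimal sketch, with hypothetical variable names, of how much of the search space the single fact "{A} is not frequent" removes for four items:

```python
from itertools import combinations

items = ["A", "B", "C", "D"]
infrequent = frozenset({"A"})  # {A} : 3 is below M = 4

# Every non-empty itemset over the four items.
candidates = [
    frozenset(c)
    for size in range(1, len(items) + 1)
    for c in combinations(items, size)
]

# Any superset of an infrequent pattern can be skipped without counting it.
pruned = [c for c in candidates if infrequent <= c]
to_count = [c for c in candidates if not infrequent <= c]
print(len(candidates), len(pruned), len(to_count))
```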
Two Main Algorithms
• Candidate Generation
- Apriori
- Eclat
• Pattern Growth
- FP-Growth
- TopK FP-Growth
Candidate Generation
Lots of Generalizations
Source: http://www.philippe-fournier-viger.com/spmf/
Candidate Generation
Two phases
1. Candidate generation.
2. Candidate filtering.
Exploit Apriori Principle!
Candidate Generation
{ } : 6
{A} : ? {B} : ? {C} : ? {D} : ?
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
Candidate Generation
{ } : 6
{A} : 3 {B} : 4 {C} : 5 {D} : 6
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
Candidate Generation
{ } : 6
{A} : 3 {B} : 4 {C} : 5 {D} : 6
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
{A} : 3 is not frequent
Candidate Generation
{ } : 6
{A} : 3 {B} : 4 {C} : 5 {D} : 6
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
No need to explore supersets of {A}!
Candidate Generation
{ } : 6
{A} : 3 {B} : 4 {C} : 5 {D} : 6
{AB} : ? {AC} : ? {AD} : ? {BC} : 4 {BD} : 4 {CD} : 5
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
Candidate Generation
Two phases
1. Candidate generation: Enumerate all subsets.
2. Candidate filtering: Eliminate infrequent subsets.
Exploit Apriori Principle!
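The two phases can be sketched as a level-wise loop. This is a textbook-style Apriori under simple assumptions (in-memory data, one pass per level), not the toolkit's implementation; the function name `apriori` is hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Candidate filtering: one pass over the data per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    items = set().union(*transactions)
    level = count({frozenset([i]) for i in items})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent

transactions = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "D"}, {"B", "C", "D"}, {"B", "C", "D"},
]
frequent = apriori(transactions, min_support=4)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Because {A} fails the threshold at the first level, no itemset containing A is ever generated or counted.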
Pattern Growth
Pattern Growth
Two phases
1. Candidate filtering
2. Conditional database construction.
Avoid full scans over the data & large candidate sets!
Pattern Growth - Depth First
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : 1 {AC} : 2 {AD} : 3 {BC} : 2 {BD} : 4 {CD} : 4
{ABC} : 0 {ABD} : 1 {ACD} : 2 {BCD} : 2
Pattern Growth - Preprocessing
First, count the number of times each item (singleton) occurs.
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
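The preprocessing pass is a single count over the data. A minimal sketch, assuming the same threshold M = 4 as in the earlier example; variable names are hypothetical.

```python
from collections import Counter

transactions = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"B", "D"},
    {"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "D"},
]

# One pass over the data: how many transactions contain each item?
item_counts = Counter(item for t in transactions for item in t)

M = 4  # min_support, assumed to match the earlier slides
frequent_items = {item for item, n in item_counts.items() if n >= M}
print(sorted(item_counts.items()))
print(sorted(frequent_items))
```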
Pattern Growth - Depth First
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
Pattern Growth - Depth First
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
No need to explore!
Pattern Growth - Depth First
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : X {AC} : ? {AD} : ? {BC} : X {BD} : X {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
Explore depth first on {B}
Pattern Growth
{B} : 4
{ } : 6
Conditional Database Construction
DB{}
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
DB{B}
{C, D}
{D}
{C, D}
{D}
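A sketch of how DB{B} can be derived from DB{}. It assumes, as the slide's output suggests, that B itself and the infrequent item A are dropped from each projected transaction. The function name `conditional_database` is hypothetical.

```python
def conditional_database(transactions, item, frequent_items):
    """Transactions containing `item`, with `item` and infrequent items removed."""
    return [
        {i for i in t if i != item and i in frequent_items}
        for t in transactions
        if item in t
    ]

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"B", "D"},
    {"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "D"},
]
# With M = 4, only B, C and D are frequent in DB{}.
db_b = conditional_database(db, "B", frequent_items={"B", "C", "D"})
for t in db_b:
    print(sorted(t))
```

The conditional database is smaller than the original, so every later count under the prefix {B} scans less data.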
Pattern Growth
{B} : 4
{ } : 6
Candidate Filtering
DB{}
{B, C, D}
{A, C, D}
{B, D}
{A, C, D}
{B, C, D}
{A, B, D}
DB{B}
{C, D}
{D}
{C, D}
{D}
{D} : 4
{C} : 2
Add {BD} as frequent
Pattern Growth - Depth First
DB{B}
{C, D}
{D}
{C, D}
{D}
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : X {AC} : ? {AD} : ? {BC} : 2 {BD} : 4 {CD} : ?
{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?
Explore depth first on {BD}
Pattern Growth - Depth First
{ } : 6
{A} : 3 {B} : 4 {C} : 4 {D} : 6
{AB} : X {AC} : ? {AD} : ? {BC} : 2 {BD} : 4 {CD} : ?
{ABC} : ? {ABD} : X {ACD} : ? {BCD} : X
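The whole depth-first walk can be sketched as a short recursion that alternates candidate filtering and conditional database construction. This is a projection-based sketch on plain Python sets, not FP-Growth with an FP-tree; the function name `pattern_growth` and the threshold M = 4 are assumptions.

```python
from collections import Counter

def pattern_growth(transactions, min_support, prefix=()):
    """Yield (pattern, support) depth first, one conditional database at a time."""
    # Candidate filtering: count items in this (conditional) database.
    counts = Counter(item for t in transactions for item in t)
    frequent = sorted(i for i, n in counts.items() if n >= min_support)
    for item in frequent:
        pattern = prefix + (item,)
        yield pattern, counts[item]
        # Conditional database: transactions containing `item`, restricted to
        # frequent items that sort after it, so each pattern is produced once.
        conditional = [
            {i for i in t if i in frequent and i > item}
            for t in transactions
            if item in t
        ]
        yield from pattern_growth(conditional, min_support, pattern)

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"B", "D"},
    {"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "D"},
]
found = dict(pattern_growth(db, min_support=4))
for pattern, n in found.items():
    print(pattern, n)
```

No candidate set is ever materialized: the only patterns counted are extensions of a prefix that is already known to be frequent.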
Compare & Contrast
• Candidate Generation
+ Better than brute force
+ Filters candidate sets
- Multiple passes over the data
• Pattern Growth (better choice)
+ Fewer passes over the data
+ Space efficient.
Some cool ideas
FP-Tree Compression
Figures From Florian Verhein’s Slides on FP-Growth
FP-Growth Algorithm
Figures From Florian Verhein’s Slides on FP-Growth
Two phases
1. Candidate filtering.
2. Conditional database construction.
TopK FP-Growth Algorithm
Similar to FP-Growth
1. Dynamically raise min_support.
2. Estimates of min_support greatly help.
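A rough sketch of the "dynamically raise min_support" idea, using simple database projection on Python sets rather than an FP-tree. The function name `top_k_pattern_growth` is hypothetical, and this is not the talk's TopK FP-Growth implementation. The starting `min_support` argument plays the role of the initial estimate: the higher it can safely start, the more is pruned early.

```python
import heapq
from collections import Counter

def top_k_pattern_growth(transactions, k, min_support=1):
    """Top-k frequent patterns; min_support rises as better patterns are found."""
    heap = []  # min-heap of (support, pattern) holding the best k so far
    state = {"min_support": min_support}

    def grow(db, prefix):
        counts = Counter(item for t in db for item in t)
        # Visit high-support items first so the threshold rises quickly.
        for item, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            if n < state["min_support"]:
                break  # remaining items have even lower support
            pattern = prefix + (item,)
            if len(heap) < k:
                heapq.heappush(heap, (n, pattern))
            elif n > heap[0][0]:
                heapq.heapreplace(heap, (n, pattern))
            if len(heap) == k:
                # Dynamically raise min_support to the k-th best support.
                state["min_support"] = max(state["min_support"], heap[0][0])
            # Project onto later items so each pattern is visited once.
            conditional = [{i for i in t if i > item} for t in db if item in t]
            grow(conditional, pattern)

    grow([set(t) for t in transactions], ())
    return sorted(heap, key=lambda sp: (-sp[0], sp[1]))

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"B", "D"},
    {"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "D"},
]
top3 = top_k_pattern_growth(db, k=3)
for n, pattern in top3:
    print(pattern, n)
```

When several patterns tie at the k-th support, which of them is kept depends on visiting order.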
Future Work
Distributed FP-Growth
Partition database on item-ids.
Database
Bags + Sequences
× 2
Itemset: {Item}
Bags: {Item: quantity}
Sequences : (item)
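One way to picture the three representations is with built-in Python types. The receipt below is hypothetical, and the mapping to `set`, `Counter`, and `tuple` is an illustration rather than the toolkit's data model.

```python
from collections import Counter

receipt = ["latte", "muffin", "latte"]  # hypothetical raw log of one purchase

itemset = set(receipt)      # {Item}: which items occurred
bag = Counter(receipt)      # {Item: quantity}: how many of each
sequence = tuple(receipt)   # (item): the order in which they occurred

print(sorted(itemset))
print(dict(bag))
print(sequence)
```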
Demo: Model built, now what?
Summary
Log Data Mining
≠
Rocket Science
• FP-Growth for finding frequent patterns.
• Find rules from patterns to make predictions.
• Extract features for useful ML in pattern space.
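As a sketch of the second point, a rule such as {C} → {D} can be scored by how often the right-hand side appears when the left-hand side does. The talk does not say how rules are scored; confidence is used here as a common choice and is an assumption, as is the function name `confidence`.

```python
def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with `antecedent` that also contain `consequent`."""
    has_antecedent = [t for t in transactions if antecedent <= t]
    if not has_antecedent:
        return 0.0
    both = sum(1 for t in has_antecedent if consequent <= t)
    return both / len(has_antecedent)

transactions = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "D"}, {"B", "C", "D"}, {"B", "C", "D"},
]
print(confidence({"C"}, {"D"}, transactions))  # every basket with C also has D
print(confidence({"D"}, {"C"}, transactions))  # 5 of the 6 baskets with D have C
```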
SELECT questions FROM audience
WHERE difficulty = 'Easy';
Thanks!

Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...gragchanchal546
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesChandrakantDivate1
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptAfnanAhmad53
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxpritamlangde
 
Ground Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth ReinforcementGround Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth ReinforcementDr. Deepak Mudgal
 

Recently uploaded (20)

Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .ppt
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptx
 
Ground Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth ReinforcementGround Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth Reinforcement
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 

Frequent Pattern Mining - Krishna Sridhar, Feb 2016

  • 1. Pattern Mining: Getting the most out of your log data. Krishna Sridhar Staff Data Scientist, Dato Inc. krishna_srd
  • 2. • Background - Machine Learning (ML) Research. - Ph.D Numerical Optimization @Wisconsin • Now - Build ML tools for data-scientists & developers @Dato. - Help deploy ML algorithms. @krishna_srd, @DatoInc About Me!
  • 4. Questions? • (Now) I love questions. Feel free to interrupt for questions! • (Later) Email me srikris@dato.com. DAML Talks!
  • 6. Creating a model pipeline Ingest Transform Model Deploy Unstructured Data exploration data modeling Data Science Workflow Ingest Transform Model Deploy
  • 7. Log Journey Lots of data Insights Profits
  • 10. Machine Learning in Logs Source: Mining Your Logs - Gaining Insight Through Visualization
  • 14. Frequent Pattern Mining What sets of items were bought together?
  • 19. Can we recommend items? Rule Mining
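A common way to turn mined patterns into recommendations is to score a rule X → Y by its confidence, support(X ∪ Y) / support(X). The slide does not spell out a scoring formula, so that definition is an assumption here, and `support`, `rule_confidence` and the single-letter receipts are illustrative names and data only. A minimal sketch:

```python
def support(pattern, transactions):
    """Number of transactions that contain every item in `pattern`."""
    pattern = set(pattern)
    return sum(1 for t in transactions if pattern <= set(t))


def rule_confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent (assumed definition)."""
    n_antecedent = support(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0  # the rule never fires, so there is nothing to recommend
    n_both = support(set(antecedent) | set(consequent), transactions)
    return n_both / n_antecedent


# Illustrative receipts; each letter stands for one menu item.
receipts = ["BCD", "ACD", "ABCD", "AD", "BCD", "BCD"]
confidence_b_to_cd = rule_confidence("B", "CD", receipts)  # B bought -> suggest C and D
confidence_d_to_c = rule_confidence("D", "C", receipts)    # D bought -> suggest C
```

A recommender built this way would suggest the consequent of the highest-confidence rule whose antecedent is already in the basket.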
  • 21. Log Mining: Feature Extraction
  • 22. Feature Extraction 0 1 0 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 0 0 1 1 1 0 Receipt Space Features in Menu Space ML
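One possible reading of this slide is that every receipt is mapped to a fixed-length binary vector that a downstream ML model can consume. The sketch below assumes one feature per mined pattern (1 if the receipt contains the whole pattern, 0 otherwise); the function name `extract_features` and the sample patterns are hypothetical, not taken from the talk.

```python
def extract_features(transactions, patterns):
    """Return one 0/1 row per transaction and one column per pattern."""
    rows = []
    for t in transactions:
        items = set(t)
        rows.append([1 if set(p) <= items else 0 for p in patterns])
    return rows


# Illustrative data: each letter is one menu item.
receipts = ["BCD", "ACD", "ABCD", "AD", "BCD", "BCD"]
patterns = ["CD", "BCD"]  # assumed to come from an earlier pattern-mining step
features = extract_features(receipts, patterns)
```

Each row has the same width regardless of how many items the receipt had, which is what makes the result usable as model input.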
  • 23. 3 Useful Data Mining Tasks: Pattern Mining, Rule Mining, Feature Extraction
  • 24. Demo
  • 26. Formulating Pattern Mining N distinct items → 2^N itemsets
  • 27. Formulating Pattern Mining Find the top K most frequent sets of length at least L that occur at least M times.
  • 28. Formulating Pattern Mining Find the top K most frequent sets of length at least L that occur at least M times. - max_patterns - min_length - min_support
  • 29. Pattern Mining N distinct items → 2^N itemsets
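To make the formulation concrete, here is a brute-force sketch that scores every itemset and keeps the top K. The parameter names follow the slide (`max_patterns`, `min_length`, `min_support`), but the function itself is a hypothetical illustration, not the talk's implementation. It visits all non-empty itemsets, so it is only practical for a handful of items.

```python
from itertools import combinations


def top_k_patterns(transactions, max_patterns, min_length, min_support):
    """Brute force: count every itemset, filter, and keep the K most frequent."""
    items = sorted(set().union(*transactions))
    scored = []
    for size in range(max(min_length, 1), len(items) + 1):
        for candidate in combinations(items, size):
            pattern = frozenset(candidate)
            support = sum(1 for t in transactions if pattern <= t)
            if support >= min_support:
                scored.append((support, candidate))
    # Most frequent first; ties are broken alphabetically to stay deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:max_patterns]


# Illustrative receipts; each letter stands for one item.
receipts = [frozenset(r) for r in ("BCD", "ACD", "ABCD", "AD", "BCD", "BCD")]
top = top_k_patterns(receipts, max_patterns=3, min_length=2, min_support=4)
```

With 4 items this checks 15 candidates; with 40 items it would be about 10^12, which is why the principles that follow matter.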
  • 32. Principle 1: What is frequent? A pattern is frequent if it occurs at least M times. {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} Is the pattern {C, D} frequent? M = 4 Patterns
  • 33. Principle 1: What is frequent? A pattern is frequent if it occurs at least M times. {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} {C, D} occurs 5 times M = 4 Patterns
  • 34. Principle 1: What is frequent? A pattern is frequent if it occurs at least M times. {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} M = 4 Patterns {C, D} occurs 5 times Frequent!
  • 35. Principle 1: What is frequent? A pattern is frequent if it occurs at least M times. {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} Is the pattern {A, D} frequent? M = 4 Patterns
  • 36. Principle 1: What is frequent? A pattern is frequent if it occurs at least M times. {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} M = 4 Patterns {A, D} occurs 3 times. Not frequent.
  • 37. Principle 1: What is frequent? A pattern is frequent if it occurs at least M times. {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} {C, D}: 5 is frequent M = 4 {A, D}: 3 is not frequent min_support
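Principle 1 can be checked directly against the six patterns on the slide. This is a minimal sketch; the helper names `support` and `is_frequent` are illustrative.

```python
def support(pattern, transactions):
    """Count the transactions that contain every item of `pattern`."""
    pattern = set(pattern)
    return sum(1 for t in transactions if pattern <= set(t))


def is_frequent(pattern, transactions, min_support):
    """A pattern is frequent if it occurs at least `min_support` (M) times."""
    return support(pattern, transactions) >= min_support


# The six patterns from the slide, written as strings of single-letter items.
patterns = ["BCD", "ACD", "ABCD", "AD", "BCD", "BCD"]
M = 4
cd_count = support("CD", patterns)
ad_count = support("AD", patterns)
cd_is_frequent = is_frequent("CD", patterns, M)
ad_is_frequent = is_frequent("AD", patterns, M)
```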
  • 38. Principle 2: Apriori principle A pattern is frequent only if every subset is frequent {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} {B, C, D} : 4 is frequent therefore {C, D} : 5 is frequent M = 4
  • 39. Principle 2: Apriori principle A pattern is frequent only if every subset is frequent {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} {B, C, D} : 4 is frequent therefore {C, D} : 5 is frequent M = 4 Why? {C, D} must occur at least as many times as {B, C, D}.
  • 40. Principle 2: Apriori principle (Contrapositive) If a pattern is not frequent then all supersets are not frequent {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} M = 4 {A} : 3 is not frequent therefore {A, D} : 3 is not frequent
  • 41. Principle 2: Apriori principle (Contrapositive) If a pattern is not frequent then all supersets are not frequent {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} M = 4 {A} : 3 is not frequent therefore {A, D} : 3 is not frequent Why? {A, D} cannot occur more times than {A}.
  • 42. Two Main Algorithms • Candidate Generation - Apriori - Eclat • Pattern Growth - FP-Growth - TopK FP-Growth
  • 44. Lots of Generalizations Source: http://www.philippe-fournier-viger.com/spmf/
  • 45. Candidate Generation Two phases 1. Candidate generation. 2. Candidate filtering. Exploit Apriori Principle!
  • 46. Candidate Generation {AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ? {A} : ? {B} : ? {C} : ? {D} : ? { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D}
  • 48. Candidate Generation {AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ? {A} : 3 {B} : 4 {C} : 5 {D} : 6 { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D}
  • 49. Candidate Generation {AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ? {A} : 3 {B} : 4 {C} : 5 {D} : 6 { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} Not frequent
  • 50. Candidate Generation {AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ? {A} : 3 {B} : 4 {C} : 5 {D} : 6 { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D} No need to explore!
  • 51. Candidate Generation {AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ? {A} : 3 {B} : 4 {C} : 5 {D} : 6 { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D}
  • 52. Candidate Generation {AB} : ? {AC} : ? {AD} : ? {BC} : 4 {BD} : 4 {CD} : 5 {A} : 3 {B} : 4 {C} : 5 {D} : 6 { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {B, C, D} {A, C, D} {A, B, C, D} {A, D} {B, C, D} {B, C, D}
  • 54. Two Main Algorithms • Candidate Generation - Apriori - Eclat • Pattern Growth - FP-Growth - TopK FP-Growth
  • 55. Candidate Generation Two phases 1. Candidate generation: Enumerate all subsets. 2. Candidate filtering: Eliminate infrequent subsets. Exploit Apriori Principle!
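The two phases can be sketched as a level-wise loop in the style of Apriori. This is a simplified illustration under the slide's definitions, not a tuned implementation; the function and variable names are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise candidate generation and filtering; returns {itemset: support}."""
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(candidates):
        # Phase 2 (filtering): one pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    level = keep_frequent({frozenset([item]) for item in items})
    frequent = {}
    size = 1
    while level:
        frequent.update(level)
        size += 1
        # Phase 1 (generation): join frequent sets from the previous level...
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # ...and drop candidates with an infrequent subset (Apriori principle).
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
        level = keep_frequent(candidates)
    return frequent


# The six patterns used on the Candidate Generation slides, with M = 4.
frequent = apriori(["BCD", "ACD", "ABCD", "AD", "BCD", "BCD"], min_support=4)
```

Because {A} fails the threshold at the first level, no itemset containing A is ever generated, which is the "No need to explore!" step on the slides.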
  • 57. Pattern Growth Two phases 1. Candidate filtering 2. Conditional database constructions. Avoid full scans over the data & large candidate sets!
  • 58. Pattern Growth - Depth First {B, C, D} {A, C, D} {B, D} {A, C, D} {B, C, D} {A, B, D} {AB} : 1 {AC} : 2 {AD} : 3 {BD} : 4 {CD} : 4 {A} : 3 {B} : 4 {C} : 4 {D} : 6 { } : 6 {ABC} : 0 {ABD} : 1 {ACD} : 2 {BCD} : 2 {BC} : 2
  • 59. Pattern Growth - Preprocessing {B, C, D} {A, C, D} {B, D} {A, C, D} {B, C, D} {A, B, D} {A} : 3 {B} : 4 {C} : 4 {D} : 6 { } : 6 Preprocessing First, count the number of times each item (singleton) occurs.
  • 60. Pattern Growth - Depth First {B, C, D} {A, C, D} {B, D} {A, C, D} {B, C, D} {A, B, D} {AB} : ? {AC} : ? {AD} : ? {BD} : ? {CD} : ? {A} : 3 {B} : 4 {C} : 4 {D} : 6 { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {BC} : ?
  • 63. Pattern Growth - Depth First {B, C, D} {A, C, D} {B, D} {A, C, D} {B, C, D} {A, B, D} {AB} : ? {AC} : ? {AD} : ? {BD} : ? {CD} : ? {A} : 3 {B} : 4 {C} : 4 {D} : 6 { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {BC} : ? No need to explore!
  • 64. Pattern Growth - Depth First {B, C, D} {A, C, D} {B, D} {A, C, D} {B, C, D} {A, B, D} {AB} : X {AC} : ? {AD} : ? {BD} : X {CD} : ? {A} : 3 {B} : 4 {C} : 4 {D} : 6 { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {BC} : X Explore depth first on {B}
  • 65. Pattern Growth {B} : 4 { } : 6 Conditional Database Construction DB{} DB{B} {B, C, D} {A, C, D} {B, D} {A, C, D} {B, C, D} {A, B, D} {C, D} {D} {C, D} {D}
  • 66. Pattern Growth {B} : 4 { } : 6 Candidate Filtering DB{B} {C, D} {D} {C, D} {D} {D} : 4 {C} : 2 DB{} {B, C, D} {A, C, D} {B, D} {A, C, D} {B, C, D} {A, B, D} DB{B} Add {BD} as frequent
  • 67. Pattern Growth - Depth First {C, D} {D} {C, D} {D} {AB} : X {AC} : ? {AD} : ? {BD} : 4 {CD} : ? {A} : 3 {B} : 4 {C} : 4 {D} : 6 { } : 6 {ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ? {BC} : 2 Explore depth first on {BD}
  • 68. Pattern Growth - Depth First {AB} : X {AC} : ? {AD} : ? {BD} : 4 {CD} : ? {A} : 3 {B} : 4 {C} : 4 {D} : 6 { } : 6 {ABC} : ? {ABD} : X {ACD} : ? {BCD} : X {BC} : 2
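The depth-first walk above can be sketched with plain lists standing in for the conditional databases. This is an assumption-laden simplification: it projects the database on each frequent item and recurses, but it does not build the compressed FP-tree that FP-Growth uses. The names `pattern_growth` and `grow` are illustrative.

```python
from collections import Counter


def pattern_growth(transactions, min_support):
    """Depth-first pattern growth over conditional databases; returns {pattern: support}."""
    frequent = {}

    def grow(prefix, database):
        # Candidate filtering: count items in the current (conditional) database.
        counts = Counter(item for t in database for item in t)
        for item in sorted(counts):
            if counts[item] < min_support:
                continue  # infrequent here, so no superset needs exploring
            pattern = prefix + (item,)
            frequent[pattern] = counts[item]
            # Conditional database construction: keep transactions containing
            # `item`, and within them only the items ordered after it, so each
            # itemset is reached exactly once.
            conditional = [
                tuple(i for i in t if i > item) for t in database if item in t
            ]
            grow(pattern, conditional)

    grow((), [tuple(sorted(set(t))) for t in transactions])
    return frequent


# The database from the Pattern Growth slides, with M = 4.
db = ["BCD", "ACD", "BD", "ACD", "BCD", "ABD"]
result = pattern_growth(db, min_support=4)
```

For prefix {B}, the conditional database is {C, D}, {D}, {C, D}, {D}, which matches DB{B} on the slide; only {D} passes the threshold there, so {B, D} is added and {B, C} is not.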
  • 69. Compare & Contrast • Candidate Generation + Better than brute force + Filters candidate sets - Multiple passes over the data • Pattern Growth + Fewer passes over the data + Space efficient.
  • 70. Compare & Contrast • Candidate Generation + Better than brute force + Filters candidate sets - Multiple passes over the data • Pattern Growth + Fewer passes over the data + Space efficient. Better choice
  • 72. FP-Tree Compression Figures From Florian Verhein’s Slides on FP-Growth
  • 73. FP-Growth Algorithm Figures From Florian Verhein’s Slides on FP-Growth Two phases 1. Candidate filtering. 2. Conditional database constructions.
  • 74. TopK FP-Growth Algorithm Similar to FP-Growth 1. Dynamically raise min_support. 2. Estimates of min_support greatly help.
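The idea of dynamically raising min_support can be sketched by keeping the best K patterns in a min-heap and using the weakest kept support as the new threshold. This sketch uses list-based conditional databases rather than an FP-tree and ignores min_length; the function name and the tie-handling are assumptions, not the talk's implementation.

```python
import heapq
from collections import Counter


def top_k_pattern_growth(transactions, max_patterns, min_support=1):
    """Keep the K most frequent patterns, raising the threshold as we go."""
    if max_patterns <= 0:
        return []
    heap = []                # min-heap of (support, pattern); heap[0] is the weakest
    threshold = min_support  # a good initial estimate prunes more, earlier

    def grow(prefix, database):
        nonlocal threshold
        counts = Counter(item for t in database for item in t)
        # Most frequent items first, so the threshold rises early.
        for item, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            if n < threshold:
                break  # the remaining items are no more frequent than this one
            pattern = prefix + (item,)
            if len(heap) < max_patterns:
                heapq.heappush(heap, (n, pattern))
            elif n > heap[0][0]:
                heapq.heapreplace(heap, (n, pattern))
            if len(heap) == max_patterns:
                threshold = max(threshold, heap[0][0])
            conditional = [
                tuple(i for i in t if i > item) for t in database if item in t
            ]
            grow(pattern, conditional)

    grow((), [tuple(sorted(set(t))) for t in transactions])
    return sorted(heap, reverse=True)


db = ["BCD", "ACD", "BD", "ACD", "BCD", "ABD"]
top_two = top_k_pattern_growth(db, max_patterns=2)
```

Pruning stays safe because a superset can never occur more often than the pattern it extends, so anything below the current threshold cannot enter the top K later.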
  • 77. Bags + Sequences × 2 Itemset: {Item} Bags: {Item: quantity} Sequences : (item)
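The three data shapes differ mainly in what "contains the pattern" means. The sketch below maps them to Python types (set, dict of quantities, tuple) as an assumption about the slide's notation; the helper names and the coffee-shop items are hypothetical.

```python
def itemset_contains(transaction, pattern):
    """Itemset {item}: every pattern item is present; order and quantity are ignored."""
    return set(pattern) <= set(transaction)


def bag_contains(transaction, pattern):
    """Bag {item: quantity}: every pattern item is present in at least that quantity."""
    return all(transaction.get(item, 0) >= qty for item, qty in pattern.items())


def sequence_contains(transaction, pattern):
    """Sequence (item): pattern items appear in the same order, gaps allowed."""
    remaining = iter(transaction)
    # `item in remaining` advances the iterator, which enforces the ordering.
    return all(item in remaining for item in pattern)


set_match = itemset_contains({"latte", "muffin"}, {"latte"})
bag_match = bag_contains({"latte": 2, "muffin": 1}, {"latte": 2})
bag_miss = bag_contains({"latte": 2, "muffin": 1}, {"latte": 3})
seq_match = sequence_contains(("latte", "muffin", "tea"), ("latte", "tea"))
seq_miss = sequence_contains(("latte", "muffin", "tea"), ("tea", "latte"))
```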
  • 78. Demo: Model built, now what?
  • 79. Summary Log Data Mining ≠ Rocket Science • FP-Growth for finding frequent patterns. • Find rules from patterns to make predictions. • Extract features for useful ML in pattern space.
  • 80. SELECT questions FROM audience WHERE difficulty = 'Easy' Thanks!