A Fast Method of Statistical Assessment
for Combinatorial Hypotheses
Based on Frequent Itemset Enumeration
Shin-ichi Minato, Takeaki Uno, Koji Tsuda, Aika Terada and Jun Sese
Selection reason: looking for hints to speed up our proposed method
(PKDD2014)
Abstract (Introduction)
● Combinatorial hypothesis assessment is a hard problem
○ Large p-value correction factor due to multiple testing
● LAMP method was proposed to exclude meaningless hypotheses
○ Based on frequent itemset enumeration
○ Can find more accurate p-value correction
● However, the original implementation is time-consuming
○ The itemset mining algorithm is executed many times
● This work proposes a new, faster LAMP algorithm
○ Execute itemset mining algorithm only once
○ 10 to 100 times faster than original LAMP
Preliminary
● Let $E$ be a set of items. An itemset is a subset of $E$.
● A transaction database D is a dataset composed of transactions, each of which is an itemset.
● An occurrence of itemset X is a transaction that includes X
● The occurrence set Occ(X) is the set of all occurrences of X in D
● The frequency of X, frq(X), is the number of occurrences of X in D
● An itemset X is called frequent for a constant $\sigma$ if $\mathrm{frq}(X) \ge \sigma$
● We also count the number of itemsets X that are frequent for $\sigma$
Frequent itemsets for a smaller $\sigma$:
(apple), (beer), (rice), (milk)
(apple, beer), (milk, beer), (beer, rice)
Frequent itemsets for a larger $\sigma$:
(apple), (beer), (rice), (milk)
(beer, rice)
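The definitions above can be sketched in a few lines of Python. The database `D` below is a hypothetical toy example, chosen so that the thresholds 2 and 3 (also assumptions, since the slide's own values are not available) reproduce the two lists of frequent itemsets. The brute-force enumeration is for illustration only and is not the mining algorithm of the paper.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
D = [
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk"},
    {"beer", "rice"},
]

def occ(X, D):
    """Occ(X): indices of the transactions that include every item of X."""
    return {i for i, t in enumerate(D) if set(X) <= t}

def frq(X, D):
    """frq(X): the number of occurrences of X in D."""
    return len(occ(X, D))

def frequent_itemsets(D, sigma):
    """All non-empty itemsets X with frq(X) >= sigma (brute force)."""
    items = sorted(set().union(*D))
    result = []
    for size in range(1, len(items) + 1):
        for X in combinations(items, size):
            if frq(X, D) >= sigma:
                result.append(X)
    return result

print(frequent_itemsets(D, 2))
print(frequent_itemsets(D, 3))
```

Raising the threshold from 2 to 3 can only shrink the result, because every itemset that is frequent for 3 is also frequent for 2.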
Frequent Itemset Mining Algorithm
● TASK: Find all itemsets that are frequent for a constant $\sigma$
● Start from the empty set and recursively add items with depth-first search
● The condition $e > \mathrm{tail}(X)$ on each added item e prevents duplicated solutions
○ $\mathrm{tail}(X)$ is the maximum item in X
● The heaviest computation is the frequency function $\mathrm{frq}(X \cup \{e\})$
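A minimal sketch of this depth-first enumeration follows, assuming items are the integers `0 .. n_items - 1` and that only items larger than the current maximum item are added. The function names (`frq`, `backtrack`, `mine`) and the toy database are illustrative assumptions, not the authors' code.

```python
def frq(X, D):
    """Number of transactions in D that include every item of X."""
    return sum(1 for t in D if X <= t)

def backtrack(X, tail, D, n_items, sigma, out):
    """Output X, then extend it with items larger than its maximum item."""
    out.append(sorted(X))
    for e in range(tail + 1, n_items):   # e > tail: each itemset is visited once
        Y = X | {e}
        if frq(Y, D) >= sigma:           # the expensive frequency computation
            backtrack(Y, e, D, n_items, sigma, out)

def mine(D, n_items, sigma):
    """All frequent itemsets, including the empty set, in DFS order."""
    out = []
    backtrack(frozenset(), -1, D, n_items, sigma, out)
    return out

# Hypothetical toy data: 0 = apple, 1 = beer, 2 = rice, 3 = milk.
D = [{0, 1, 2}, {0, 1}, {0}, {1, 2, 3}, {1, 3}, {3}, {1, 2}]
print(mine(D, 4, 2))
```

Because an extension of an infrequent itemset is never frequent, the search does not recurse below an itemset that fails the frequency test.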
Frequent Itemset Mining Algorithm (Update)
● An item e is addible for itemset X if $\mathrm{frq}(X \cup \{e\}) \ge \sigma$
● Let the set of addible items for X that satisfy
● We can add these items without calling
Statistical Assessment for Combinatorial Hypotheses
● Assume that a classifier labels each transaction as positive or negative
● itemsets, transactions, positive transactions
● For an itemset X, the test uses:
○ the number of transactions that contain X (that is, $\mathrm{frq}(X)$)
○ the number of positive transactions in $\mathrm{Occ}(X)$
Please neglect the numbers.
p-value of Fisher’s exact test is calculated as
where
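As an illustration of the test, here is a minimal sketch of a one-sided Fisher's exact test computed from the hypergeometric distribution. The parameter names (`n_total`, `n_pos`, `x`, `a`) are assumptions introduced for this example, and the choice of the one-sided variant is also an assumption.

```python
from math import comb

def fisher_p_value(n_total, n_pos, x, a):
    """One-sided Fisher's exact test p-value.

    n_total: number of transactions
    n_pos:   number of positive transactions
    x:       number of transactions that contain the itemset
    a:       number of positive transactions that contain the itemset
    Returns the probability of seeing at least `a` positives among `x`
    transactions drawn without replacement.
    """
    upper = min(x, n_pos)
    numerator = sum(
        comb(n_pos, i) * comb(n_total - n_pos, x - i)
        for i in range(a, upper + 1)
    )
    return numerator / comb(n_total, x)

# Itemset found in 3 of 10 transactions, all 3 of them positive (5 positives overall).
print(fisher_p_value(10, 5, 3, 3))
```

The most extreme table (all occurrences positive) gives the smallest p-value that an itemset of that frequency can reach.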
Multiple testing and LAMP’s idea
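● The family-wise error rate (FWER) must be kept under control
● LAMP ideas
○ Exclude meaninglessly infrequent itemsets, which can never be significant
○ Itemsets that have exactly the same occurrence set can be counted as one
● For an itemset X, the p-value cannot be smaller than a lower bound that depends on $\mathrm{frq}(X)$
- This lower bound is monotonically decreasing in the frequency
- If the bound at a frequency $\sigma$ is already too large to pass the corrected significance level, all itemsets that are infrequent for $\sigma$ can never be significant
- Count the closed itemsets that are frequent for $\sigma$
- LAMP finds the maximum $\sigma$ that satisfies the LAMP condition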
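The lower bound on the p-value can be illustrated with a short sketch. It assumes a one-sided Fisher's exact test, for which the smallest attainable p-value of an itemset with frequency `x` is reached when as many of its occurrences as possible are positive. The function name and parameters are hypothetical, and the slide's exact formula may be written differently.

```python
from math import comb

def min_p_value(n_total, n_pos, x):
    """Smallest one-sided Fisher p-value attainable at frequency x.

    Assumes the most extreme table: min(x, n_pos) of the x occurrences
    are positive.
    """
    a = min(x, n_pos)
    return comb(n_pos, a) * comb(n_total - n_pos, x - a) / comb(n_total, x)

# With 10 transactions and 5 positives, the bound shrinks as frequency grows.
for x in range(1, 6):
    print(x, min_p_value(10, 5, x))
```

For frequencies up to the number of positive transactions, the bound at `x + 1` equals the bound at `x` multiplied by `(n_pos - x) / (n_total - x)`, which is at most 1, so rarer itemsets always have a larger bound.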
Current implementation of LAMP
Approach 1:
- Intuitive approach
- Start from the most frequent itemset (the null itemset)
- Conduct a breadth-first search for each lower frequency parameter $\sigma$
- Large memory usage
Approach 2: (actually implemented)
- Depth-first search approach that requires less memory
- Has to call LCM repeatedly to recount the closed itemsets
- Time-consuming
Reformulating the problem
● Reformulate the problem using a threshold function
○ The function is monotonically decreasing in x and increasing in y
○ The problem is then to find the largest value that satisfies the reformulated condition
Support increasing algorithm
● This algorithm generates itemsets starting from a small $\sigma$
● First observation
○ For a frequency $\sigma$, once we have found k itemsets that are frequent for $\sigma$ and k reaches the threshold for $\sigma$, the outcome for $\sigma$ is decided without enumerating the remaining itemsets
● Second observation
○ Once $\sigma$ is decided in this way, we can skip it and go to $\sigma + 1$
○ Here, we can reuse the current k itemsets;
we just need to remove the itemsets with frequency $\sigma$
Support increasing algorithm
● If is relatively small compared to on average, the algorithm terminates quickly
● Maintaining S can be done with a heap that extracts the minimum-frequency itemset from S,
which takes $O(\log |S|)$ time per extraction
● However, for large or when is very large, the algorithm takes a very long time
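The heap-based bookkeeping can be sketched as follows. This is a simplified illustration under stated assumptions: the enumeration is replaced by a plain list of itemset frequencies, and `threshold(sigma)` is a hypothetical non-decreasing stand-in for the LAMP condition, which this sketch treats as "the number of itemsets frequent for sigma is at least `threshold(sigma)`".

```python
import heapq

def support_increase_heap(frequencies, threshold):
    """Largest sigma whose count of frequent itemsets reaches threshold(sigma).

    frequencies: frequencies of the enumerated itemsets, in any order
    threshold:   hypothetical non-decreasing function of sigma
    """
    heap = []      # min-heap holding the frequencies of the itemsets in S
    sigma = 1
    for f in frequencies:
        if f < sigma:
            continue                     # infrequent for the current sigma
        heapq.heappush(heap, f)
        while heap and len(heap) >= threshold(sigma):
            sigma += 1                   # sigma is settled, try the next one
            while heap and heap[0] < sigma:
                heapq.heappop(heap)      # drop itemsets that became infrequent
    return sigma - 1

# Hypothetical frequencies and threshold.
print(support_increase_heap([3, 5, 3, 3, 2, 2, 3], lambda s: 2 * s))
```

Every push and pop costs logarithmic time in the heap size, which is the overhead that becomes noticeable when many itemsets are held at once.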
Faster implementation
● However, we do not need to maintain the whole set using a heap
● We only need the size of S, so we can store only the number of itemsets for each frequency
○ This makes the step of removing infrequent itemsets a constant-time operation
○ Moreover, adding the addible items is also cheap
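A minimal sketch of the size-only bookkeeping follows, under the same assumptions as before: the enumeration is replaced by a list of itemset frequencies, and `threshold(sigma)` is a hypothetical non-decreasing stand-in for the LAMP condition. Only one counter per frequency is stored instead of the itemsets themselves.

```python
def support_increase_counts(frequencies, threshold, max_freq):
    """Largest sigma whose count of frequent itemsets reaches threshold(sigma).

    frequencies: frequencies of the enumerated itemsets, in any order
    threshold:   hypothetical non-decreasing function of sigma
    max_freq:    largest possible frequency (the number of transactions)
    """
    counts = [0] * (max_freq + 2)   # counts[f] = itemsets seen with frequency f
    sigma = 1
    k = 0                           # itemsets seen with frequency >= sigma
    for f in frequencies:
        if f < sigma:
            continue                # infrequent for the current sigma
        counts[f] += 1
        k += 1
        while sigma <= max_freq and k >= threshold(sigma):
            k -= counts[sigma]      # O(1): drop every itemset of frequency sigma
            sigma += 1
    return sigma - 1

# Hypothetical frequencies and threshold.
print(support_increase_counts([3, 5, 3, 3, 2, 2, 3], lambda s: 2 * s, 5))
```

Raising sigma by one costs a single subtraction here, whereas a heap needs one pop per removed itemset.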
Experiments
Conclusion
● Proposed a fast itemset enumeration algorithm to find the frequency threshold
satisfying the LAMP condition
● The proposed method is much faster than the original
● Future work:
○ It will be useful if we can efficiently compute the p-values for many combinatorial
hypotheses and can discover the best or top-k significant one (Our work)
○ Other tests such as the chi-squared test and the Mann-Whitney test
○ Extension to non-binary-valued databases (Our work)
Comment:
- Solid work that gave great insights into the problem we are currently dealing with
- A bit surprised when reading the future work part

Introduction to FAST-LAMP
