SlideShare a Scribd company logo
Efficient Algorithms for Association Finding and
Frequent Association Pattern Mining
Gong Cheng, Daxin Liu, Yuzhong Qu
Websoft Research Group
National Key Laboratory for Novel Software Technology
Nanjing University, China
Websoft
Background and motivation
• To suggest friends, recognize suspected terrorists,
answer questions … based on massive graph data
Problem statement
An association connecting a set of query entities is
a minimal subgraph that
• contains all the query entities, and
• is connected.
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellen
knows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attended
reviewer
reviewer
Frank
knows
Alice
Problem statement
An association connecting a set of query entities is
a minimal subgraph that
• contains all the query entities, and
• is connected.
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellen
knows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attended
reviewer
reviewer
Frank
knows
Alice
Problem statement
An association connecting a set of query entities is
a minimal subgraph that
• contains all the query entities, and
• is connected.
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellen
knows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attended
reviewer
reviewer
Frank
knows
Alice
Problem statement
An association connecting a set of query entities is
a minimal subgraph that
• contains all the query entities, and
• is connected.
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellen
knows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attended
reviewer
reviewer
Frank
knows
Alice
• Tree-structured
• Leaves ⊆ Query entities
Problem statement
1. How to efficiently find associations in a possibly
very large graph?
2. How to help users explore a possibly large set of
associations that have been found?
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellen
knows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attended
reviewer
reviewer
Frank
knows
Alice
Problem statement
1. How to efficiently find associations in a possibly
very large graph?
2. How to help users explore a possibly large set of
associations that have been found?
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellen
knows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attended
reviewer
reviewer
Frank
knows
Alice
Association finding
Frequent association
pattern mining
Association finding: Problem
• To find all the associations having a limited diameter
(Diameter = Greatest distance between any pair of vertices)
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellen
knows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attended
reviewer
reviewer
Frank
knows
Alice
Association finding: Basic solution
Chris
Bob
Paper-AisAuthorOf
acceptedAt
ISWC attended
reviewer
Alice
Chris
attended
Paper-AisAuthorOf
acceptedAt
ISWC
Alice
Bob
reviewer
ISWC
ISWC
An association  A set of paths
Association finding: Basic solution
Chris
Bob
Paper-AisAuthorOf
acceptedAt
ISWC attended
reviewer
Alice
Chris
attended
Paper-AisAuthorOf
acceptedAt
ISWC
Alice
Bob
reviewer
ISWC
ISWC
1. Path finding
2. Path merging
Association finding: Optimization
• Distance-based search space pruning
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellen
knows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attended
reviewer
reviewer
Frank
knows
Alice
• DiameterConstraint ≤ 3
• Length(AliceDan) = 1
• Distance(Dan, Bob) = 4
Length+Distance > DiameterConstraint
Association finding: Optimization
• Distance computation
• Materializing offline computed results: O(V2) space
• Online computing: O(E) time per pair
• Using distance oracle: a space-time trade-off
Chris
Bob
Paper-A
Paper-B
Dan
isAuthorOf
knows
correspondingAuthor
acceptedAt
Ellen
knows
ISWC
isAuthorOf
COLD
acceptedAt
attended
attended
attended
reviewer
reviewer
Frank
knows
Alice
Association finding: Deduplication
Chris
Bob
Paper-AisAuthorOf
acceptedAt
ISWC attended
reviewer
Alice
Chris
attended
Paper-AisAuthorOf
acceptedAt
ISWC
Alice
Bob
reviewer
ISWC
ISWC
An association 
A set of paths
Association finding: Deduplication
Chris
Bob
Paper-AisAuthorOf
acceptedAt
ISWC attended
reviewer
Alice
Chris
attended
Paper-AisAuthorOf
Alice
Bob
reviewer
ISWC
ISWC
A duplicate association 
Another set of paths
Paper-A
acceptedAt
Paper-A
acceptedAt
Association finding: Deduplication
• Canonical code
Chris
Bob
Paper-AisAuthorOf
acceptedAt
ISWC attended
reviewer
Alice
code(Tree(Alice)) = Alice,isAuthorOf,code(Tree(Paper-A))$
= … Paper-A,acceptedAt,code(Tree(ISWC))$ …
= … ISWC,reviewer,code(Tree(Bob)),~attended,code(Tree(Chris))$ …
= … Bob$ …
(Assuming Bob precedes Chris)
Frequent association pattern mining
• Association pattern: A conceptual abstract that
summarizes a group of associations
Chris
Bob
Paper-AisAuthorOf
acceptedAt
ISWC attended
reviewer
Alice
Association  Association pattern
(Intermediate entity  Class)
Chris
Bob
PaperisAuthorOf
acceptedAt
Conference attended
reviewer
Alice
Frequent association pattern mining
• Problem: To mine all the association patterns matched by
more than a threshold proportion of associations
Chris
Bob
Paper-AisAuthorOf
acceptedAt
ISWC attended
reviewer
Alice
Association  Association pattern
(Intermediate entity  Class)
Chris
Bob
PaperisAuthorOf
acceptedAt
Conference attended
reviewer
Alice
Frequent association pattern mining
• Basic solution:
Chris
Bob
Paper-AisAuthorOf
acceptedAt
ISWC attended
reviewer
Alice
Association  Association pattern
(Intermediate entity  Class)
Chris
Bob
PaperisAuthorOf
acceptedAt
Conference attended
reviewer
Alice
Calculating the frequency of an association pattern
= Counting the occurrence of its canonical code
Frequent association pattern mining
• Canonical code
Chris
Bob
PaperisAuthorOf
acceptedAt
Conference
attended
reviewer
Alice Paper
isAuthorOf acceptedAt
Conference
code(Tree(Alice)) = Alice,isAuthorOf,code(Tree(Paper)),isAuthorOf,code(Tree(Paper))$
Paper equals Paper.
So which subtree should go first?
(Canonical code may not be unique!)?
?
Frequent association pattern mining
• Canonical code
Chris
Bob
PaperisAuthorOf
acceptedAt
Conference
attended
reviewer
Alice Paper
isAuthorOf acceptedAt
Conference
code(Tree(Alice)) = Alice,isAuthorOf,code(Tree(Paper)),isAuthorOf,code(Tree(Paper))$
Smallest leaf as its proxy to be compared
Equality would never happen.
(Canonical code is now unique!)
(Assuming Bob precedes Chris)
Experiments
• Datasets
• LinkedMDB: 1M vertices and 2M arcs
• DBpedia (2015-04): 4M vertices and 15M arcs
• Parameter settings
• Diameter constraint (λ): 2, 4
• Number of query entities (n): 2, 3, 4, 5
• Test queries
• 1,000 random sets of query entities under each setting of λ and n
• Hardware configuration
• 3.3GHz CPU, 24GB memory
• Data graphs: in memory
• Distance oracles: on disk
Experiments
• Results: Association finding
BSC: Basic solution (not pruning)
PRN: Optimized solution (distance-based pruning)
PRN-1: Optimized solution (distance-based pruning except for the last level of search)
Experiments
• Results: Frequent association pattern mining
• LinkedMDB
• <10,000 associations: <21ms
• 13,531 associations: 68ms
• DBpedia
• <10,000 associations: <65ms
• 1,198,968 associations: 2909ms
Takeaway messages
• Subgraph finding and mining are faster than what we expected.
• Consider distance oracle and canonical code in your own research.
Takeaway messages
• Subgraph finding and mining are faster than what we expected.
• Consider distance oracle and canonical code in your own research.

More Related Content

More from Gong Cheng

Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
Gong Cheng
 
Semantic Web related top conference review
Semantic Web related top conference reviewSemantic Web related top conference review
Semantic Web related top conference review
Gong Cheng
 
Relatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationRelatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity Summarization
Gong Cheng
 
Generating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGenerating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the Web
Gong Cheng
 
常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析
Gong Cheng
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic Data
Gong Cheng
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
Gong Cheng
 
Taking up the Gaokao Challenge: An Information Retrieval Approach
Taking up the Gaokao Challenge: An Information Retrieval ApproachTaking up the Gaokao Challenge: An Information Retrieval Approach
Taking up the Gaokao Challenge: An Information Retrieval Approach
Gong Cheng
 
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Gong Cheng
 
知识的摘要
知识的摘要知识的摘要
知识的摘要
Gong Cheng
 
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Gong Cheng
 
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Gong Cheng
 
Towards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based ApproachTowards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based Approach
Gong Cheng
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary Repository
Gong Cheng
 
Web的图结构分析
Web的图结构分析Web的图结构分析
Web的图结构分析
Gong Cheng
 
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsBipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
Gong Cheng
 
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
Gong Cheng
 
RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
RELIN: Relatedness and Informativeness-based Centrality for Entity SummarizationRELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
Gong Cheng
 
Browsing Linked Data with MyView
Browsing Linked Data with MyViewBrowsing Linked Data with MyView
Browsing Linked Data with MyView
Gong Cheng
 
Towards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web DataTowards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web Data
Gong Cheng
 

More from Gong Cheng (20)

Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
 
Semantic Web related top conference review
Semantic Web related top conference reviewSemantic Web related top conference review
Semantic Web related top conference review
 
Relatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationRelatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity Summarization
 
Generating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGenerating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the Web
 
常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic Data
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
 
Taking up the Gaokao Challenge: An Information Retrieval Approach
Taking up the Gaokao Challenge: An Information Retrieval ApproachTaking up the Gaokao Challenge: An Information Retrieval Approach
Taking up the Gaokao Challenge: An Information Retrieval Approach
 
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
 
知识的摘要
知识的摘要知识的摘要
知识的摘要
 
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
 
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...
 
Towards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based ApproachTowards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based Approach
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary Repository
 
Web的图结构分析
Web的图结构分析Web的图结构分析
Web的图结构分析
 
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsBipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
 
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
 
RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
RELIN: Relatedness and Informativeness-based Centrality for Entity SummarizationRELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
 
Browsing Linked Data with MyView
Browsing Linked Data with MyViewBrowsing Linked Data with MyView
Browsing Linked Data with MyView
 
Towards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web DataTowards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web Data
 

Recently uploaded

Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
Sérgio Sacani
 
Physiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptxPhysiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptx
fatima132662
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
Ritik83251
 
Clinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdfClinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdf
RAYMUNDONAVARROCORON
 
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE  AND ITS BENIFITS.pptxIMPORTANCE OF ALGAE  AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
OmAle5
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
PsychoTech Services
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
Scintica Instrumentation
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
PirithiRaju
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
vadgavevedant86
 
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Creative-Biolabs
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
ABHISHEK SONI NIMT INSTITUTE OF MEDICAL AND PARAMEDCIAL SCIENCES , GOVT PG COLLEGE NOIDA
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
Frédéric Baudron
 
Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8
abhinayakamasamudram
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Sérgio Sacani
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
PirithiRaju
 
Embracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and ReplicabilityEmbracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and Replicability
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 
23PH301 - Optics - Unit 1 - Optical Lenses
23PH301 - Optics  -  Unit 1 - Optical Lenses23PH301 - Optics  -  Unit 1 - Optical Lenses
23PH301 - Optics - Unit 1 - Optical Lenses
RDhivya6
 

Recently uploaded (20)

Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
 
Physiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptxPhysiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptx
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
 
Clinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdfClinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdf
 
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE  AND ITS BENIFITS.pptxIMPORTANCE OF ALGAE  AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
 
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
 
Embracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and ReplicabilityEmbracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and Replicability
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
23PH301 - Optics - Unit 1 - Optical Lenses
23PH301 - Optics  -  Unit 1 - Optical Lenses23PH301 - Optics  -  Unit 1 - Optical Lenses
23PH301 - Optics - Unit 1 - Optical Lenses
 

Efficient Algorithms for Association Finding and Frequent Association Pattern Mining

  • 1. Efficient Algorithms for Association Finding and Frequent Association Pattern Mining Gong Cheng, Daxin Liu, Yuzhong Qu Websoft Research Group National Key Laboratory for Novel Software Technology Nanjing University, China Websoft
  • 2. Background and motivation • To suggest friends, recognize suspected terrorists, answer questions … based on massive graph data
  • 3. Problem statement An association connecting a set of query entities is a minimal subgraph that • contains all the query entities, and • is connected. Chris Bob Paper-A Paper-B Dan isAuthorOf knows correspondingAuthor acceptedAt Ellen knows ISWC isAuthorOf COLD acceptedAt attended attended attended reviewer reviewer Frank knows Alice
  • 4. Problem statement An association connecting a set of query entities is a minimal subgraph that • contains all the query entities, and • is connected. Chris Bob Paper-A Paper-B Dan isAuthorOf knows correspondingAuthor acceptedAt Ellen knows ISWC isAuthorOf COLD acceptedAt attended attended attended reviewer reviewer Frank knows Alice
  • 5. Problem statement An association connecting a set of query entities is a minimal subgraph that • contains all the query entities, and • is connected. Chris Bob Paper-A Paper-B Dan isAuthorOf knows correspondingAuthor acceptedAt Ellen knows ISWC isAuthorOf COLD acceptedAt attended attended attended reviewer reviewer Frank knows Alice
  • 6. Problem statement An association connecting a set of query entities is a minimal subgraph that • contains all the query entities, and • is connected. Chris Bob Paper-A Paper-B Dan isAuthorOf knows correspondingAuthor acceptedAt Ellen knows ISWC isAuthorOf COLD acceptedAt attended attended attended reviewer reviewer Frank knows Alice • Tree-structured • Leaves ⊆ Query entities
  • 7. Problem statement 1. How to efficiently find associations in a possibly very large graph? 2. How to help users explore a possibly large set of associations that have been found? Chris Bob Paper-A Paper-B Dan isAuthorOf knows correspondingAuthor acceptedAt Ellen knows ISWC isAuthorOf COLD acceptedAt attended attended attended reviewer reviewer Frank knows Alice
  • 8. Problem statement 1. How to efficiently find associations in a possibly very large graph? 2. How to help users explore a possibly large set of associations that have been found? Chris Bob Paper-A Paper-B Dan isAuthorOf knows correspondingAuthor acceptedAt Ellen knows ISWC isAuthorOf COLD acceptedAt attended attended attended reviewer reviewer Frank knows Alice Association finding Frequent association pattern mining
  • 9. Association finding: Problem • To find all the associations having a limited diameter (Diameter = Greatest distance between any pair of vertices) Chris Bob Paper-A Paper-B Dan isAuthorOf knows correspondingAuthor acceptedAt Ellen knows ISWC isAuthorOf COLD acceptedAt attended attended attended reviewer reviewer Frank knows Alice
  • 10. Association finding: Basic solution Chris Bob Paper-AisAuthorOf acceptedAt ISWC attended reviewer Alice Chris attended Paper-AisAuthorOf acceptedAt ISWC Alice Bob reviewer ISWC ISWC An association  A set of paths
  • 11. Association finding: Basic solution Chris Bob Paper-AisAuthorOf acceptedAt ISWC attended reviewer Alice Chris attended Paper-AisAuthorOf acceptedAt ISWC Alice Bob reviewer ISWC ISWC 1. Path finding 2. Path merging
  • 12. Association finding: Optimization • Distance-based search space pruning Chris Bob Paper-A Paper-B Dan isAuthorOf knows correspondingAuthor acceptedAt Ellen knows ISWC isAuthorOf COLD acceptedAt attended attended attended reviewer reviewer Frank knows Alice • DiameterConstraint ≤ 3 • Length(AliceDan) = 1 • Distance(Dan, Bob) = 4 Length+Distance > DiameterConstraint
  • 13. Association finding: Optimization • Distance computation • Materializing offline computed results: O(V2) space • Online computing: O(E) time per pair • Using distance oracle: a space-time trade-off Chris Bob Paper-A Paper-B Dan isAuthorOf knows correspondingAuthor acceptedAt Ellen knows ISWC isAuthorOf COLD acceptedAt attended attended attended reviewer reviewer Frank knows Alice
  • 14. Association finding: Deduplication Chris Bob Paper-AisAuthorOf acceptedAt ISWC attended reviewer Alice Chris attended Paper-AisAuthorOf acceptedAt ISWC Alice Bob reviewer ISWC ISWC An association  A set of paths
  • 15. Association finding: Deduplication Chris Bob Paper-AisAuthorOf acceptedAt ISWC attended reviewer Alice Chris attended Paper-AisAuthorOf Alice Bob reviewer ISWC ISWC A duplicate association  Another set of paths Paper-A acceptedAt Paper-A acceptedAt
  • 16. Association finding: Deduplication • Canonical code Chris Bob Paper-AisAuthorOf acceptedAt ISWC attended reviewer Alice code(Tree(Alice)) = Alice,isAuthorOf,code(Tree(Paper-A))$ = … Paper-A,acceptedAt,code(Tree(ISWC))$ … = … ISWC,reviewer,code(Tree(Bob)),~attended,code(Tree(Chris))$ … = … Bob$ … (Assuming Bob precedes Chris)
  • 17. Frequent association pattern mining • Association pattern: A conceptual abstract that summarizes a group of associations Chris Bob Paper-AisAuthorOf acceptedAt ISWC attended reviewer Alice Association  Association pattern (Intermediate entity  Class) Chris Bob PaperisAuthorOf acceptedAt Conference attended reviewer Alice
  • 18. Frequent association pattern mining • Problem: To mine all the association patterns matched by more than a threshold proportion of associations Chris Bob Paper-AisAuthorOf acceptedAt ISWC attended reviewer Alice Association  Association pattern (Intermediate entity  Class) Chris Bob PaperisAuthorOf acceptedAt Conference attended reviewer Alice
  • 19. Frequent association pattern mining • Basic solution: Chris Bob Paper-AisAuthorOf acceptedAt ISWC attended reviewer Alice Association  Association pattern (Intermediate entity  Class) Chris Bob PaperisAuthorOf acceptedAt Conference attended reviewer Alice Calculating the frequency of an association pattern = Counting the occurrence of its canonical code
  • 20. Frequent association pattern mining • Canonical code Chris Bob PaperisAuthorOf acceptedAt Conference attended reviewer Alice Paper isAuthorOf acceptedAt Conference code(Tree(Alice)) = Alice,isAuthorOf,code(Tree(Paper)),isAuthorOf,code(Tree(Paper))$ Paper equals Paper. So which subtree should go first? (Canonical code may not be unique!)? ?
  • 21. Frequent association pattern mining • Canonical code Chris Bob PaperisAuthorOf acceptedAt Conference attended reviewer Alice Paper isAuthorOf acceptedAt Conference code(Tree(Alice)) = Alice,isAuthorOf,code(Tree(Paper)),isAuthorOf,code(Tree(Paper))$ Smallest leaf as its proxy to be compared Equality would never happen. (Canonical code is now unique!) (Assuming Bob precedes Chris)
  • 22. Experiments • Datasets • LinkedMDB: 1M vertices and 2M arcs • DBpedia (2015-04): 4M vertices and 15M arcs • Parameter settings • Diameter constraint (λ): 2, 4 • Number of query entities (n): 2, 3, 4, 5 • Test queries • 1,000 random sets of query entities under each setting of λ and n • Hardware configuration • 3.3GHz CPU, 24GB memory • Data graphs: in memory • Distance oracles: on disk
  • 23. Experiments • Results: Association finding BSC: Basic solution (not pruning) PRN: Optimized solution (distance-based pruning) PRN-1: Optimized solution (distance-based pruning except for the last level of search)
  • 24. Experiments • Results: Frequent association pattern mining • LinkedMDB • <10,000 associations: <21ms • 13,531 associations: 68ms • DBpedia • <10,000 associations: <65ms • 1,198,968 associations: 2909ms
  • 25. Takeaway messages • Subgraph finding and mining are faster than what we expected. • Consider distance oracle and canonical code in your own research.
  • 26. Takeaway messages • Subgraph finding and mining are faster than what we expected. • Consider distance oracle and canonical code in your own research.