The 5th International Joint Conference on Rules and Reasoning
Sept. 8–15, 2021
(virtually in) Leuven, Belgium
An Answer Set Programming based framework for High-Utility
Pattern Mining extended with Facets and Advanced Utility
Functions
Francesco Cauteruccio and Giorgio Terracina
DEMACS, University of Calabria, Italy
{cauteruccio, terracina}@mat.unical.it
Outline
• Context and Motivation
• Proposed Framework
• ASP Approach
• Experimental Evaluation
• Conclusion
Context and Motivation
• Pattern mining is one of the most studied branches of data mining
• Find interesting patterns (sets of items) in a database of transactions
• Frequent pattern mining, sequential pattern mining, etc.
• High-Utility Pattern Mining (HUPM)
• Find patterns having a high utility (w.r.t. some utility measure)
• Example: in a sales database, the utility of a pattern may be represented by the profit of some items bought together.
• Basic assumption: each item is associated with a single, static utility.
• However…
• The utility of an item can be defined from very different points of view,
• Transactions are not only flat lists of items; they can also provide different levels of abstraction.
Pattern Mining and High-Utility Pattern Mining
Context and Motivation
• We present a framework for HUPM extending basic notions and introducing:
• A higher level of abstraction with transaction set representation
• For each transaction, an Object, a Container and a Database level of aggregation can be defined.
• The notion of facet
• A facet is an attribute which can be associated with an item, a transaction, an object or a container;
• Each element may be characterized by more than one facet.
• A taxonomy of extended utility functions
• Based on the database structure and facets,
• They can be combined in several ways to fit different notions of utility.
An extended framework for HUPM
Proposed Framework
• The database representation is multi-layer
• $Database \rightarrow Container \rightarrow Object \rightarrow Transaction$
• Given a database $D$ and a set of transactions $\{T_1, \dots, T_n\}$
• $D$ is organized as a set of containers $C = \{C_1, \dots, C_r\}$
• $C_i$ can be associated with a set of objects $O = \{O_1, \dots, O_s\}$
• $O_j$ contains a set of transactions $\{T_1, \dots, T_k\}$
Extending the database structure
Running example depicting a sales database
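The multi-layer structure can be sketched as nested records. This is only an illustrative sketch: the class names, field names and the tiny sales data below are assumptions made for the example, not part of the framework.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tid: str
    items: dict[str, int]  # item -> internal utility (e.g. quantity bought)


@dataclass
class SalesObject:
    oid: str
    transactions: list[Transaction] = field(default_factory=list)


@dataclass
class Container:
    cid: str
    objects: list[SalesObject] = field(default_factory=list)


@dataclass
class Database:
    containers: list[Container] = field(default_factory=list)

    def transactions(self):
        """Yield (container, object, transaction) for every transaction."""
        for c in self.containers:
            for o in c.objects:
                for t in o.transactions:
                    yield c, o, t


# Hypothetical data: two containers, three objects, three transactions.
db = Database([
    Container("c1", [
        SalesObject("o1", [Transaction("t1", {"a": 2, "b": 1})]),
        SalesObject("o2", [Transaction("t2", {"a": 1})]),
    ]),
    Container("c2", [
        SalesObject("o3", [Transaction("t3", {"b": 3, "c": 1})]),
    ]),
])

# Each transaction is reachable through exactly one container/object path.
paths = [(c.cid, o.oid, t.tid) for c, o, t in db.transactions()]
```

Walking the hierarchy top-down gives, for each transaction, the object and container whose facets can later be attached to its occurrences.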
Proposed Framework
• The utility of $i$ may be defined from different perspectives: facets.
• Each item $i$ can be associated with one or more facets.
• Facets can also be defined for transactions, objects and containers.
• To describe facets, we use the notion of utility vector.
Facets
Proposed Framework
• Item utility vector for an item $i$
• $IU_i = [iu_1, iu_2, \dots, iu_l]$, where $iu_k$ describes a certain facet of $i$
• Transaction utility vector for a transaction $T_j$
• $TU_{T_j} = [tu_1, tu_2, \dots, tu_m]$, where $tu_k$ describes a certain facet of $T_j$
• Internal utility of $i$ in $T_j$ is available and denoted as $q(i, T_j)$
• Object utility vector
• $OU_O = [ou_1, ou_2, \dots, ou_n]$, where $ou_k$ describes a certain facet of $O$
• Container utility vector
• $CU_C = [cu_1, cu_2, \dots, cu_o]$, where $cu_k$ describes a certain facet of $C$
Facets and utility vectors
facets: price, weight
Proposed Framework
• Intra-pattern utility function
• Let $P$ be a pattern with $r$ items, and $T_j$ be a transaction containing it,
• Let $IUS = \{IU_1, \dots, IU_r\}$ be the set of item utility vectors of $P$
• The intra-pattern utility function $IU_{T_j} = ip(P, T_j, IUS)$ generates a unique item utility vector for the pattern occurrence.
• $ip(\cdot)$ can be any function combining the utilities across the facets, such as SUM, MAX, MIN, AVG, etc.
• Pattern utility function
• The occurrence utility vector for a pattern $P$ is $OccU_{T_j} = [IU_{T_j}, TU_{T_j}, OU_{T_j}, CU_{T_j}]$
• The pattern utility vector $U_P$ is the collection of all the occurrence utility vectors of $P$: $U_P = \bigcup_{T_j \in T_P} OccU_{T_j}$, with $T_P$ the set of transactions containing $P$
• $U_P$ is a matrix (occurrence utility vectors × facets).
Advanced utility functions
In this example, $ip = \sum (\text{facet} \times \text{internal utility})$
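The intra-pattern utility and the occurrence utility vector can be sketched in a few lines. This is a minimal sketch assuming the sum-of-facet-times-internal-utility choice for the intra-pattern function; the function names, the (price, weight) facets and all numbers are hypothetical.

```python
def intra_pattern_utility(pattern, quantities, item_vectors):
    """ip(P, T, IUS): one item utility vector for an occurrence of `pattern`.

    Facet k of the result is the sum, over the items of the pattern, of
    (facet k of the item) * (internal utility of the item in the transaction).
    """
    n_facets = len(next(iter(item_vectors.values())))
    return [
        sum(item_vectors[i][k] * quantities[i] for i in pattern)
        for k in range(n_facets)
    ]


def occurrence_utility_vector(iu, tu, ou, cu):
    """OccU = [IU, TU, OU, CU], flattened into one row of the matrix U_P."""
    return [*iu, *tu, *ou, *cu]


# Hypothetical data: each item has two facets, (price, weight).
item_vectors = {"a": [2.0, 0.5], "b": [1.0, 2.0]}
quantities = {"a": 3, "b": 1, "c": 4}  # q(i, T) for a single transaction T

iu = intra_pattern_utility(["a", "b"], quantities, item_vectors)
row = occurrence_utility_vector(iu, tu=[1.0], ou=[5.0], cu=[0.0])
```

Stacking one such row per transaction containing the pattern gives the occurrences × facets matrix $U_P$.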
Proposed Framework
• The utility $u$ of a pattern $P$ can be obtained as an arbitrary combination of the values of $U_P$ using a function $u(P)$
• $u(P)$ can be classified as
• Horizontal first: $u(P) = f_v(f_h(U_P))$ combines by row (facets), then by column (occurrences)
• Vertical first: $u(P) = f_h(f_v(U_P))$ combines by column (occurrences), then by row (facets)
• Mixed: $u(P) = f(U_P)$ combines the values at once
• Each of these classes can be further classified as
• Inter-transaction utility
• Pattern-vs-object utility
• Pattern-vs-container utility
• These can be exploited to define utility measures relating
item/transaction facets and one of the object/container facets,
such as Pearson and Multiple correlation.
Advanced utility functions
As an example, suppose we select $f_h = filter(\cdot)$ and $f_v = \max(\cdot)$; that is, we first filter for a single facet, and then we take the maximum across all the occurrences.
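The difference between horizontal-first and vertical-first combination can be sketched on a small matrix. The helper names and the matrix values are assumptions made for illustration; the filter-then-max choice follows the example above, while the max-then-sum choice is just one possible vertical-first instance.

```python
def horizontal_first(matrix, f_h, f_v):
    """u(P) = f_v(f_h(U_P)): reduce each row (occurrence), then the results."""
    return f_v([f_h(row) for row in matrix])


def vertical_first(matrix, f_v, f_h):
    """u(P) = f_h(f_v(U_P)): reduce each column (facet), then the results."""
    columns = list(zip(*matrix))
    return f_h([f_v(col) for col in columns])


def select_facet_0(row):
    """f_h = filter(.): keep a single facet (here, the first one)."""
    return row[0]


# Hypothetical pattern utility matrix: 3 occurrences x 2 facets.
U_P = [
    [7.0, 3.5],
    [4.0, 9.0],
    [6.0, 1.0],
]

# Horizontal first: filter one facet per occurrence, then take the maximum.
u_filter_max = horizontal_first(U_P, select_facet_0, max)

# Vertical first: maximum of each facet over the occurrences, then their sum.
u_max_sum = vertical_first(U_P, max, sum)
```

Both helpers take plain callables, so SUM, MIN, AVG or a correlation measure can be swapped in without changing the traversal order.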
Proposed Framework
• Given a pattern $P$, we say that $P$ is an extended high-utility pattern if its utility $u(P)$ is greater than a minimum threshold $th_u$ and it occurs in at least $th_o$ transactions.
• The problem of extended high-utility pattern mining (e-HUPM) is to discover all the extended high-utility patterns in a given database $D$.
The e–HUPM problem
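The problem statement can be read as a specification and checked by exhaustive enumeration on toy data. The sketch below is a naive reference procedure, not the ASP approach of the talk; the function names, the data and the toy utility are all hypothetical.

```python
from itertools import combinations


def mine(transactions, utility, th_u, th_o):
    """Return every pattern with u(P) > th_u occurring in >= th_o transactions.

    transactions: dict mapping a transaction id to its set of items
    utility: callable (pattern, occurrence_ids) -> number, standing in for u(P)
    """
    items = sorted(set().union(*transactions.values()))
    result = []
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            occ = [tid for tid, its in transactions.items()
                   if set(pattern) <= its]
            if len(occ) >= th_o and utility(pattern, occ) > th_u:
                result.append(pattern)
    return result


# Hypothetical data and a toy utility: total price over all occurrences.
transactions = {"t1": {"a", "b"}, "t2": {"a", "b", "c"}, "t3": {"b", "c"}}
price = {"a": 5, "b": 1, "c": 2}


def total_price(pattern, occ):
    return len(occ) * sum(price[i] for i in pattern)


found = mine(transactions, total_price, th_u=5, th_o=2)
```

Enumerating all item subsets is exponential in the number of items, so this only serves to make the two thresholds concrete on small inputs.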
ASP Approach
• We want to provide as much flexibility as possible in defining what a useful pattern is.
• Encoding the problem in Answer Set Programming (ASP) helps achieve the desired flexibility and modularity.
• Classic guess-and-check scenario
• We generate one answer set for each valid pattern,
• Complex utility functions are executed by means of external functions (e.g., in DLVHEX, WASP, clingo)
• Pattern validity criteria and filters can be easily applied by encoding them as rules.
The why and the how
ASP Approach
The encoding
%%% Input schema:
% container(ContainerId)
% object(ObjectId, ContainerId)
% transaction(Tid, ObjectId)
% item(Item, Tid, Position, Q)
% itemUtilityVector(Item, I1, ..., Il)
% transactionUtilityVector(Tid, T1, ..., Tm)
% objectUtilityVector(ObjectId, O1, ..., On)
% containerUtilityVector(ContainerId, C1, ..., Co)
%%% Parameters
occurrencesThreshold(...). utilityThreshold(...).
%%% Item pre-filtering ("..." stands for any condition on the items)
usefulItem(I) :- item(I,_,_,_), ... .
%%% Candidate pattern generation (guess)
{inCandidatePattern(I)} :- usefulItem(I).
%%% Occurrences computation and check
% A transaction supports the pattern if it misses none of the candidate items.
inTransaction(Tid) :- transaction(Tid,_), not incomplete(Tid).
incomplete(Tid) :- transaction(Tid,_), inCandidatePattern(I), not contains(I,Tid).
contains(I,Tid) :- item(I,Tid,_,_).
% Discard candidates occurring in fewer than Tho transactions.
:- #count{ Tid : inTransaction(Tid) } = N, N < Tho, occurrencesThreshold(Tho).
%%% Utility computation
patternItemUtilityVectors(Tid,Item,I1,...,Il,Q) :- inCandidatePattern(Item),
    itemUtilityVector(Item, I1, ..., Il), inTransaction(Tid), item(Item, Tid, Position, Q).
% External atom: combines the item utility vectors of each occurrence.
intraPatternUtilityVector(Tid,I1,...,Il) :-
    &computeIntraPatternUtility[patternItemUtilityVectors](Tid,I1,...,Il).
occurrenceUtilityVector(Tid,I1,...,Il,T1,...,Tm,O1,...,On,C1,...,Co) :-
    inTransaction(Tid), intraPatternUtilityVector(Tid,I1,...,Il),
    transactionUtilityVector(Tid, T1, ..., Tm), transaction(Tid, ObjectId),
    objectUtilityVector(ObjectId, O1, ..., On), object(ObjectId, ContainerId),
    containerUtilityVector(ContainerId, C1, ..., Co).
% External atom: computes u(P); discard candidates whose utility is below Thu.
:- &computeUtility[occurrenceUtilityVector](U), U < Thu, utilityThreshold(Thu).
Experimental Evaluation
• Aspect-based sentiment analysis of scientific reviews [1]
• Each review is annotated with 8 different aspects: appropriateness, clarity, originality, empirical/theoretical soundness, meaningful comparison, substance, impact and recommendation
• Each aspect is assigned one sentiment value from {positive, negative, neutral, absent}.
• 814 papers and a total of 1148 reviews.
• 2230 annotated lines with a total of 15124 distinct words.
Dataset
[1] Chakraborty, S., Goyal, P., Mukherjee, A.: Aspect-based sentiment analysis of scientific reviews. In: JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5, 2020, pp. 207–216. ACM (2020)
Experimental Evaluation
• Average running time
• $u(P)$: disagreement between originality and decision (% of pattern occurrences showing a positive originality and a reject decision; thresholds: [15, 50, 75, 100])
Quantitative analysis
• Comparison with HUPM systems
• No analysis involving more than one element can be devised with such systems
• Here, the utility of an item is the absolute value of the appropriateness aspect in the sentence it first appears in.
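The disagreement utility used for the running-time experiment can be sketched as a simple percentage over pattern occurrences. This is a hedged illustration: the function name, the string labels and the four occurrences are hypothetical, and the actual external function used in the experiments is not shown in the slides.

```python
def disagreement(occurrences):
    """Percentage of occurrences with positive originality and a reject decision."""
    if not occurrences:
        return 0.0
    hits = sum(
        1
        for originality, decision in occurrences
        if originality == "positive" and decision == "reject"
    )
    return 100.0 * hits / len(occurrences)


# Hypothetical (originality sentiment, paper decision) pairs,
# one pair per occurrence of the pattern.
occurrences = [
    ("positive", "reject"),
    ("positive", "accept"),
    ("negative", "reject"),
    ("positive", "reject"),
]
u = disagreement(occurrences)
```

A pattern would then be kept when this percentage passes the chosen utility threshold.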
Experimental Evaluation
• $u(P)$: Pearson correlation between an aspect $X$ and the final decision on the corresponding paper.
• «The set of review sentences containing $P$ is characterized by a direct correlation between the sentiment on $X$ and the final decision on the corresponding paper»
• $u(P)$: Multiple correlation between two aspects $X$ and $Y$ and the final decision on the corresponding paper.
Qualitative analysis
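A Pearson-based utility can be sketched as the correlation between two columns of the pattern utility matrix. The numeric coding below (sentiment as -1/0/1, decision as 0/1) is an assumption made only for this example, as are the function name and the data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here for constant columns
    return cov / sqrt(var_x * var_y)


# Hypothetical coding, one entry per occurrence of the pattern:
# sentiment on aspect X (-1 negative, 0 neutral, 1 positive)
# and final decision on the paper (0 reject, 1 accept).
sentiment = [1, 1, -1, -1]
decision = [1, 1, 0, 0]
u = pearson(sentiment, decision)
```

A value close to 1 corresponds to the direct correlation described above; a value close to -1 would indicate an inverse one.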
Conclusion
• We introduced a general framework for HUPM with several extensions.
• The framework supports multi-dimensional data and different utility measures.
• A versatile and modular ASP encoding has been developed.
• We employed a real use case on paper reviews to carry out both quantitative and qualitative analyses.
• Facets and advanced utility functions help reduce the number of relevant patterns
• Useful in providing deep insights on the data.
• Not an ending point!
• Apply the framework to new contexts,
• Derive ad-hoc algorithms for the extended utility functions.
Thanks for your attention!
These slides are available at https://bit.ly/ehupm-ruleml21
Francesco Cauteruccio
Research Fellow @ DEMACS, University of Calabria
cauteruccio@mat.unical.it
francescocauteruccio.info
@finalfire

More Related Content

What's hot

The science behind predictive analytics a text mining perspective
The science behind predictive analytics  a text mining perspectiveThe science behind predictive analytics  a text mining perspective
The science behind predictive analytics a text mining perspective
ankurpandeyinfo
 
2.mathematics for machine learning
2.mathematics for machine learning2.mathematics for machine learning
2.mathematics for machine learning
KONGU ENGINEERING COLLEGE
 
4 module 3 --
4 module 3 --4 module 3 --
4 module 3 --
tafosepsdfasg
 
lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methods
rajshreemuthiah
 
A Modified KS-test for Feature Selection
A Modified KS-test for Feature SelectionA Modified KS-test for Feature Selection
A Modified KS-test for Feature Selection
IOSR Journals
 
Matlab:Regression
Matlab:RegressionMatlab:Regression
Matlab:Regression
DataminingTools Inc
 
Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...
ranjit banshpal
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
guest0edcaf
 
2 introductory slides
2 introductory slides2 introductory slides
2 introductory slides
tafosepsdfasg
 
7 decision tree
7 decision tree7 decision tree
7 decision tree
tafosepsdfasg
 
3 module 2
3 module 23 module 2
3 module 2
tafosepsdfasg
 
Matlab Data And Statistics
Matlab Data And StatisticsMatlab Data And Statistics
Matlab Data And Statistics
DataminingTools Inc
 

What's hot (12)

The science behind predictive analytics a text mining perspective
The science behind predictive analytics  a text mining perspectiveThe science behind predictive analytics  a text mining perspective
The science behind predictive analytics a text mining perspective
 
2.mathematics for machine learning
2.mathematics for machine learning2.mathematics for machine learning
2.mathematics for machine learning
 
4 module 3 --
4 module 3 --4 module 3 --
4 module 3 --
 
lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methods
 
A Modified KS-test for Feature Selection
A Modified KS-test for Feature SelectionA Modified KS-test for Feature Selection
A Modified KS-test for Feature Selection
 
Matlab:Regression
Matlab:RegressionMatlab:Regression
Matlab:Regression
 
Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
2 introductory slides
2 introductory slides2 introductory slides
2 introductory slides
 
7 decision tree
7 decision tree7 decision tree
7 decision tree
 
3 module 2
3 module 23 module 2
3 module 2
 
Matlab Data And Statistics
Matlab Data And StatisticsMatlab Data And Statistics
Matlab Data And Statistics
 

Similar to An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions

Algorithms and Data Structures
Algorithms and Data StructuresAlgorithms and Data Structures
Algorithms and Data Structures
sonykhan3
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptx
BhagyasriPatel2
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
Alexander Decker
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
Alexander Decker
 
Predicting Stock Market Price Using Support Vector Regression
Predicting Stock Market Price Using Support Vector RegressionPredicting Stock Market Price Using Support Vector Regression
Predicting Stock Market Price Using Support Vector Regression
Chittagong Independent University
 
Programming in python
Programming in pythonProgramming in python
Programming in python
Ivan Rojas
 
Introduction to Design Patterns in Javascript
Introduction to Design Patterns in JavascriptIntroduction to Design Patterns in Javascript
Introduction to Design Patterns in Javascript
Santhosh Kumar Srinivasan
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
Data structure and algorithm.
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm.
Abdul salam
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
Joaquin Vanschoren
 
algo 1.ppt
algo 1.pptalgo 1.ppt
algo 1.ppt
example43
 
Data Structure and Algorithms
Data Structure and AlgorithmsData Structure and Algorithms
Data Structure and Algorithms
iqbalphy1
 
cs 601 - lecture 1.pptx
cs 601 - lecture 1.pptxcs 601 - lecture 1.pptx
cs 601 - lecture 1.pptx
GopalPatidar13
 
Data Analysis – Technical learnings
Data Analysis – Technical learningsData Analysis – Technical learnings
Data Analysis – Technical learnings
InvenkLearn
 
Preservation Planning using Plato, by Hannes Kulovits and Andreas Rauber
Preservation Planning using Plato, by Hannes Kulovits and Andreas RauberPreservation Planning using Plato, by Hannes Kulovits and Andreas Rauber
Preservation Planning using Plato, by Hannes Kulovits and Andreas Rauber
JISC KeepIt project
 
Query processing System
Query processing SystemQuery processing System
2021 icse reducedsylabiix-computer applications
2021 icse reducedsylabiix-computer applications2021 icse reducedsylabiix-computer applications
2021 icse reducedsylabiix-computer applications
Vahabshaik Shai
 
Data Structures - Lecture 1 - Unit 1.pptx
Data Structures  - Lecture 1 - Unit 1.pptxData Structures  - Lecture 1 - Unit 1.pptx
Data Structures - Lecture 1 - Unit 1.pptx
DanielNesaKumarC
 
Scalable Similarity-Based Neighborhood Methods with MapReduce
Scalable Similarity-Based Neighborhood Methods with MapReduceScalable Similarity-Based Neighborhood Methods with MapReduce
Scalable Similarity-Based Neighborhood Methods with MapReduce
sscdotopen
 
Algorithms Analysis.pdf
Algorithms Analysis.pdfAlgorithms Analysis.pdf
Algorithms Analysis.pdf
ShaistaRiaz4
 

Similar to An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions (20)

Algorithms and Data Structures
Algorithms and Data StructuresAlgorithms and Data Structures
Algorithms and Data Structures
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptx
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
 
Predicting Stock Market Price Using Support Vector Regression
Predicting Stock Market Price Using Support Vector RegressionPredicting Stock Market Price Using Support Vector Regression
Predicting Stock Market Price Using Support Vector Regression
 
Programming in python
Programming in pythonProgramming in python
Programming in python
 
Introduction to Design Patterns in Javascript
Introduction to Design Patterns in JavascriptIntroduction to Design Patterns in Javascript
Introduction to Design Patterns in Javascript
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Data structure and algorithm.
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm.
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
algo 1.ppt
algo 1.pptalgo 1.ppt
algo 1.ppt
 
Data Structure and Algorithms
Data Structure and AlgorithmsData Structure and Algorithms
Data Structure and Algorithms
 
cs 601 - lecture 1.pptx
cs 601 - lecture 1.pptxcs 601 - lecture 1.pptx
cs 601 - lecture 1.pptx
 
Data Analysis – Technical learnings
Data Analysis – Technical learningsData Analysis – Technical learnings
Data Analysis – Technical learnings
 
Preservation Planning using Plato, by Hannes Kulovits and Andreas Rauber
Preservation Planning using Plato, by Hannes Kulovits and Andreas RauberPreservation Planning using Plato, by Hannes Kulovits and Andreas Rauber
Preservation Planning using Plato, by Hannes Kulovits and Andreas Rauber
 
Query processing System
Query processing SystemQuery processing System
Query processing System
 
2021 icse reducedsylabiix-computer applications
2021 icse reducedsylabiix-computer applications2021 icse reducedsylabiix-computer applications
2021 icse reducedsylabiix-computer applications
 
Data Structures - Lecture 1 - Unit 1.pptx
Data Structures  - Lecture 1 - Unit 1.pptxData Structures  - Lecture 1 - Unit 1.pptx
Data Structures - Lecture 1 - Unit 1.pptx
 
Scalable Similarity-Based Neighborhood Methods with MapReduce
Scalable Similarity-Based Neighborhood Methods with MapReduceScalable Similarity-Based Neighborhood Methods with MapReduce
Scalable Similarity-Based Neighborhood Methods with MapReduce
 
Algorithms Analysis.pdf
Algorithms Analysis.pdfAlgorithms Analysis.pdf
Algorithms Analysis.pdf
 

Recently uploaded

Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Leonel Morgado
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
terusbelajar5
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 

Recently uploaded (20)

Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 

An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions

  • 1. The 5th International Joint conference on Rules and Reasoning Sept. 8–15, 2021 (virtually in) Leuven, Belgium An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions Francesco Cauteruccio and Giorgio Terracina DEMACS, University of Calabria, Italy {cauteruccio, terracina}@mat.unical.it
  • 2. Outline • Context and Motivation • Proposed Framework • ASP Approach • Experimental Evaluation • Conclusion 1/17
  • 3. Context and Motivation • Pattern Mining is one of the most studied data mining branches • Find interesting patterns (set of items) in a database of transactions • Frequent pattern mining, sequential pattern mining, etc… • High-Utility Pattern Mining (HUPM) • Find patterns having a high-utility (w.r.t. some utility measure) • Example: in a sales database, the utility of a pattern may be represented by the profit of some items bought together. • Basic assumption: each item is associated with one, static utility . • However… • The utility of an item can be defined from very different point of views, • Transactions are not only flat lists of items but they can provide different level of abstractions. Pattern Mining and High-Utility Pattern Mining 2/17
  • 4. Context and Motivation • We present a framework for HUPM extending basic notions and introducing: • A higher level of abstraction with transaction set representation • For each transaction, an Object, a Container and a Database level of aggregation can be defined. • The notion of facet • A facet is an attribute which can be associated with an item, a transaction, an object or a container; • Each element may be characterized by more than one facet. • A taxonomy of extended utility functions • Based on the database structure and facets, • They can be combined in several ways to fit different notions of utility. An extended framework for HUPM 3/17
  • 5. Proposed Framework: Extending the database structure • The database representation is multi-layer • $Database \rightarrow Container \rightarrow Object \rightarrow Transaction$ • Given a database $D$ and a set of transactions $\{T_1, \dots, T_n\}$ • $D$ is organized as a set of containers $C = \{C_1, \dots, C_r\}$ • Each container $C_j$ can be associated with a set of objects $O = \{O_1, \dots, O_s\}$ • Each object $O_k$ contains a set of transactions $\{T_1, \dots, T_p\}$ • [Figure: running example depicting a sales database]
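The layered structure on this slide can be pictured with a small data model. The following is only a minimal sketch: the class names, the `walk` helper and the toy sales data are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transaction:
    tid: str
    items: Dict[str, int]  # item -> quantity bought in this transaction


@dataclass
class DataObject:
    oid: str
    transactions: List[Transaction] = field(default_factory=list)


@dataclass
class Container:
    cid: str
    objects: List[DataObject] = field(default_factory=list)


@dataclass
class Database:
    containers: List[Container] = field(default_factory=list)

    def walk(self):
        """Yield (container id, object id, transaction id) for each transaction."""
        for container in self.containers:
            for obj in container.objects:
                for transaction in obj.transactions:
                    yield container.cid, obj.oid, transaction.tid


# Hypothetical toy data: one container holding two objects.
db = Database([
    Container("c1", [
        DataObject("o1", [Transaction("t1", {"bread": 2, "milk": 1})]),
        DataObject("o2", [Transaction("t2", {"milk": 3}),
                          Transaction("t3", {"bread": 1})]),
    ]),
])
paths = list(db.walk())
```

Each tuple in `paths` records which object and container a transaction belongs to, which is the information later needed to attach object-level and container-level facets to a pattern occurrence.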
  • 6. Proposed Framework: Facets • The utility of an item $i$ may be defined from different perspectives: facets. • Each item $i$ can be associated with one or more facets. • Facets can also be defined for transactions, objects and containers. • To describe facets, we use the notion of utility vector.
  • 7. Proposed Framework: Facets and utility vectors • Item utility vector for an item $i$ • $IU_i = [iu_1, iu_2, \dots, iu_l]$, where $iu_k$ describes a certain facet of $i$ • Transaction utility vector for a transaction $T_t$ • $TU_{T_t} = [tu_1, tu_2, \dots, tu_m]$, where $tu_k$ describes a certain facet of $T_t$ • The internal utility of $i$ in $T_t$ is available and denoted as $q(i, T_t)$ • Object utility vector • $OU_O = [ou_1, ou_2, \dots, ou_n]$, where $ou_k$ describes a certain facet of $O$ • Container utility vector • $CU_C = [cu_1, cu_2, \dots, cu_o]$, where $cu_k$ describes a certain facet of $C$ • [Figure: running example with item facets price and weight]
  • 8. Proposed Framework: Advanced utility functions • Intra-pattern utility function • Let $P$ be a pattern with $r$ items, and $T_t$ be a transaction containing it, • Let $IUS = \{IU_1, \dots, IU_r\}$ be the set of item utility vectors of $P$ • The intra-pattern utility function $IU_{T_t} = ip(P, T_t, IUS)$ generates a unique item utility vector for the pattern occurrence. • $ip(\cdot)$ can be any function combining the utilities across the facets, such as SUM, MAX, MIN, AVG, etc. • Pattern utility function • The occurrence utility vector for a pattern $P$ is $OccU_{T_t} = [IU_{T_t}, TU_{T_t}, OU_{T_t}, CU_{T_t}]$ • The pattern utility vector $U_P$ is the collection of all the occurrence utility vectors of $P$: $U_P = \bigcup_{T_t \in T_P} OccU_{T_t}$ • $U_P$ is a matrix (occurrence utility vectors × facets). • In the running example, $ip = \sum (facet \times internalUtility)$
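The intra-pattern utility used in the example on this slide (sum of facet value times internal utility) can be sketched as follows. The function name, the item names and all numbers are hypothetical; only the formula is taken from the slide, and the two facets are assumed to be price and weight.

```python
def intra_pattern_utility(pattern, quantities, item_vectors):
    """Combine the item utility vectors of a pattern occurrence, facet by facet.

    pattern      -- iterable of items forming the pattern
    quantities   -- item -> internal utility of the item in the transaction
    item_vectors -- item -> list of facet values (same length for every item)
    """
    n_facets = len(next(iter(item_vectors.values())))
    result = [0] * n_facets
    for item in pattern:
        q = quantities[item]
        for k, facet_value in enumerate(item_vectors[item]):
            result[k] += facet_value * q  # facet x internal utility, summed
    return result


# Hypothetical facets per item: [price, weight].
item_vectors = {"bread": [2, 1], "milk": [3, 2], "eggs": [4, 1]}
# Hypothetical transaction: item -> quantity.
t1 = {"bread": 2, "milk": 1, "eggs": 6}

iu_t1 = intra_pattern_utility(["bread", "milk"], t1, item_vectors)
# price:  2*2 + 3*1 = 7
# weight: 1*2 + 2*1 = 4
```

Replacing the inner accumulation with a maximum, minimum or average would give the other combinations the slide mentions.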
  • 9. Proposed Framework: Advanced utility functions • The utility $u$ of a pattern $P$ can be obtained as an arbitrary combination of the values of $U_P$ using a function $u(P)$ • $u(P)$ can be classified as • Horizontal first: $u(P) = f_v(f_h(U_P))$ combines by row (facets), then by column (occurrences) • Vertical first: $u(P) = f_h(f_v(U_P))$ combines by column (occurrences), then by row (facets) • Mixed: $u(P) = f(U_P)$ combines the values at once • Each of these classes can be further divided into • Inter-transaction utility • Pattern-vs-object utility • Pattern-vs-container utility • These can be exploited to define utility measures relating item/transaction facets and one of the object/container facets, such as Pearson and multiple correlation. • As an example, suppose we select $f_h = filter(\cdot)$ and $f_v = \max(\cdot)$; that is, we first filter for a single facet, and then we take the maximum across all the occurrences.
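The difference between the horizontal-first and vertical-first classes can be seen on a small matrix whose rows are occurrences and whose columns are facets. This is a sketch under assumptions: the helper names, the parameter names `row_fn` and `col_fn`, and the matrix values are all hypothetical.

```python
def horizontal_first(matrix, row_fn, col_fn):
    """Reduce each row (the facets of one occurrence), then combine across occurrences."""
    return col_fn([row_fn(row) for row in matrix])


def vertical_first(matrix, row_fn, col_fn):
    """Reduce each column (one facet over all occurrences), then combine across facets."""
    columns = [list(column) for column in zip(*matrix)]
    return row_fn([col_fn(column) for column in columns])


# Hypothetical pattern utility matrix: 3 occurrences x 3 facets.
utility_matrix = [
    [7, 4, 1],
    [3, 9, 0],
    [5, 2, 1],
]


def first_facet(row):
    """Filter a row down to a single facet (here the first one)."""
    return row[0]


# Example from the slide: filter one facet, then take the maximum over occurrences.
u_filter_max = horizontal_first(utility_matrix, first_facet, max)  # max(7, 3, 5)

# Same pair of functions (sum over facets, max over occurrences), different order.
u_h = horizontal_first(utility_matrix, sum, max)  # row sums 12, 12, 6 -> 12
u_v = vertical_first(utility_matrix, sum, max)    # column maxima 7, 9, 1 -> 17
```

The last two values differ, which illustrates why the order in which rows and columns are combined is part of the definition of the utility function.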
  • 10. Proposed Framework: The e-HUPM problem • Given a pattern $P$, we say that $P$ is an extended high-utility pattern if its utility $u(P)$ is greater than a minimum threshold $th_u$ and it occurs in at least $th_o$ transactions. • The problem of extended high-utility pattern mining (e-HUPM) is to discover all the extended high-utility patterns in a given database $D$.
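For intuition, the mining problem can be solved by brute force on a tiny database: enumerate every candidate pattern, then discard those that fail the occurrence or utility threshold. This is a naive sketch and not the authors' method; the function names, the toy data and the single-facet utility (total price over all occurrences) are hypothetical.

```python
from itertools import combinations


def occurrences(pattern, transactions):
    """Return the ids of the transactions that contain every item of the pattern."""
    return [tid for tid, items in transactions.items()
            if set(pattern).issubset(items)]


def mine(transactions, utility, utility_threshold, occurrence_threshold):
    """Enumerate all item sets and keep those passing both thresholds."""
    all_items = sorted({item for items in transactions.values() for item in items})
    found = []
    for size in range(1, len(all_items) + 1):
        for pattern in combinations(all_items, size):
            occ = occurrences(pattern, transactions)
            if len(occ) < occurrence_threshold:
                continue  # too few occurrences
            if utility(pattern, occ) <= utility_threshold:
                continue  # utility not greater than the threshold
            found.append(pattern)
    return found


# Hypothetical data: item prices and transactions (item -> quantity).
prices = {"a": 2, "b": 3, "c": 1}
transactions = {
    "t1": {"a": 1, "b": 2},
    "t2": {"a": 2, "c": 1},
    "t3": {"a": 1, "b": 1, "c": 3},
}


def total_price(pattern, occ):
    """Sum price x quantity over all items of the pattern in all its occurrences."""
    return sum(prices[i] * transactions[t][i] for t in occ for i in pattern)


patterns = mine(transactions, total_price,
                utility_threshold=11, occurrence_threshold=2)
```

With these numbers only the pattern `("a", "b")` survives: it occurs in two transactions with total utility 13, while `("a", "c")` reaches only 10 and the single items stay below the threshold. Exhaustive enumeration is exponential in the number of items, so it is useful only as a reference on toy inputs.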
  • 11. ASP Approach: The why and the how • We want to provide as much flexibility as possible in the definition of what a useful pattern is. • Encoding the problem in Answer Set Programming (ASP) helps achieve the desired flexibility and modularity. • Classic guess-and-check scenario • We generate one answer set for each valid pattern, • Complex utility functions are executed by means of external functions (e.g., in DLVHEX, WASP, clingo) • Pattern validity criteria and filters can be easily applied by encoding them as rules.
  • 12. ASP Approach: The encoding
      %%% Input schema:
      % container(ContainerId)
      % object(ObjectId, ContainerId)
      % transaction(Tid, ObjectId)
      % item(Item, Tid, Position, Q)
      % itemUtilityVector(Item, I1, ..., Il)
      % transactionUtilityVector(Tid, T1, ..., Tm)
      % objectUtilityVector(ObjectId, O1, ..., On)
      % containerUtilityVector(ContainerId, C1, ..., Co)

      %%% Parameters
      occurrencesThreshold(...).
      utilityTreshold(...).

      %%% Item pre-filtering
      usefulItem(I) :- item(I,_,_,_), ... .   % "..." stands for any condition on the items

      %%% Candidate pattern generation
      {inCandidatePattern(I)} :- usefulItem(I).

      %%% Occurrences computation and check
      inTransaction(Tid) :- transaction(Tid,_), not incomplete(Tid).
      incomplete(Tid) :- transaction(Tid,_), inCandidatePattern(I), not contains(I,Tid).
      contains(I,Tid) :- item(I,Tid,_,_).
      :- #count{ Tid : inTransaction(Tid) } = N, N < Tho, occurrencesThreshold(Tho).

      %%% Utility computation
      patternItemUtilityVectors(Tid,Item,I1,...,Il,Q) :-
          inCandidatePattern(Item), itemUtilityVector(Item, I1, ..., Il),
          inTransaction(Tid), item(Item, Tid, Position, Q).
      intraPatternUtilityVector(Tid,I1,...,Il) :-
          &computeIntraPatternUtility[patternItemUtilityVectors](Tid,I1,...,Il).
      occurrenceUtilityVector(Tid,I1,...,Il,T1,...,Tm,O1,...,On,C1,...,Co) :-
          inTransaction(Tid), intraPatternUtilityVector(Tid,I1,...,Il),
          transactionUtilityVector(Tid, T1, ..., Tm),
          transaction(Tid, ObjectId), objectUtilityVector(ObjectId, O1, ..., On),
          object(ObjectId, ContainerId), containerUtilityVector(ContainerId, C1, ..., Co).
      :- &computeUtility[occurrenceUtilityVector](U), U < Thu, utilityTreshold(Thu).
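To feed data to an encoding with this input schema, the database has to be written out as ground facts. The sketch below shows one way to do that from Python; the `fact` helper and all identifiers and values in the example are hypothetical, only the predicate names and argument order follow the schema on the slide, with two item facets assumed.

```python
def fact(predicate, *args):
    """Render one ground ASP fact, e.g. fact("object", "o1", "c1") -> "object(o1,c1)."."""
    return "{}({}).".format(predicate, ",".join(str(a) for a in args))


# Hypothetical instance: one container, one object, one transaction, two items.
facts = [
    fact("container", "c1"),
    fact("object", "o1", "c1"),
    fact("transaction", "t1", "o1"),
    fact("item", "bread", "t1", 1, 2),        # item, transaction, position, quantity
    fact("item", "milk", "t1", 2, 1),
    fact("itemUtilityVector", "bread", 2, 1),  # assumed facets: price, weight
    fact("itemUtilityVector", "milk", 3, 2),
]
program_input = "\n".join(facts)
```

The resulting text can be saved to a file and passed to the solver together with the encoding. Constants must start with a lowercase letter (or be quoted) to be read as constants rather than variables.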
  • 13. Experimental Evaluation: Dataset • Aspect-based sentiment analysis of scientific reviews [1] • Each review is annotated with 8 different aspects: appropriateness, clarity, originality, empirical/theoretical soundness, meaningful comparison, substance, impact and recommendation • Each aspect is assigned one sentiment value from {positive, negative, neutral, absent}. • 814 papers and a total of 1148 reviews. • 2230 annotated lines with a total of 15124 distinct words. • [1] Chakraborty, S., Goyal, P., Mukherjee, A.: Aspect-based sentiment analysis of scientific reviews. In: JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5, 2020, pp. 207–216. ACM (2020)
  • 14. Experimental Evaluation: Quantitative analysis • Average running time • $u(P)$: disagreement between originality and decision (percentage of pattern occurrences showing a positive originality and a reject decision; thresholds: [15, 50, 75, 100]) • Comparison with HUPM systems • No analysis involving more than one element can be devised with such systems • Here, the utility of an item is the absolute value of the appropriateness aspect in the sentence it first appears in.
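The disagreement utility described on this slide is a percentage over pattern occurrences. A minimal sketch, assuming each occurrence is represented as a dictionary with hypothetical keys `originality` and `decision`:

```python
def disagreement(occurrences):
    """Percentage of occurrences with positive originality and a reject decision."""
    if not occurrences:
        return 0.0
    hits = sum(1 for occ in occurrences
               if occ["originality"] == "positive" and occ["decision"] == "reject")
    return 100.0 * hits / len(occurrences)


# Hypothetical occurrences of one pattern in the review sentences.
pattern_occurrences = [
    {"originality": "positive", "decision": "reject"},
    {"originality": "positive", "decision": "accept"},
    {"originality": "negative", "decision": "reject"},
    {"originality": "absent", "decision": "accept"},
]
score = disagreement(pattern_occurrences)  # 1 hit out of 4 occurrences
```

A pattern would then be kept when this percentage passes the chosen utility threshold.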
  • 15. Experimental Evaluation: Qualitative analysis • $u(P)$: Pearson correlation between an aspect $X$ and the final decision on the corresponding paper. • «The set of review sentences containing $P$ is characterized by a direct correlation between the sentiment on $X$ and the final decision on the corresponding paper» • $u(P)$: Multiple correlation between two aspects $X$ and $Y$ and the final decision on the corresponding paper.
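A Pearson-based utility compares, over the occurrences of a pattern, the sentiment on an aspect with the final decision. The sketch below implements the standard Pearson coefficient; the numeric encoding of sentiments and decisions is an assumption made for illustration and is not stated on the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences.

    Returns None when either sequence has zero variance (coefficient undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Assumed encoding: positive = 1, neutral = 0, negative = -1; accept = 1, reject = 0.
sentiment_on_x = [1, -1, 1, -1]
final_decision = [1, 0, 1, 0]
r = pearson(sentiment_on_x, final_decision)  # sentiment and decision move together
```

In this toy case the sentiment and the decision agree on every occurrence, so the coefficient is 1; a value close to 1 corresponds to the "direct correlation" reading quoted on the slide.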
  • 16. Conclusion • We introduced a general framework for HUPM with several extensions. • The framework supports multi-dimensional data and different utility measures. • A versatile and modular ASP encoding has been developed. • We employed a real use case on paper reviews to carry out both quantitative and qualitative analyses. • Facets and advanced utility functions help reduce the number of relevant patterns • Useful in providing deep insights into the data. • Not an ending point! • Apply the framework to new contexts, • Derive ad-hoc algorithms for the extended utility functions.
  • 17. Thanks for your attention! These slides are available at https://bit.ly/ehupm-ruleml21 • Francesco Cauteruccio, Research Fellow @ DEMACS, University of Calabria • cauteruccio@mat.unical.it • francescocauteruccio.info • @finalfire