The democratization of data access and the adoption of OLAP in scenarios requiring hands-free interfaces push towards the creation of smart OLAP interfaces. We describe COOL, a framework devised for COnversational OLap applications. COOL interprets and translates a natural-language dialog into an OLAP session that starts with a GPSJ (Generalized Projection, Selection, and Join) query and continues with the application of OLAP operators. The interpretation relies on a formal grammar and on a repository storing metadata and values from a multidimensional cube. When the text description is ambiguous, COOL can obtain the correct query either through automatic inference or through user interactions that disambiguate the text.
1. SEBD 2021
Conversational OLAP
(discussion paper)
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli
University of Bologna, Italy
29th Italian Symposium on Advanced Database Systems (SEBD 2021)
Motivation
Goal: query multidimensional cubes through natural language
Natural language enables analytics in hands-free scenarios [1]
- E.g., augmented reality or smart assistants
OLAP is based on standard operators [2]
- They offer no help with query construction or natural-language disambiguation
We introduce COOL (COnversational OLap) [3]
Matteo Francia – University of Bologna 2
Introduction
[1] Matteo Francia, Matteo Golfarelli, Stefano Rizzi: A-BI+: A framework for Augmented Business Intelligence. Information Systems. (2020)
[2] Panos Vassiliadis, Patrick Marcel, Stefano Rizzi: Beyond roll-up's and drill-down's: An intentional analytics model to reinvent OLAP. Information Systems. (2019)
[3] Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: COOL: A Framework for Conversational OLAP. Information Systems. (2021)
COOL: architecture
[Figure: COOL overview, offline part: Automatic KB feeding and Manual KB enrichment populate the KB with DW metadata & values, synonyms, and an ontology.]
COOL: architecture
[Figure: COOL architecture. Online: Speech-to-Text (raw text) → Interpretation of either a full query or an OLAP operator (annotated parse forest) → Disambiguation & Enhancement (parse tree) → SQL generation (SQL) → Execution & Visualization (e.g., a "Sales by Customer and Month" chart). The KB stores metadata & values, synonyms, ontology, log, and statistics; offline, it is populated from the DW by Automatic KB feeding and Manual KB enrichment.]
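The SQL generation step can be illustrated with a minimal Python sketch that turns an already-disambiguated GPSJ query into SQL. It assumes a single, hypothetical denormalized view `sales_view` (a real star schema would also need the joins of the GPSJ query); the names `GPSJQuery`, `to_sql`, `aggregate` and `sql_literal` are illustrative, not COOL's API, and negated predicates and counts over facts are not handled.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Assumed translation of the grammar's comparison operators into SQL syntax.
SQL_OPS = {"=": "=", "<>": "<>", ">": ">", "<": "<", "≥": ">=", "≤": "<="}

@dataclass
class GPSJQuery:
    measures: list[tuple[str, str]]  # MC: (aggregation, measure), e.g. ("sum", "UnitSales")
    group_by: list[str] = field(default_factory=list)  # GC: group-by attributes
    selections: list[tuple[str, str, object]] = field(default_factory=list)  # SC: (attribute, operator, value)

def sql_literal(value: object) -> str:
    """Render a value as a SQL literal, doubling single quotes inside strings."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def aggregate(agg: str, measure: str) -> str:
    """Render one MC element, e.g. ("avg", "UnitSales") -> "AVG(UnitSales) AS avg_UnitSales"."""
    alias = f"{agg.replace(' ', '_')}_{measure}"
    if agg == "count distinct":
        return f"COUNT(DISTINCT {measure}) AS {alias}"
    return f"{agg.upper()}({measure}) AS {alias}"

def to_sql(q: GPSJQuery, table: str = "sales_view") -> str:
    """Assemble SELECT/FROM/WHERE/GROUP BY; `sales_view` is a hypothetical table name."""
    select = list(q.group_by) + [aggregate(a, m) for a, m in q.measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if q.selections:
        preds = [f"{a} {SQL_OPS[op]} {sql_literal(v)}" for a, op, v in q.selections]
        sql += " WHERE " + " AND ".join(preds)
    if q.group_by:
        sql += " GROUP BY " + ", ".join(q.group_by)
    return sql
```

For instance, `to_sql(GPSJQuery([("sum", "UnitSales")], ["Customer", "Month"]))` yields `SELECT Customer, Month, SUM(UnitSales) AS sum_UnitSales FROM sales_view GROUP BY Customer, Month`, which corresponds to a "Sales by Customer and Month" chart if sales are summed (the aggregation is an assumption).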
Robustness: given a text T, we allow several mappings
- E.g., by matching each n-gram to a set of similar entities from the KB
T = «return the average sales for the product NY in each region»
M1 = avg, UnitSales, where, Product, =, New York, group by, Region
M2 = avg, UnitSales, where, Product, =, New York, group by, Regin
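A minimal sketch of how candidate mappings could be enumerated: each n-gram is matched against every surface form in the KB, entities with similarity at least 𝛼 are kept, and the mappings are all combinations of the candidates. It uses Python's `difflib.SequenceMatcher` ratio as a stand-in string similarity (COOL's actual similarity function may differ) and treats 𝛼 as a minimum-similarity threshold. The toy `KB`, including the value Regin, is hypothetical; n-gram enumeration and keywords such as where and group by are omitted, so each mapping lists only the matched KB entities.

```python
import itertools
from difflib import SequenceMatcher

# Hypothetical toy KB: surface form -> (canonical entity, type).
# Synonyms ("average", "sales", "ny") point to the same entity as the canonical name.
KB = {
    "avg": ("avg", "Agg"), "average": ("avg", "Agg"),
    "unitsales": ("UnitSales", "Mea"), "sales": ("UnitSales", "Mea"),
    "product": ("Product", "Attr"), "region": ("Region", "Attr"),
    "new york": ("New York", "Val"), "ny": ("New York", "Val"),
    "regin": ("Regin", "Val"),
}

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in for COOL's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidates(ngram, alpha):
    """Entities whose best-matching surface form has similarity >= alpha, best first."""
    best = {}
    for form, (entity, _type) in KB.items():
        s = similarity(ngram, form)
        if s >= alpha and s > best.get(entity, 0.0):
            best[entity] = s
    return sorted(best.items(), key=lambda kv: -kv[1])

def mappings(ngrams, alpha):
    """Yield every combination of candidates; n-grams with no candidate are skipped."""
    options = [c for c in (candidates(g, alpha) for g in ngrams) if c]
    for combo in itertools.product(*options):
        yield [entity for entity, _score in combo]
```

With 𝛼 = 0.8, the n-grams "average", "sales", "NY", "region" yield two mappings that differ only in Region versus Regin, mirroring M1 and M2; raising 𝛼 to 0.95 drops the misspelled candidate.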
⟨GPSJ⟩ ::= ⟨MC⟩⟨GC⟩⟨SC⟩
⟨MC⟩ ::= (⟨Agg⟩⟨Mea⟩ | ⟨Cnt⟩⟨Fct⟩)+
⟨GC⟩ ::= “𝑔𝑟𝑜𝑢𝑝 𝑏𝑦” ⟨Attr⟩+
⟨SC⟩ ::= “𝑤ℎ𝑒𝑟𝑒” ⟨SCA⟩
⟨SCA⟩ ::= ⟨SCN⟩ “𝑎𝑛𝑑” ⟨SCA⟩ | ⟨SCN⟩
⟨SCN⟩ ::= “𝑛𝑜𝑡” ⟨SSC⟩ | ⟨SSC⟩
⟨SSC⟩ ::= ⟨Attr⟩⟨Cop⟩⟨Val⟩ | ⟨Attr⟩⟨Val⟩ | ⟨Val⟩
⟨Cop⟩ ::= “=” | “<>” | “>” | “<” | “≥” | “≤”
⟨Agg⟩ ::= “𝑠𝑢𝑚” | “𝑎𝑣𝑔” | “𝑚𝑖𝑛” | “𝑚𝑎𝑥”
⟨Cnt⟩ ::= “𝑐𝑜𝑢𝑛𝑡” | “𝑐𝑜𝑢𝑛𝑡 𝑑𝑖𝑠𝑡𝑖𝑛𝑐𝑡”
⟨Fct⟩ ::= Domain-specific facts
⟨Mea⟩ ::= Domain-specific measures
⟨Attr⟩ ::= Domain-specific attributes
⟨Val⟩ ::= Domain-specific values
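The grammar can be checked on a mapping with a small recursive-descent recognizer. This is a sketch rather than COOL's parser: it only accepts or rejects a token sequence instead of building an annotated parse forest, and it assumes that the three clauses may appear in any order with ⟨GC⟩ and ⟨SC⟩ optional, since the parse tree of M1 places the selection before the group-by. The (kind, text) token format and the function name `parse_gpsj` are hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text); kind is Agg, Cnt, Mea, Fct, Attr, Cop, Val or kw

def parse_gpsj(tokens: List[Token]) -> Optional[List[str]]:
    """Return the order of the recognized clauses, or None if tokens are not a GPSJ query."""
    n = len(tokens)

    def at(i: int, kind: str, text: Optional[str] = None) -> bool:
        return i < n and tokens[i][0] == kind and (text is None or tokens[i][1] == text)

    def mc(i):  # MC ::= (Agg Mea | Cnt Fct)+
        j = i
        while (at(j, "Agg") and at(j + 1, "Mea")) or (at(j, "Cnt") and at(j + 1, "Fct")):
            j += 2
        return j if j > i else None

    def gc(i):  # GC ::= "group by" Attr+
        if not at(i, "kw", "group by"):
            return None
        j = i + 1
        while at(j, "Attr"):
            j += 1
        return j if j > i + 1 else None

    def ssc(i):  # SSC ::= Attr Cop Val | Attr Val | Val
        if at(i, "Attr") and at(i + 1, "Cop") and at(i + 2, "Val"):
            return i + 3
        if at(i, "Attr") and at(i + 1, "Val"):
            return i + 2
        return i + 1 if at(i, "Val") else None

    def scn(i):  # SCN ::= "not" SSC | SSC
        return ssc(i + 1) if at(i, "kw", "not") else ssc(i)

    def sca(i):  # SCA ::= SCN "and" SCA | SCN
        j = scn(i)
        while j is not None and at(j, "kw", "and") and scn(j + 1) is not None:
            j = scn(j + 1)
        return j

    def sc(i):  # SC ::= "where" SCA
        return sca(i + 1) if at(i, "kw", "where") else None

    rules = {"MC": mc, "GC": gc, "SC": sc}
    used: List[str] = []
    i = 0
    while i < n:
        for name, rule in rules.items():
            j = rule(i) if name not in used else None
            if j is not None:
                used.append(name)
                i = j
                break
        else:
            return None  # no unused clause can start at position i
    return used if "MC" in used else None

M1 = [("Agg", "avg"), ("Mea", "UnitSales"), ("kw", "where"), ("Attr", "Product"),
      ("Cop", "="), ("Val", "New York"), ("kw", "group by"), ("Attr", "Region")]
M2 = M1[:-1] + [("Val", "Regin")]
```

Here `parse_gpsj(M1)` returns `["MC", "SC", "GC"]`, while `parse_gpsj(M2)` returns `None` because "group by" is followed by a value instead of an attribute. M1 is accepted even though New York is not a Product: the grammar is purely syntactic, so semantic checks must come afterwards.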
COOL: interpretation
[Figure: parse tree of M1 for T = «return the average sales for the product NY in each region»: GPSJ → MC (Agg avg, Mea UnitSales), SC (“where” SCA → SCN → SSC → Attr Product, Cop =, Val New York), GC (“group by” Attr Region).]
COOL: ambiguities
Not all syntactically correct clauses are semantically valid
- E.g., New York is not a Product
- Annotate each invalid clause
- Ask the user a question for each ambiguity
[Figure: parse tree of M1 for T = «return the average sales for the product NY in each region», with the selection clause Product = New York annotated as AVM.]
COOL asks: «New York is not a product, could you pick a product among ...?»
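The validity check behind this question can be sketched as a membership test against the attribute members stored in the KB. The `MEMBERS` table and its contents are hypothetical, and reading AVM as an attribute-value mismatch annotation is an assumption; `attributes_of` hints at how an automatic inference step could propose the right attribute instead of asking the user.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical KB excerpt: attribute -> members stored in the DW.
MEMBERS: Dict[str, Set[str]] = {
    "Product": {"Beer", "Milk", "Wine"},
    "City": {"Boston", "New York"},
}

def attributes_of(value: str) -> List[str]:
    """Attributes whose members include value (a hint for automatic correction)."""
    return sorted(a for a, members in MEMBERS.items() if value in members)

def check_selection(attr: str, value: str) -> Optional[Tuple[str, str]]:
    """Return None for a valid predicate, else an (annotation, question) pair."""
    members = MEMBERS.get(attr, set())
    if value in members:
        return None
    options = ", ".join(sorted(members))
    question = f"{value} is not a {attr.lower()}, could you pick a {attr.lower()} among {options}?"
    return ("AVM", question)
```

With this toy KB, `check_selection("Product", "New York")` produces the AVM annotation and a question listing Beer, Milk and Wine, while `attributes_of("New York")` suggests City as an automatic fix.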
COOL: ambiguities
Some clauses may be left out of the GPSJ query
- Annotate them for (possible) addition
[Figure: parse forest of M2 for T = «return the average sales for the product NY in each region»: a GPSJ tree covering MC and the selection clause Product = New York (annotated as AVM), plus a separate selection clause on the value Regin that is not attached to the GPSJ tree (unparsed).]
COOL asks: «Do you want to add the selection predicate "Regin"?»
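Turning leftover clauses into questions can be sketched as follows, assuming the parse forest is given as hypothetical (root label, text) pairs in which the GPSJ-rooted tree is the main query and every other tree is a clause that could be added to it. The function name `questions_for_unparsed` and the wording used for a leftover group-by clause are illustrative only.

```python
from typing import List, Tuple

def questions_for_unparsed(forest: List[Tuple[str, str]]) -> List[str]:
    """Return one question per clause left outside the main GPSJ tree."""
    questions = []
    for root, text in forest:
        if root == "SC":  # a selection clause not attached to the GPSJ tree
            questions.append(f'Do you want to add the selection predicate "{text}"?')
        elif root == "GC":  # a group-by clause not attached (illustrative wording)
            questions.append(f'Do you want to group by "{text}"?')
    return questions  # the GPSJ tree itself produces no question
```

For the forest of M2, the only leftover tree is the selection clause on Regin, so a single question is generated.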
COOL: scoring function
Return the parse forest with the highest score
[Figure: parse forests PFM1 and PFM2 of the mappings M1 and M2, annotated with Score(M1), Score(M2), Score(PFM1), and Score(PFM2); since Score(PFM1) > Score(PFM2), the forest of M1 is returned.]
COOL: scoring function
The score is also used for pruning
- Sort all the mappings by descending Score(M)
- First, parse the mapping with the highest Score(M)
- Then, parse only the mappings such that Score(M) > Score(PFM), where PFM is the best parse forest found so far
[Figure: the parse tree of M1 with its score Score(PFM1), compared with Score(M1) and with Score(M3) for the shorter mapping M3 = avg, UnitSales, where, Product, =, New York.]
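The pruning strategy amounts to a best-first search with an early stop. The sketch assumes, as the stopping rule implies, that a parse forest never scores higher than the mapping it comes from, i.e. Score(PFM) ≤ Score(M); the scoring and parsing functions are passed in as parameters because their definitions are not given on the slides, and the name `best_parse_forest` is hypothetical.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

def best_parse_forest(
    mappings: Iterable[Any],
    score_mapping: Callable[[Any], float],
    parse: Callable[[Any], Any],
    score_forest: Callable[[Any], float],
) -> Tuple[Optional[Any], float]:
    """Return the highest-scoring parse forest, skipping mappings that cannot win.

    Assumes Score(PFM) <= Score(M): a mapping whose own score does not exceed
    the best forest score found so far cannot yield a better forest.
    """
    best, best_score = None, float("-inf")
    for m in sorted(mappings, key=score_mapping, reverse=True):
        if score_mapping(m) <= best_score:
            break  # mappings are sorted, so all remaining ones are pruned as well
        forest = parse(m)
        s = score_forest(forest)
        if s > best_score:
            best, best_score = forest, s
    return best, best_score
```

With hypothetical scores Score(M1) = 0.9, Score(M3) = 0.5 and Score(PFM1) = 0.85, only M1 is parsed: once its forest scores 0.85, no mapping with a lower score can beat it, so the loop stops. Sorting first is what makes a single `break` sufficient.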
Experimental Evaluation
Top-𝑘 accuracy when varying the similarity threshold 𝛼 used to build the mappings
- Real-world dataset from [1]
- Accuracy is stable with respect to 𝑘 (up to 94%)
- The best value of 𝛼 depends on the inaccuracies in the text
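Top-𝑘 accuracy is commonly computed as the fraction of test texts whose correct query appears among the 𝑘 best-ranked interpretations. The sketch assumes that definition, which may differ in detail from the one used in this evaluation; the function name `top_k_accuracy` and the query identifiers are hypothetical.

```python
from typing import Hashable, Sequence

def top_k_accuracy(ranked: Sequence[Sequence[Hashable]], gold: Sequence[Hashable], k: int) -> float:
    """Share of texts whose gold query is among the first k ranked interpretations."""
    if not gold or len(ranked) != len(gold):
        raise ValueError("ranked and gold must be non-empty and of equal length")
    hits = sum(1 for preds, g in zip(ranked, gold) if g in preds[:k])
    return hits / len(gold)
```

Because a larger 𝑘 only adds candidates, top-𝑘 accuracy can never decrease as 𝑘 grows; a curve that stays flat in 𝑘 means the correct query is usually already ranked first.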
Results
[1] K. Drushku, J. Aligon, N. Labroche, P. Marcel, V. Peralta: Interest-based recommendations for business intelligence users. Information Systems 86. (2019)
User Evaluation
40 users with heterogeneous OLAP skills
- Asked to translate analytic goals, given in Italian, into English
- Users gave positive feedback on the interface...
- ... as well as on the interpretation accuracy
Results
                   Full query            OLAP operator
OLAP familiarity   Accuracy   Time (s)   Accuracy   Time (s)
Low                0.91       141        0.86       102
High               0.91       97         0.92       71
Questions?
Thank you.
Full paper:
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli:
COOL: A framework for conversational OLAP.
Information Systems. (2021)
Best demo award:
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli:
Conversational OLAP in Action!
EDBT. (2021)
Editor's Notes
DIFF [17]: returns the tuples that maximize the difference between cells of a cube given as input
Profile user exploration to recommend unvisited parts of the cube
The RELAX operator allows verifying whether a pattern observed at a certain level of detail is also present at a coarser level of detail [19]
Alternative operators have also been proposed in the Cinecubes method [7,8]. The goal of this effort is to facilitate automated reporting, given an original OLAP query as input. To achieve this purpose, two operators (expressed as acts) are proposed, namely (a) put-in-context, i.e., compare the result of the original query to query results over similar, sibling values; and (b) give-details, where drill-downs of the original query's groupers are performed.