[SEBD2021] Conversational OLAP

SEBD 2021
Conversational OLAP
(discussion paper)
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli
University of Bologna, Italy
29th Italian Symposium on Advanced Database Systems (SEBD 2021)

Motivation
Goal: query multidimensional cubes through natural language
Natural language enables analytics in hands-free scenarios [1]
- E.g., in augmented reality or with smart assistants
OLAP is based on standard operators [2]
- These offer no help with query construction or natural language disambiguation
We introduce COOL (COnversational OLap) [3]
Introduction
[1] Matteo Francia, Matteo Golfarelli, Stefano Rizzi: A-BI+: A framework for Augmented Business Intelligence. Information Systems. (2020)
[2] Panos Vassiliadis, Patrick Marcel, Stefano Rizzi: Beyond roll-up's and drill-down's: An intentional analytics model to reinvent OLAP. Information Systems. (2019)
[3] Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: COOL: A Framework for Conversational OLAP. Information Systems. (2021)

COOL: architecture overview
[Figure: COOL architecture, offline phase. Components: Automatic KB feeding, Manual KB enrichment, KB, DW, Ontology. Edge labels: Metadata & values, Synonyms.]

COOL: architecture overview
[Figure: full COOL architecture.
Offline phase: Automatic KB feeding, Manual KB enrichment, KB, DW, Ontology (edge labels: Metadata & values, Synonyms).
Online phase: Speech-to-Text, Interpretation (Full query or OLAP operator), Disambiguation & Enhancement, SQL generation, Execution & Visualization, Log (edge labels: Raw text, Annotated parse forest, Parse tree, SQL, Parse tree, Statistics).
Example request: "Sales by Customer and Month".]
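The SQL generation step named in the architecture can be illustrated with a minimal sketch. It is not the COOL implementation: the function name `gpsj_to_sql`, the star-schema table and column names, and the choice of `sum` as the aggregation for "Sales by Customer and Month" are all assumptions made for illustration.

```python
def gpsj_to_sql(measures, group_by, selections, fact_table, joins=()):
    """Build a parameterized SQL string from the parts of a GPSJ query.

    measures:   list of (aggregation, measure column) pairs
    group_by:   list of attribute columns
    selections: list of (attribute, operator, value) triples
    joins:      list of (dimension table, join condition) pairs
    """
    select_items = list(group_by) + [
        f"{agg.upper()}({mea}) AS {agg}_{mea}" for agg, mea in measures
    ]
    lines = [f"SELECT {', '.join(select_items)}", f"FROM {fact_table}"]
    for table, condition in joins:
        lines.append(f"JOIN {table} ON {condition}")
    params = []
    if selections:
        predicates = []
        for attr, op, value in selections:
            predicates.append(f"{attr} {op} ?")  # value is bound as a parameter
            params.append(value)
        lines.append("WHERE " + " AND ".join(predicates))
    if group_by:
        lines.append("GROUP BY " + ", ".join(group_by))
    return "\n".join(lines), params


# "Sales by Customer and Month" on a hypothetical star schema.
sql, params = gpsj_to_sql(
    measures=[("sum", "unit_sales")],
    group_by=["customer", "month"],
    selections=[],
    fact_table="sales_fact",
    joins=[
        ("customer_dim", "sales_fact.customer_id = customer_dim.customer_id"),
        ("date_dim", "sales_fact.date_id = date_dim.date_id"),
    ],
)
print(sql)
```

Selection values are returned separately as parameters rather than being spliced into the string, so the same sketch extends to queries with a "where" clause.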

Robustness: given a text T, we allow several mappings
- E.g., by matching each n-gram to a set of similar entities from the KB
T = «return the average sales for the product NY in each region»
M1 = avg, UnitSales, where, Product, =, New York, group by, Region
M2 = avg, UnitSales, where, Product, =, New York, group by, Regin
⟨GPSJ⟩ ::= ⟨MC⟩⟨GC⟩⟨SC⟩
⟨MC⟩ ::= (⟨Agg⟩⟨Mea⟩ | ⟨Cnt⟩⟨Fct⟩)+
⟨GC⟩ ::= "group by" ⟨Attr⟩+
⟨SC⟩ ::= "where" ⟨SCA⟩
⟨SCA⟩ ::= ⟨SCN⟩ "and" ⟨SCA⟩ | ⟨SCN⟩
⟨SCN⟩ ::= "not" ⟨SSC⟩ | ⟨SSC⟩
⟨SSC⟩ ::= ⟨Attr⟩⟨Cop⟩⟨Val⟩ | ⟨Attr⟩⟨Val⟩ | ⟨Val⟩
⟨Cop⟩ ::= "=" | "<>" | ">" | "<" | "≥" | "≤"
⟨Agg⟩ ::= "sum" | "avg" | "min" | "max"
⟨Cnt⟩ ::= "count" | "count distinct"
⟨Fct⟩ ::= Domain-specific facts
⟨Mea⟩ ::= Domain-specific measures
⟨Attr⟩ ::= Domain-specific attributes
⟨Val⟩ ::= Domain-specific values
COOL: interpretation
[Figure: parse tree of a mapping such as M1: GPSJ → MC (Agg Mea), SC ("where" SCA → SCN → SSC → Attr Cop Val), GC ("group by" Attr)]
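To make the grammar concrete, here is a minimal recursive-descent sketch over already-typed tokens. It is a strict recognizer written for illustration, not the parser used in COOL: it assumes the measure clause comes first and that the group-by and selection clauses are optional and may follow in either order (as in M1, where the selection precedes the group-by), and it rejects a mapping outright instead of building an annotated parse forest. The token representation `(type, text)` is also an assumption.

```python
def parse_gpsj(tokens):
    """Parse (type, text) tokens into a nested-tuple tree, or raise ValueError."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(expected):
        nonlocal pos
        if peek() != expected:
            raise ValueError(f"expected {expected} at {pos}, found {peek()}")
        token = tokens[pos]
        pos += 1
        return token

    def mc():  # <MC> ::= (<Agg><Mea> | <Cnt><Fct>)+
        items = []
        while peek() in ("Agg", "Cnt"):
            if peek() == "Agg":
                items += [take("Agg"), take("Mea")]
            else:
                items += [take("Cnt"), take("Fct")]
        if not items:
            raise ValueError("the measure clause is empty")
        return ("MC", *items)

    def gc():  # <GC> ::= "group by" <Attr>+
        items = [take("group by"), take("Attr")]
        while peek() == "Attr":
            items.append(take("Attr"))
        return ("GC", *items)

    def ssc():  # <SSC> ::= <Attr><Cop><Val> | <Attr><Val> | <Val>
        if peek() == "Attr":
            attr = take("Attr")
            if peek() == "Cop":
                return ("SSC", attr, take("Cop"), take("Val"))
            return ("SSC", attr, take("Val"))
        return ("SSC", take("Val"))

    def scn():  # <SCN> ::= "not" <SSC> | <SSC>
        if peek() == "not":
            return ("SCN", take("not"), ssc())
        return ("SCN", ssc())

    def sca():  # <SCA> ::= <SCN> "and" <SCA> | <SCN>
        left = scn()
        if peek() == "and":
            return ("SCA", left, take("and"), sca())
        return ("SCA", left)

    clauses, seen = [mc()], set()
    while peek() in ("group by", "where"):
        kind = peek()
        if kind in seen:
            raise ValueError(f"duplicate {kind} clause")
        seen.add(kind)
        clauses.append(gc() if kind == "group by" else ("SC", take("where"), sca()))
    if pos != len(tokens):
        raise ValueError(f"unparsed tokens from position {pos}")
    return ("GPSJ", *clauses)


M1 = [("Agg", "avg"), ("Mea", "UnitSales"), ("where", "where"),
      ("Attr", "Product"), ("Cop", "="), ("Val", "New York"),
      ("group by", "group by"), ("Attr", "Region")]
tree = parse_gpsj(M1)
print([clause[0] for clause in tree[1:]])
```

Replacing the last token of M1 with the value Regin (mapping M2) makes this strict recognizer raise `ValueError`, since "group by" must be followed by an attribute.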

COOL: ambiguities
Not all syntactically correct clauses are "valid"
- E.g., New York is not a Product
- Annotate the ambiguity
- Ask the user a question for each ambiguity
[Figure: parse tree of M1 (GPSJ → MC, SC, GC) with the selection clause annotated AVM]
Question to the user: "New York is not a product, could you pick a product among ...?"
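The validity check behind this kind of question can be sketched as follows. The `DOMAINS` dictionary, its values, and the function `check_selection` are hypothetical and only mirror the New York / Product example; how COOL stores attribute domains and phrases its questions is not shown in the slides.

```python
# Hypothetical attribute domains (illustrative values only).
DOMAINS = {
    "Product": {"Beer", "Wine", "Cola"},
    "City": {"New York", "Boston"},
}


def check_selection(attr, value, domains=DOMAINS, max_options=3):
    """Return None if value belongs to attr, else an AVM annotation with a question."""
    members = domains.get(attr, set())
    if value in members:
        return None
    options = sorted(members)[:max_options]
    name = attr.lower()
    return {
        "annotation": "AVM",
        "attr": attr,
        "value": value,
        "question": f"{value} is not a {name}, could you pick a {name} "
                    f"among {', '.join(options)}?",
    }


valid = check_selection("City", "New York")
annotated = check_selection("Product", "New York")
print(annotated["question"])
```

A clause that passes the check needs no interaction; a clause that fails carries its annotation so that one question per ambiguity can be asked later.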

COOL: ambiguities
Some clauses could be excluded from the GPSJ query
- Annotate them for (possible) addition
[Figure: parse forest of M2: a GPSJ parse tree (MC; SC with an AVM annotation) plus a separate unparsed selection clause (SC → SCA → SCN → SSC → Val)]
Question to the user: "Do you want to add the selection predicate "Regin"?"

COOL: scoring function
Return the forest with the highest score
[Figure: parse forests PFM1 and PFM2, obtained from mappings M1 and M2, shown with their scores Score(M1), Score(M2), Score(PFM1), and Score(PFM2)]
Score(PFM1) > Score(PFM2)

COOL: scoring function
The score is also used for pruning
- Sort all the mappings by descending Score(M)
- First, parse the mapping with the highest Score(M)
- Then, parse only the mappings such that Score(M) > Score(PFM), where PFM is the best parse forest found so far
M3 = avg, UnitSales, where, Product, =, New York
[Figure: parse forest PFM1 (with its AVM annotation) and a score axis comparing Score(PFM1), Score(M3), and Score(M1)]
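The pruning strategy can be sketched as a short loop. The sketch assumes that Score(M) is an upper bound on the score of any parse forest obtained from M, which is what makes skipping low-scoring mappings safe; the functions `mapping_score` and `parse`, and all the numbers in the usage example, are hypothetical placeholders.

```python
def best_parse_forest(mappings, mapping_score, parse):
    """Return (best forest, its score, number of mappings actually parsed).

    Assumes mapping_score(m) bounds from above the score of any forest
    parsed from m, and parse(m) returns a (forest, forest_score) pair.
    """
    ranked = sorted(mappings, key=mapping_score, reverse=True)
    best_forest, best_score, n_parsed = None, float("-inf"), 0
    for mapping in ranked:
        if mapping_score(mapping) <= best_score:
            break  # this and all remaining mappings cannot beat the current best
        forest, score = parse(mapping)
        n_parsed += 1
        if score > best_score:
            best_forest, best_score = forest, score
    return best_forest, best_score, n_parsed


# Toy scores, invented for the example.
SCORE_M = {"M1": 0.90, "M2": 0.85, "M3": 0.60}
SCORE_PF = {"M1": 0.80, "M2": 0.70, "M3": 0.60}

forest, score, n_parsed = best_parse_forest(
    mappings=["M3", "M1", "M2"],
    mapping_score=SCORE_M.get,
    parse=lambda m: (f"PF({m})", SCORE_PF[m]),
)
print(forest, score, n_parsed)
```

In this toy run M3 is never parsed, because its mapping score is already below the score of the forest obtained from M1.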

Experimental Evaluation
Top-k accuracy when varying the similarity threshold α used to build the mappings
- Real-world dataset from [1]
- Accuracy is stable with respect to k (up to 94%)
- The choice of α depends on the inaccuracies in the text
[1] K. Drushku, J. Aligon, N. Labroche, P. Marcel, V. Peralta, Interest-based recommendations for business intelligence users, Inf. Syst. 86 (2019)

User Evaluation
40 users with heterogeneous OLAP skills
- Asked to translate (Italian) analytic goals into English
- Users provided good feedback on the interface...
- ... as well as on the interpretation accuracy
                   Full query             OLAP operator
OLAP familiarity   Accuracy   Time (s)    Accuracy   Time (s)
Low                0.91       141         0.86       102
High               0.91       97          0.92       71

In Action!
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: Conversational OLAP in Action!
EDBT (Best demo award). (2021)

Questions?
Thank you.
Full paper:
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli:
COOL: A framework for conversational OLAP.
Information Systems. (2021)
Best demo award:
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli:
Conversational OLAP in Action!
EDBT. (2021)
