Paper presented at DOLAP 2020: Towards Conversational OLAP
Link to the presentation: https://youtu.be/IfBc1H46s8Y
Abstract: The democratization of data access and the adoption of OLAP in scenarios requiring hands-free interfaces push towards the creation of smart OLAP interfaces. In this paper, we envisage a conversational framework specifically devised for OLAP applications. The system converts natural language text into GPSJ (Generalized Projection, Selection and Join) queries. The approach relies on an ad-hoc grammar and a knowledge base storing multidimensional metadata and cube values. When the query description is ambiguous or incomplete, the system obtains the correct query either through automatic inference or by interacting with the user to disambiguate the text. Our tests show very promising results in terms of both effectiveness and efficiency.
Authors: Matteo Francia, Enrico Gallinucci, Matteo Golfarelli
1. Towards Conversational OLAP
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli
University of Bologna
m.francia@unibo.it
DOLAP2020
2. Data access democratization
Smart assistants are in companies’ agendas [1, 2]
Goal: perform conversational OLAP sessions
Existing OLAP interfaces: point-and-click metaphor to avoid SQL
Translate NL into a Generalized Projection, Selection and Join (GPSJ) query [3]
Differences from state-of-the-art approaches [4, 5, 6, 7]
1. End-to-end dialog-driven framework for OLAP sessions
2. Plug-and-play: no impact on DW
3. No mandatory external knowledge
Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 2 / 18
3. Functional architecture
[Figure: functional architecture. Offline components: Automatic KB feeding, KB enrichment. Online components: Speech-to-Text, Interpretation (Full query, OLAP operator), Disambiguation, SQL generation, Execution & Visualization. Stores: KB, DW. Labelled flows: Raw text, Annotated parse forest, Parse tree, SQL, Results, Metadata & values, Synonyms, Ontology, Log. Example: "Sales by Customer and Month".]
4. Full query: Tokenization and mapping
Match raw text n-grams with known DW entities in KB
[Figure: Sales cube schema. Product (Name, Type, Category, Family); Customer (C.City, C.Region, Gender); Store (S.City, S.Region); Date (Quarter, Month, Year). Measures: StoreSales, StoreCost, UnitSales.]
NL = “medium sales for product New York by the region”
T = medium, sales, for, product, New, York, by, region
Entities matched in KB: average, UnitSales, where, Product, New York, group by, Region | Regin
KB (excerpt): avg, group by, New York, Product, Regin, Region, UnitSales, Where, …
Build mappings (i.e., combinations of entities)
M1 = avg, UnitSales, where, Product, New York, group by, Region
M2 = avg, UnitSales, where, Product, New York, group by, Regin
…
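The matching and mapping steps on this slide can be sketched roughly as follows. This is only an illustration: the `KB` dictionary, the helper names, the use of `difflib.SequenceMatcher` as the similarity function, and the threshold value are all assumptions made for the example, not the authors' implementation.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical KB excerpt: entity -> accepted surface forms (synonyms).
KB = {
    "avg": ["average", "medium"],
    "UnitSales": ["unit sales", "sales"],
    "where": ["where", "for"],
    "group by": ["group by", "by"],
    "Product": ["product"],
    "New York": ["new york"],
    "Region": ["region"],
    "Regin": ["regin"],
}

def ngrams(tokens, max_n=4):
    """Yield (start, end, text) for every n-gram with 1 <= n <= max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, i + n, " ".join(tokens[i:i + n])

def sim(a, b):
    """Stand-in string similarity in [0, 1] (an assumption of this sketch)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_entities(tokens, alpha=0.85, top_n=2):
    """Map each n-gram span to its top_n KB entities with similarity >= alpha."""
    hits = {}
    for start, end, text in ngrams(tokens):
        scored = []
        for entity, forms in KB.items():
            best = max(sim(text, form) for form in forms)
            if best >= alpha:
                scored.append((best, entity))
        if scored:
            scored.sort(key=lambda s: (-s[0], s[1]))  # best similarity first
            hits[(start, end)] = [entity for _, entity in scored[:top_n]]
    return hits

def build_mappings(hits):
    """Combine one entity per matched span (spans assumed non-overlapping here)."""
    spans = sorted(hits)
    return [list(combo) for combo in product(*(hits[s] for s in spans))]

T = "medium sales for product New York by region".split()
mappings = build_mappings(match_entities(T))
```

With this toy KB the token "region" matches both Region and Regin, so two mappings are produced, mirroring M1 and M2 above. A fuller version would also have to handle overlapping n-gram spans.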
5. Full query: Parsing I
Grammar-based translation (effective for narrow lexicon [8, 9])
Capture complex syntax structures (i.e., query clauses)
Unambiguous LL(1) grammar [10]: one parse tree PTM per mapping M
GPSJ ::= MC GC SC | MC SC | MC GC | MC |...
MC ::= ( Agg Mea | Mea | Cnt Fct | ...)+
GC ::= Gby Attr +
SC ::= Whr SCO
SCO ::= SCA “or” SCO | SCA
SCA ::= SCN “and” SCA | SCN
SCN ::= “not” SSC | SSC
SSC ::= Attr Cop Val | Val Attr | Val | ...
Category | Entity | Synonym samples
Int | select | return, show, get
Whr | where | in, such that
Gby | group by | by, for each, per
Cop | =, <>, >, <, ≥, ≤ | equal to, greater than
Agg | sum, avg | total, medium
Cnt | count, count distinct | number, amount
Fct | Facts | Domain specific
Mea | Measures | Domain specific
Attr | Attributes | Domain specific
Val | Categorical values | Domain specific
Val | Dates and numbers | -
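As a rough illustration of grammar-based translation, the sketch below is a small recursive-descent parser over sequences of (category, entity) pairs. It covers only the alternatives listed explicitly on the slide (the elided "..." productions are not guessed), and the names `parse_gpsj` and `ParseError` are hypothetical.

```python
class ParseError(Exception):
    """Raised when the token sequence does not fit the sketched grammar."""

def parse_gpsj(tokens):
    """Parse a list of (category, entity) pairs into a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(cat):
        nonlocal pos
        if peek() != cat:
            raise ParseError(f"expected {cat} at position {pos}, found {peek()}")
        pos += 1
        return tokens[pos - 1]

    def mc():  # MC ::= (Agg Mea | Mea | Cnt Fct)+
        items = []
        while peek() in ("Agg", "Mea", "Cnt"):
            if peek() == "Agg":
                items += [expect("Agg"), expect("Mea")]
            elif peek() == "Cnt":
                items += [expect("Cnt"), expect("Fct")]
            else:
                items.append(expect("Mea"))
        if not items:
            raise ParseError("a measure clause is required")
        return ("MC", *items)

    def gc():  # GC ::= Gby Attr+
        items = [expect("Gby"), expect("Attr")]
        while peek() == "Attr":
            items.append(expect("Attr"))
        return ("GC", *items)

    def ssc():  # SSC ::= Attr Cop Val | Val Attr | Val
        if peek() == "Attr":
            return ("SSC", expect("Attr"), expect("Cop"), expect("Val"))
        items = [expect("Val")]
        if peek() == "Attr":
            items.append(expect("Attr"))
        return ("SSC", *items)

    def scn():  # SCN ::= "not" SSC | SSC
        if peek() == "not":
            return ("SCN", expect("not"), ssc())
        return ("SCN", ssc())

    def sca():  # SCA ::= SCN "and" SCA | SCN
        left = scn()
        if peek() == "and":
            return ("SCA", left, expect("and"), sca())
        return ("SCA", left)

    def sco():  # SCO ::= SCA "or" SCO | SCA
        left = sca()
        if peek() == "or":
            return ("SCO", left, expect("or"), sco())
        return ("SCO", left)

    def sc():  # SC ::= Whr SCO
        return ("SC", expect("Whr"), sco())

    # GPSJ ::= MC GC SC | MC SC | MC GC | MC
    tree = ["GPSJ", mc()]
    if peek() == "Gby":
        tree.append(gc())
    if peek() == "Whr":
        tree.append(sc())
    if pos != len(tokens):
        raise ParseError(f"unparsed input from position {pos}")
    return tuple(tree)

tokens = [("Agg", "sum"), ("Mea", "UnitSales"), ("Gby", "group by"), ("Attr", "Region"),
          ("Whr", "where"), ("Attr", "Product"), ("Cop", "="), ("Val", "New York")]
tree = parse_gpsj(tokens)
```

Because each decision looks only at the next category, one mapping yields at most one tree. A sequence such as Agg, Mea, Gby, Val is rejected, since Gby must be followed by an attribute.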
6. Full query: Parsing II
M1 = avg, UnitSales, where, Product, New York, group by, Region
[Figure: parse tree of M1. The root GPSJ has three children: MC (over avg, UnitSales), SC (Whr, then SCO → SCA → SCN → SSC over Product, New York) and GC (Gby, Attr over group by, Region).]
Fully parsed: PTM includes all entities as leaves
7. Full query: Parsing III
M2 = avg, UnitSales, where, Product, New York, group by, Regin
[Figure: parse forest of M2. The parse tree PTM (root GPSJ) covers MC (over avg, UnitSales) and SC (Whr, then SCO → SCA → SCN → SSC over Product, New York). Outside PTM remain the Gby leaf (group by) and a separate SC subtree (SCO → SCA → SCN → SSC → Val) over Regin.]
Partially parsed: some entities are not included in PTM; the result is a parse forest PFM
..., group by, Regin: Regin is a value, and it is not possible to group by a value
If M is fully parsed, PFM = PTM
8. Full query: Checking & Enhancement
Parsing only verifies syntactic adherence to the grammar; other issues can still arise
Tag problematic subtrees in PFM with annotations
M2 = avg, UnitSales, where, Product, New York, group by, Regin
M = avg, UnitSales, where, Product, =, New York, group by, Regin
[Figure: annotated parse forests of M2 and M. In both, the SSC subtree over Product and New York is tagged AVM (Attribute-Value Mismatch) and the trailing clause over group by, Regin is tagged unparsed. In M the SSC subtree also contains the Cop leaf (=).]
Annotation type | Generating derivation (sample)
Ambiguous Attribute | SSC ::= Val
Ambiguous Agg. Operator | MC ::= Mea
Attribute-Value Mismatch | SSC ::= Attr Cop Val
MD-Meas Violation | MC ::= Agg Mea
MD-GBY Violation | GC ::= Gby Attr+
Unparsed clause | –
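A few of these checks can be sketched against a toy knowledge base. The attribute domains, the allowed operators and the function names below are invented for illustration only; the real KB is fed from the DW.

```python
# Hypothetical KB excerpt (toy values): attribute -> known members,
# measure -> aggregation operators it allows.
DOMAINS = {
    "Product": {"Beer", "Milk"},
    "S.City": {"New York", "Boston"},
    "C.City": {"New York", "Rome"},
}
ALLOWED_AGG = {"UnitSales": {"sum", "avg"}, "StoreCost": {"sum"}}

def check_ssc(attr, val):
    """Annotate a selection predicate; attr is None for the derivation SSC ::= Val."""
    if attr is None:
        owners = sorted(a for a, values in DOMAINS.items() if val in values)
        if len(owners) > 1:
            return ("Ambiguous Attribute", owners)
        return None
    domain = DOMAINS.get(attr, set())
    if val not in domain:
        return ("Attribute-Value Mismatch", sorted(domain))
    return None

def check_mc(agg, mea):
    """Annotate a measure clause; agg is None for the derivation MC ::= Mea."""
    allowed = sorted(ALLOWED_AGG.get(mea, set()))
    if agg is None:
        return ("Ambiguous Agg. Operator", allowed) if len(allowed) > 1 else None
    if agg not in allowed:
        return ("MD-Meas Violation", allowed)
    return None
```

Each function returns either None or a pair made of the annotation type and the candidate options that a later question to the user could list.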
9. Disambiguation
Handle annotations
Add implicit information to PFM
Ask question for each annotation to reduce PFM to PTM
SQL generation: translate PTM into SQL code
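To make the SQL generation step concrete, here is a minimal sketch that turns the three clauses of a fully parsed query into a SQL string. The star-schema metadata (table, column and key names) and the `to_sql` signature are hypothetical, and only string-valued predicates are handled.

```python
# Hypothetical star-schema metadata: attribute -> (dimension table, column, join key).
ATTRS = {
    "Region": ("store", "s_region", "store_id"),
    "Product": ("product", "name", "product_id"),
}
MEASURES = {"UnitSales": "unit_sales"}  # measure -> fact-table column
FACT = "sales"

def to_sql(measures, group_by=(), predicates=()):
    """measures: [(agg, mea)]; group_by: [attr]; predicates: [(attr, op, value)]."""
    def col(attr):
        table, column, _ = ATTRS[attr]
        return f"{table}.{column}"

    select = [col(a) for a in group_by]
    select += [f"{agg.upper()}({FACT}.{MEASURES[mea]})" for agg, mea in measures]

    # Join each dimension table mentioned by the query exactly once.
    joins = {}
    for attr in [*group_by, *(p[0] for p in predicates)]:
        table, _, key = ATTRS[attr]
        joins.setdefault(table, key)

    sql = f"SELECT {', '.join(select)} FROM {FACT}"
    for table, key in joins.items():
        sql += f" JOIN {table} ON {FACT}.{key} = {table}.{key}"
    if predicates:
        conds = []
        for attr, op, value in predicates:
            literal = value.replace("'", "''")  # escape quotes in string literals
            conds.append(f"{col(attr)} {op} '{literal}'")
        sql += " WHERE " + " AND ".join(conds)
    if group_by:
        sql += " GROUP BY " + ", ".join(col(a) for a in group_by)
    return sql

query = to_sql([("avg", "UnitSales")], ["Region"], [("Product", "=", "Beer")])
```

The measure clause feeds the SELECT list, the group-by clause feeds both SELECT and GROUP BY, and the selection clause feeds WHERE. A production version would use parameterized queries rather than inlined literals.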
M = avg, UnitSales, where, Product, =, New York, group by, Regin
[Figure: annotated parse forest of M, with one question to the user per annotation.]
AVM: "New York is not a valid Product, possible Products are…"
unparsed: "Dangling clause, do you want to add it or drop it?"
Annotation type | Description
Ambiguous Attribute | Val is member of these attributes [...]
Ambiguous Agg. Operator | Mea allows these operators [...]
Attribute-Value Mismatch | Attr and Val domains mismatch, values are [...]
MD-Meas Violation | Mea does not allow Agg, operators are [...]
MD-GBY Violation | It is not allowed to group by on Attr without Attr
Unparsed GC clause | There is a dangling grouping clause GC
Unparsed MC clause | There is a dangling measure clause MC
Unparsed SC clause | There is a dangling predicate clause SC
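One simple way to turn annotations into questions is a template per annotation type, filled with the entities involved. The sketch below covers a subset of the rows above; the slot names and the `ask` helper are assumptions of this example.

```python
# Question templates adapted from the descriptions above (subset only).
QUESTIONS = {
    "Ambiguous Attribute": "{val} is member of these attributes: {options}",
    "Ambiguous Agg. Operator": "{mea} allows these operators: {options}",
    "Attribute-Value Mismatch": "{attr} and {val} domains mismatch, values are: {options}",
    "MD-Meas Violation": "{mea} does not allow {agg}, operators are: {options}",
    "Unparsed GC clause": "There is a dangling grouping clause {clause}",
    "Unparsed MC clause": "There is a dangling measure clause {clause}",
    "Unparsed SC clause": "There is a dangling predicate clause {clause}",
}

def ask(annotation, options=(), **slots):
    """Render the question for one annotation; options lists the valid choices."""
    return QUESTIONS[annotation].format(options=", ".join(options), **slots)

def questions_for(annotations):
    """One question per annotation, in the order the annotations were found."""
    return [ask(kind, **details) for kind, details in annotations]

pending = [
    ("Attribute-Value Mismatch",
     {"attr": "Product", "val": "New York", "options": ["Beer", "Milk"]}),
    ("Unparsed GC clause", {"clause": "group by Regin"}),
]
dialog = questions_for(pending)
```

Each answer would then be used to rewrite or drop the corresponding subtree, until the parse forest reduces to a single parse tree.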
10. Robustness vs Complexity I
Multiple mappings ensure robustness...
NL = “medium sales for product New York by the regin”
T = medium, sales, for, product, New, York, by, regin
Entities matched in KB: average, UnitSales, where, Product, New York, group by, Regin | Region
M1 = avg, UnitSales, where, Product, New York, group by, Regin
M2 = avg, UnitSales, where, Product, New York, group by, Region
…
More mappings, more interpretations
User experience: return only most promising query
11. Robustness vs Complexity II
Optimistic-pessimistic score function
$$\mathrm{Score}(M) = \sum_{i=1}^{|M|} \mathrm{Sim}(T_i, E_i)$$
$$\mathrm{Score}(PF_M) = \mathrm{Score}(M'), \text{ where } M' \text{ is the sub-sequence of } M \text{ belonging to } PT_M$$
M = avg, UnitSales, where, Product, =, New York, group by, Region
[Figure: parse tree of M. The root GPSJ has children MC, SC and GC; the SSC subtree (Attr, Cop, Val) is tagged AVM. Score(PFM) is computed over the whole tree.]
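The two scores can be illustrated with a few lines of code. In this sketch each mapping entry carries a precomputed similarity, and a boolean mask marks the entities that are leaves of the parse tree; the similarity values are made up for the example.

```python
import math

def score(mapping):
    """Score(M): sum of the similarities of the (n-gram, entity) pairs in M."""
    return sum(similarity for _, _, similarity in mapping)

def score_pf(mapping, in_tree):
    """Score(PFM): score of the sub-sequence of M whose entities belong to PTM."""
    return score([entry for entry, keep in zip(mapping, in_tree) if keep])

# (n-gram, entity, similarity); the 0.9 for "region" vs Regin is illustrative.
M2 = [("medium", "avg", 1.0), ("sales", "UnitSales", 1.0), ("for", "where", 1.0),
      ("product", "Product", 1.0), ("New York", "New York", 1.0),
      ("by", "group by", 1.0), ("region", "Regin", 0.9)]
# "group by, Regin" is left unparsed, so it does not belong to the parse tree.
in_tree = [True, True, True, True, True, False, False]

upper_bound = score(M2)
after_parsing = score_pf(M2, in_tree)
```

Since Score(PFM) sums over a sub-sequence of M and similarities are non-negative, it can never exceed Score(M). Here the unparsed grouping clause lowers the score from about 6.9 to 5.0.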
12. Robustness vs Complexity III
Optimistic: annotations in PTM are likely to be solved
M = avg, UnitSales, where, Product, =, New York, group by, Region
[Figure: parse tree of M (GPSJ with children MC, SC and GC) in which the SSC subtree (Attr, Cop, Val) is tagged AVM; the annotated subtree is still counted in Score(PFM). A second, smaller tree covers only M = avg, UnitSales (MC over Agg, Mea).]
13. Robustness vs Complexity IV
Pessimistic: unparsed clauses are likely to be dropped
M = avg, UnitSales, where, Product, =, New York, group by, Regin
[Figure: annotated parse forest of M. Score(PFM) covers only the parse tree (MC and SC, with the SSC subtree tagged AVM); the clause over group by, Regin is tagged unparsed and excluded. For comparison, group by, Region forms a valid GC (Gby, Attr).]
Ranking by Score(PFM) allows pruning of parsed mappings
14. Evaluation I
Dataset
Real-world analytics queries [11] mapped to the Foodmart schema
75% of queries are valid GPSJ queries
110 manually annotated queries
Automatic feeding: 1 fact, 39 attributes, 12 500 entities
Manual feeding: only 50 synonyms ("for each" synonym of group by)
Parameters
n-grams, n ∈ [1..4]
... mapped to top N entities with similarity ≥ α
Consider mappings covering at least 70% of T
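The coverage parameter can be read as a simple filter on candidate mappings. The sketch below assumes a mapping is described by the token spans of its n-grams; the function names are hypothetical.

```python
def coverage(spans, n_tokens):
    """Fraction of the tokens of T covered by the n-gram spans of a mapping."""
    covered = set()
    for start, end in spans:  # spans are half-open token ranges [start, end)
        covered.update(range(start, end))
    return len(covered) / n_tokens

def keep_mapping(spans, n_tokens, min_coverage=0.7):
    """Keep only mappings that cover at least min_coverage of T."""
    return coverage(spans, n_tokens) >= min_coverage

# T = medium, sales, for, product, New, York, by, region (8 tokens)
full = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 6), (6, 7), (7, 8)]
partial = [(0, 1), (1, 2), (4, 6)]
```

Here `full` covers all 8 tokens and is kept, while `partial` covers only 4 of 8 (50%) and is discarded.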
15. Evaluation II
Accuracy: tree similarity TSim(PTM, PT∗) [12] between the produced PTM and the correct PT∗
[Figure: TSim (y-axis, 0.0 to 1.0) against k (x-axis, 1 to 5). (a) Varying N (N = 2, 4, 6), k queries returned (α = 0.4). (b) Varying α (α = 0.6, 0.5, 0.4), k queries returned (N = 6).]
Accuracy depends on vocabulary (i.e., matched entities)
Proposing one query slightly impacts accuracy
Accuracy in [0.85, 0.9]
16. Evaluation III
[Figure: TSim (y-axis, 0.0 to 1.0) against the disambiguation step (x-axis, 0 to 3). (a) Disambiguation steps (k = 1, N = 6 and α = 0.4).]
Disambiguation increases accuracy up to 0.94
State-of-the-art accuracy [4, 5, 6]
17. Conclusion & further enhancements
So far
Provided an architecture for conversational OLAP that meets these desiderata
Automated and portable: no impact on DW
Plug and play: no heavy manual lexicon definition
Robustness: adapt to spoken and syntactic inaccuracies
Translated NL to well-formed GPSJ query
What’s next?
1. Support a conversational OLAP session
Extend grammar with dialog primitives (i.e., OLAP operators)
Manage and refine previous parse trees
2. Learn frequent disambiguations to minimize user interaction
3. Design metaphor to support interaction
Visual metaphor based on DFM
4. Test with real users to verify perceived effectiveness
Usability, immediacy, memorability
19. References I
[1] Ramanathan V. Guha, Vineet Gupta, Vivek Raghunathan, and Ramakrishnan Srikant.
User modeling for a personal assistant.
In WSDM, pages 275–284. ACM, 2015.
[2] Hype cycle for artificial intelligence, 2018.
http://www.gartner.com/en/documents/3883863/hype-cycle-for-artificial-intelligence-2018.
Accessed: 2019-06-21.
[3] Ashish Gupta, Venky Harinarayan, and Dallan Quass.
Aggregate-query processing in data warehousing environments.
In VLDB, pages 358–369. Morgan Kaufmann, 1995.
[4] Fei Li and H. V. Jagadish.
Understanding natural language queries over relational databases.
SIGMOD Record, 45(1):6–13, 2016.
[5] Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig.
Sqlizer: query synthesis from natural language.
PACMPL, 1(OOPSLA):63:1–63:26, 2017.
[6] Diptikalyan Saha, Avrilia Floratou, Karthik Sankaranarayanan, Umar Farooq Minhas, Ashish R. Mittal, and Fatma Özcan.
ATHENA: an ontology-driven system for natural language querying over relational data stores.
PVLDB, 9(12):1209–1220, 2016.
[7] Nicolas Kuchmann-Beauger, Falk Brauer, and Marie-Aude Aufaure.
QUASL: A framework for question answering and its application to business intelligence.
In RCIS, pages 1–12. IEEE, 2013.
20. References II
[8] Kedar Dhamdhere, Kevin S. McCurley, Ralfi Nahmias, Mukund Sundararajan, and Qiqi Yan.
Analyza: Exploring data with conversation.
In IUI, pages 493–504. ACM, 2017.
[9] Katrin Affolter, Kurt Stockinger, and Abraham Bernstein.
A comparative survey of recent natural language interfaces for databases.
The VLDB Journal, 28(5):793–819, 2019.
[10] John C. Beatty.
On the relationship between LL(1) and LR(1) grammars.
J. ACM, 29(4):1007–1022, 1982.
[11] Krista Drushku, Julien Aligon, Nicolas Labroche, Patrick Marcel, and Verónika Peralta.
Interest-based recommendations for business intelligence users.
Inf. Syst., 86:79–93, 2019.
[12] Kaizhong Zhang and Dennis E. Shasha.
Simple fast algorithms for the editing distance between trees and related problems.
SIAM J. Comput., 18(6):1245–1262, 1989.