Towards Conversational OLAP
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli
m.francia@unibo.it
University of Bologna
DOLAP 2020
Data access democratization
Smart assistants are in companies’ agendas [1, 2]
Goal: perform conversational OLAP sessions
Existing OLAP interfaces: point-and-click metaphor to avoid SQL
Translate NL into Generalized Projection, Selection and Join (GPSJ) query [3]
Differences from state-of-the-art approaches [4, 5, 6, 7]
1. End-to-end dialog-driven framework for OLAP sessions
2. Plug-and-play: no impact on DW
3. No mandatory external knowledge
Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 2 / 18
Functional architecture
[Figure: functional architecture. Offline components: Automatic KB feeding (metadata & values from the DW) and KB enrichment (synonyms, ontology), both populating the KB. Online components: Speech-to-Text (raw text, e.g., "Sales by Customer and Month"), Interpretation of a Full query or an OLAP operator (annotated parse forest), Disambiguation (parse tree), SQL generation (SQL), Execution & Visualization (results). A log is also kept.]
Full query: Tokenization and mapping
Match raw text n-grams with known DW entities in KB
[Figure: Sales cube schema. Fact: Sales. Measures: StoreSales, StoreCost, UnitSales. Dimensions and attributes: Product (Name, Type, Category, Family); Customer (C.City, C.Region, Gender); Store (S.City, S.Region); Date (Quarter, Month, Year).]
NL = “medium sales for product New York by the region”
T = ⟨medium, sales, for, product, New, York, by, region⟩
Matched entities: average (avg), UnitSales, where, Product, New York, group by, Region | Regin
KB entities: avg, group by, New York, Product, Regin, Region, UnitSales, where, …
Build mappings (i.e., combinations of entities)
M1 = ⟨avg, UnitSales, where, Product, New York, group by, Region⟩
M2 = ⟨avg, UnitSales, where, Product, New York, group by, Regin⟩
…
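The matching step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `KB` dictionary, the threshold `alpha`, the cut-off `top_n`, and the use of `difflib.SequenceMatcher` as the similarity function are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical KB: entity -> surface forms (synonyms included).
KB = {
    "avg": ["average", "medium"],
    "UnitSales": ["unit sales", "sales"],
    "where": ["where", "for"],
    "Product": ["product"],
    "New York": ["new york"],
    "group by": ["group by", "by"],
    "Region": ["region"],
    "Regin": ["regin"],
}

def sim(a, b):
    """Assumed similarity function: character-based ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def ngrams(tokens, max_n=4):
    """Yield (start, length, text) for every n-gram with n in [1..max_n]."""
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield start, n, " ".join(tokens[start:start + n])

def candidates(tokens, alpha=0.8, top_n=2):
    """Map each n-gram to at most top_n entities with similarity >= alpha."""
    result = {}
    for start, n, text in ngrams(tokens):
        scored = []
        for entity, forms in KB.items():
            best = max(sim(text, form) for form in forms)
            if best >= alpha:
                scored.append((best, entity))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        if scored:
            result[(start, n)] = scored[:top_n]
    return result

tokens = "medium sales for product New York by region".split()
matches = candidates(tokens)
```

With this toy KB, the unigram "region" matches both Region and Regin, which is what gives rise to the two mappings M1 and M2.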
Full query: Parsing I
Grammar-based translation (effective for narrow lexicon [8, 9])
Capture complex syntax structures (i.e., query clauses)
Unambiguous LL(1) grammar [10]: one parse tree PT_M per mapping M
⟨GPSJ⟩ ::= ⟨MC⟩⟨GC⟩⟨SC⟩ | ⟨MC⟩⟨SC⟩ | ⟨MC⟩⟨GC⟩ | ⟨MC⟩ | ...
⟨MC⟩ ::= (⟨Agg⟩⟨Mea⟩ | ⟨Mea⟩ | ⟨Cnt⟩⟨Fct⟩ | ...)+
⟨GC⟩ ::= ⟨Gby⟩⟨Attr⟩+
⟨SC⟩ ::= ⟨Whr⟩⟨SCO⟩
⟨SCO⟩ ::= ⟨SCA⟩ “or” ⟨SCO⟩ | ⟨SCA⟩
⟨SCA⟩ ::= ⟨SCN⟩ “and” ⟨SCA⟩ | ⟨SCN⟩
⟨SCN⟩ ::= “not” ⟨SSC⟩ | ⟨SSC⟩
⟨SSC⟩ ::= ⟨Attr⟩⟨Cop⟩⟨Val⟩ | ⟨Val⟩⟨Attr⟩ | ⟨Val⟩ | ...

Category | Entity | Synonym samples
Int | select | return, show, get
Whr | where | in, such that
Gby | group by | by, for each, per
Cop | =, <>, >, <, ≥, ≤ | equal to, greater than
Agg | sum, avg | total, medium
Cnt | count, count distinct | number, amount
Fct | Facts | Domain specific
Mea | Measures | Domain specific
Attr | Attributes | Domain specific
Val | Categorical values | Domain specific
Val | Dates and numbers | -
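A one-token-lookahead parser for a subset of this grammar might look like the sketch below. It is an illustration under stated assumptions, not the authors' parser: the class name `GPSJParser` is hypothetical, ⟨Cnt⟩⟨Fct⟩ is omitted, the clause order ⟨MC⟩⟨SC⟩⟨GC⟩ and the production ⟨Attr⟩⟨Val⟩ are assumed to be among the alternatives elided by "...", and the categories `And`, `Or`, `Not` stand for the quoted keywords.

```python
class GPSJParser:
    """Recursive-descent parser over (category, entity) pairs."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, category):
        if self.peek() != category:
            raise SyntaxError(f"expected {category} at position {self.pos}")
        entity = self.tokens[self.pos][1]
        self.pos += 1
        return entity

    def mc(self):  # <MC> ::= (<Agg><Mea> | <Mea>)+
        items = []
        while self.peek() in ("Agg", "Mea"):
            agg = self.take("Agg") if self.peek() == "Agg" else None
            items.append((agg, self.take("Mea")))
        if not items:
            raise SyntaxError("measure clause expected")
        return items

    def gc(self):  # <GC> ::= <Gby><Attr>+
        self.take("Gby")
        attrs = [self.take("Attr")]
        while self.peek() == "Attr":
            attrs.append(self.take("Attr"))
        return attrs

    def ssc(self):  # <Attr><Cop><Val> | <Attr><Val> (assumed) | <Val><Attr> | <Val>
        if self.peek() == "Attr":
            attr = self.take("Attr")
            cop = self.take("Cop") if self.peek() == "Cop" else None
            return (attr, cop, self.take("Val"))
        val = self.take("Val")
        attr = self.take("Attr") if self.peek() == "Attr" else None
        return (attr, None, val)

    def scn(self):  # <SCN> ::= "not" <SSC> | <SSC>
        if self.peek() == "Not":
            self.take("Not")
            return ("not", self.ssc())
        return self.ssc()

    def sca(self):  # <SCA> ::= <SCN> "and" <SCA> | <SCN>
        left = self.scn()
        if self.peek() == "And":
            self.take("And")
            return ("and", left, self.sca())
        return left

    def sco(self):  # <SCO> ::= <SCA> "or" <SCO> | <SCA>
        left = self.sca()
        if self.peek() == "Or":
            self.take("Or")
            return ("or", left, self.sco())
        return left

    def gpsj(self):  # <MC> followed by optional <GC> and <SC>, in either order
        tree = {"MC": self.mc()}
        while self.peek() in ("Gby", "Whr"):
            clause = "GC" if self.peek() == "Gby" else "SC"
            if clause in tree:
                raise SyntaxError(f"duplicate {clause} clause")
            if clause == "GC":
                tree[clause] = self.gc()
            else:
                self.take("Whr")
                tree[clause] = self.sco()
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unparsed token at position {self.pos}")
        return tree

M1 = [("Agg", "avg"), ("Mea", "UnitSales"), ("Whr", "where"), ("Attr", "Product"),
      ("Val", "New York"), ("Gby", "group by"), ("Attr", "Region")]
tree_m1 = GPSJParser(M1).gpsj()
```

Replacing the last token with `("Val", "Regin")` makes `gc` fail, which corresponds to a mapping that cannot be fully parsed.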
Full query: Parsing II
M1 = ⟨avg, UnitSales, where, Product, New York, group by, Region⟩
[Figure: parse tree PT_M1. ⟨GPSJ⟩ has three children: ⟨MC⟩ (⟨Agg⟩ avg, ⟨Mea⟩ UnitSales); ⟨SC⟩ (⟨Whr⟩ where, then ⟨SCO⟩ → ⟨SCA⟩ → ⟨SCN⟩ → ⟨SSC⟩ with ⟨Attr⟩ Product and ⟨Val⟩ New York); ⟨GC⟩ (⟨Gby⟩ group by, ⟨Attr⟩ Region).]
Fully parsed: PT_M includes all entities as leaves
Full query: Parsing III
M2 = ⟨avg, UnitSales, where, Product, New York, group by, Regin⟩
[Figure: parse forest of M2. The parse tree PT_M2 rooted in ⟨GPSJ⟩ covers ⟨MC⟩ (avg, UnitSales) and ⟨SC⟩ (where, Product, New York); group by (⟨Gby⟩) and Regin (⟨Val⟩) are left outside PT_M2.]
Partially parsed: some entities are not included in PT_M (the result is a parse forest PF_M)
⟨..., group by, Regin⟩: cannot group by a value
If fully parsed, PF_M = PT_M
Full query: Checking & Enhancement
Parsing only verifies syntactic adherence to the grammar; other issues can still arise
Tag problematic subtrees in PF_M with annotations
[Figure: parse forest of M2 = ⟨avg, UnitSales, where, Product, New York, group by, Regin⟩ before and after enhancement. After enhancement, M = ⟨avg, UnitSales, where, Product, =, New York, group by, Regin⟩ includes the implicit operator "="; the predicate subtree is annotated AVM (attribute-value mismatch) and the dangling clause is annotated "unparsed".]
Annotation type | Gen. derivation sample
Ambiguous Attribute | ⟨SSC⟩ ::= ⟨Val⟩
Ambiguous Agg. Operator | ⟨MC⟩ ::= ⟨Mea⟩
Attribute-Value Mismatch | ⟨SSC⟩ ::= ⟨Attr⟩⟨Cop⟩⟨Val⟩
MD-Meas Violation | ⟨MC⟩ ::= ⟨Agg⟩⟨Mea⟩
MD-GBY Violation | ⟨GC⟩ ::= ⟨Gby⟩⟨Attr⟩+
Unparsed clause | –
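Some of these checks can be sketched in a few lines. The attribute domains in `DOMAINS`, the allowed operators in `ALLOWED_AGG`, and both function names are hypothetical and chosen only for illustration; in the talk's architecture this information would come from the KB.

```python
# Hypothetical attribute domains and allowed aggregation operators.
DOMAINS = {
    "Product": {"Milk", "Bread"},
    "S.City": {"New York", "Boston"},
    "C.City": {"New York", "Rome"},
}
ALLOWED_AGG = {"UnitSales": {"sum", "avg"}, "StoreCost": {"sum"}}

def check_predicate(attr, val):
    """Annotate a simple selection predicate; attr is None when only a value was given."""
    if attr is None:
        owners = sorted(a for a, domain in DOMAINS.items() if val in domain)
        # A value owned by exactly one attribute is implicit information, not an ambiguity.
        return [] if len(owners) == 1 else [("Ambiguous Attribute", owners)]
    domain = DOMAINS.get(attr, set())
    if val not in domain:
        return [("Attribute-Value Mismatch", sorted(domain))]
    return []

def check_measure(agg, mea):
    """Annotate a measure; agg is None when no aggregation operator was given."""
    allowed = ALLOWED_AGG.get(mea, set())
    if agg is None:
        return [] if len(allowed) == 1 else [("Ambiguous Agg. Operator", sorted(allowed))]
    if agg not in allowed:
        return [("MD-Meas Violation", sorted(allowed))]
    return []
```

Each annotation carries the list of valid alternatives, so a later disambiguation step can propose them to the user.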
Disambiguation
Handle annotations
Add implicit information to PF_M
Ask a question for each annotation to reduce PF_M to PT_M
SQL generation: translate PT_M into SQL code
[Figure: annotated parse forest of M = ⟨avg, UnitSales, where, Product, =, New York, group by, Regin⟩ with the questions raised by its annotations. AVM: “New York is not a valid Product, possible Products are…”. Unparsed: “Dangling clause, do you want to add it or drop it?”]
Annotation type | Description
Ambiguous Attribute | ⟨Val⟩ is member of these attributes [...]
Ambiguous Agg. Operator | ⟨Mea⟩ allows these operators [...]
Attribute-Value Mismatch | ⟨Attr⟩ and ⟨Val⟩ domains mismatch, values are [...]
MD-Meas Violation | ⟨Mea⟩ does not allow ⟨Agg⟩, operators are [...]
MD-GBY Violation | It is not allowed to group by on ⟨Attr⟩ without ⟨Attr⟩
Unparsed GC clause | There is a dangling grouping clause ⟨GC⟩
Unparsed MC clause | There is a dangling measure clause ⟨MC⟩
Unparsed SC clause | There is a dangling predicate clause ⟨SC⟩
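Once every annotation is resolved, translating the parse tree into SQL is mostly string assembly. The sketch below is a simplification under assumptions: `to_sql` and the denormalized view `sales_view` are hypothetical names, joins between fact and dimension tables are not generated, and predicates are only combined with AND. Values are passed as `?` parameters rather than inlined; identifiers are assumed to come from the KB, never from raw user text.

```python
def to_sql(measures, predicates=(), group_by=(), table="sales_view"):
    """Build a parameterized GPSJ-style query.

    measures:   list of (aggregation operator, measure)
    predicates: list of (attribute, comparison operator, value)
    group_by:   list of attributes
    """
    select = list(group_by) + [f"{agg.upper()}({mea})" for agg, mea in measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    params = []
    if predicates:
        conditions = []
        for attr, cop, val in predicates:
            conditions.append(f"{attr} {cop} ?")
            params.append(val)  # value bound at execution time
        sql += " WHERE " + " AND ".join(conditions)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql, params

sql, params = to_sql([("avg", "UnitSales")], [("S_City", "=", "New York")], ["Region"])
```

The `(sql, params)` pair can be handed to any DB-API driver that uses the qmark parameter style.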
Robustness vs Complexity I
Multiple mappings ensure robustness...
NL = “medium sales for product New York by the regin”
T = ⟨medium, sales, for, product, New, York, by, regin⟩
Matched entities: average (avg), UnitSales, where, Product, New York, group by, Regin | Region
M1 = ⟨avg, UnitSales, where, Product, New York, group by, Regin⟩
M2 = ⟨avg, UnitSales, where, Product, New York, group by, Region⟩
…
More mappings, more interpretations
User experience: return only most promising query
Robustness vs Complexity II
Optimistic-pessimistic score function
Score(M) = \sum_{i=1}^{|M|} Sim(T_i, E_i), where E_i is the i-th entity of M and T_i is the n-gram of T mapped to it
Score(PF_M) = Score(M'), where M' is the sub-sequence of M belonging to PT_M
[Figure: parse tree of M = ⟨avg, UnitSales, where, Product, =, New York, group by, Region⟩ with an AVM annotation on the predicate; every entity belongs to PT_M, so all of them contribute to Score(PF_M).]
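The two scores can be computed as in the sketch below. It assumes a mapping is stored as a list of (n-gram, matched surface form) pairs and that a boolean flag per pair says whether the entity ended up in the parse tree; `difflib.SequenceMatcher` stands in for the similarity function Sim, which the slides do not specify.

```python
from difflib import SequenceMatcher

def sim(ngram, form):
    """Assumed similarity function in [0, 1]."""
    return SequenceMatcher(None, ngram.lower(), form.lower()).ratio()

def score(mapping):
    """Score(M): sum of the similarities of all (n-gram, form) pairs in M."""
    return sum(sim(ngram, form) for ngram, form in mapping)

def score_pf(mapping, in_tree):
    """Score(PF_M): Score(M') over the pairs of M that belong to the parse tree."""
    return score([pair for pair, keep in zip(mapping, in_tree) if keep])

mapping = [("medium", "medium"), ("sales", "sales"), ("regin", "region")]
full = score(mapping)
partial = score_pf(mapping, [True, True, False])
```

Since every similarity is non-negative and M' is a sub-sequence of M, Score(PF_M) never exceeds Score(M).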
Robustness vs Complexity III
Optimistic: annotations in PT_M are likely to be solved
[Figure: Score(PF_M) for M = ⟨avg, UnitSales, where, Product, =, New York, group by, Region⟩. The predicate annotated AVM stays in PT_M and contributes to the score, in contrast with scoring only M = ⟨avg, UnitSales⟩.]
Robustness vs Complexity IV
Pessimistic: unparsed clauses are likely to be dropped
[Figure: Score(PF_M) for M = ⟨avg, UnitSales, where, Product, =, New York, group by, Regin⟩. The clause annotated "unparsed" lies outside PT_M and does not contribute to the score, unlike the parsed ⟨GC⟩ clause of ⟨group by, Region⟩.]
Ranking by Score(PF_M) allows pruning of parsed mappings
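One way to exploit the ranking is a branch-and-bound loop: because Score(PF_M) ≤ Score(M), mappings can be visited by decreasing Score(M) and parsing can stop once the next upper bound cannot beat the k-th best forest found so far. This is one possible reading of the pruning mentioned above, not necessarily the authors' procedure; `top_k` and `parse_score` are hypothetical names.

```python
import heapq

def top_k(mappings, parse_score, k=1):
    """Return the k best parsed mappings and the number of mappings actually parsed.

    mappings:    list of (Score(M), mapping)
    parse_score: function mapping -> Score(PF_M), assumed to be <= Score(M)
    """
    best = []    # min-heap of (Score(PF_M), visit index, mapping)
    parsed = 0
    ranked = sorted(mappings, key=lambda pair: -pair[0])
    for index, (upper_bound, mapping) in enumerate(ranked):
        if len(best) == k and upper_bound <= best[0][0]:
            break  # no remaining mapping can improve the current top k
        parsed += 1
        heapq.heappush(best, (parse_score(mapping), index, mapping))
        if len(best) > k:
            heapq.heappop(best)
    return sorted(best, reverse=True), parsed

# Toy data: names and scores are illustrative only.
forest_scores = {"M_region": 7.0, "M_regin": 4.9, "M_short": 3.0}
mappings = [(7.0, "M_region"), (6.9, "M_regin"), (3.0, "M_short")]
result, parsed = top_k(mappings, forest_scores.get, k=1)
```

In the toy run only the first mapping is parsed, since the second one's upper bound of 6.9 cannot exceed the 7.0 already obtained.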
Evaluation I
Dataset
Real-world analytics queries [11] mapped to Foodmart schema
75% of queries are valid GPSJ queries
110 manually annotated queries
Automatic feeding: 1 fact, 39 attributes, 12 500 entities
Manual feeding: only 50 synonyms ("for each" synonym of group by)
Parameters
n-grams, n ∈ [1..4]
... mapped to top N entities with similarity ≥ α
Consider mappings covering at least 70% of T
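The coverage filter can be sketched as follows, assuming a mapping records the (start, length) span of each n-gram it uses; the function names and the span representation are illustrative, and only the 70% threshold comes from the slide.

```python
def coverage(num_tokens, spans):
    """Fraction of the tokens of T covered by the n-grams of a mapping."""
    covered = set()
    for start, length in spans:
        covered.update(range(start, start + length))
    return len(covered) / num_tokens

def keep_mapping(num_tokens, spans, min_coverage=0.7):
    """Keep only mappings covering at least min_coverage of T."""
    return coverage(num_tokens, spans) >= min_coverage

# T = <medium, sales, for, product, New, York, by, region> has 8 tokens.
full_spans = [(0, 1), (1, 1), (2, 1), (3, 1), (4, 2), (6, 1), (7, 1)]
sparse_spans = [(0, 1), (1, 1), (4, 2)]
```

Here `full_spans` covers all 8 tokens and is kept, while `sparse_spans` covers 4 of 8 tokens and is discarded.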
Evaluation II
Accuracy: tree similarity TSim(PT_M, PT*) [12] between the produced parse tree PT_M and the correct one PT*
[Figure: TSim (y-axis, 0.0 to 1.0) against the number k of queries returned (x-axis, 1 to 5). (a) Varying N (N = 2, 4, 6) with α = 0.4. (b) Varying α (α = 0.6, 0.5, 0.4) with N = 6.]
Accuracy depends on vocabulary (i.e., matched entities)
Proposing one query slightly impacts accuracy
Accuracy in [0.85, 0.9]
Evaluation III
[Figure: TSim (y-axis, 0.0 to 1.0) against the disambiguation step (x-axis, 0 to 3), with k = 1, N = 6 and α = 0.4.]
Disambiguation increases accuracy up to 0.94
State-of-the-art accuracy [4, 5, 6]
Conclusion & further enhancements
So far
Provided an architecture for conversational OLAP that meets the desiderata
Automated and portable: no impact on DW
Plug and play: no heavy manual lexicon definition
Robustness: adapt to spoken and syntactic inaccuracies
Translated NL to well-formed GPSJ query
What’s next?
1. Support a conversational OLAP session
Extend grammar with dialog primitives (i.e., OLAP operators)
Manage and refine previous parse trees
2. Learn frequent disambiguations to minimize user interaction
3. Design metaphor to support interaction
Visual metaphor based on DFM
4. Test with real users to verify perceived effectiveness
Usability, immediacy, memorability
References I
[1] Ramanathan V. Guha, Vineet Gupta, Vivek Raghunathan, and Ramakrishnan Srikant.
User modeling for a personal assistant.
In WSDM, pages 275–284. ACM, 2015.
[2] Hype cycle for artificial intelligence, 2018.
http://www.gartner.com/en/documents/3883863/hype-cycle-for-artificial-intelligence-2018.
Accessed: 2019-06-21.
[3] Ashish Gupta, Venky Harinarayan, and Dallan Quass.
Aggregate-query processing in data warehousing environments.
In VLDB, pages 358–369. Morgan Kaufmann, 1995.
[4] Fei Li and H. V. Jagadish.
Understanding natural language queries over relational databases.
SIGMOD Record, 45(1):6–13, 2016.
[5] Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig.
Sqlizer: query synthesis from natural language.
PACMPL, 1(OOPSLA):63:1–63:26, 2017.
[6] Diptikalyan Saha, Avrilia Floratou, Karthik Sankaranarayanan, Umar Farooq Minhas, Ashish R. Mittal, and Fatma Özcan.
ATHENA: an ontology-driven system for natural language querying over relational data stores.
PVLDB, 9(12):1209–1220, 2016.
[7] Nicolas Kuchmann-Beauger, Falk Brauer, and Marie-Aude Aufaure.
QUASL: A framework for question answering and its application to business intelligence.
In RCIS, pages 1–12. IEEE, 2013.
References II
[8] Kedar Dhamdhere, Kevin S. McCurley, Ralfi Nahmias, Mukund Sundararajan, and Qiqi Yan.
Analyza: Exploring data with conversation.
In IUI, pages 493–504. ACM, 2017.
[9] Katrin Affolter, Kurt Stockinger, and Abraham Bernstein.
A comparative survey of recent natural language interfaces for databases.
The VLDB Journal, 28(5):793–819, 2019.
[10] John C. Beatty.
On the relationship between LL(1) and LR(1) grammars.
J. ACM, 29(4):1007–1022, 1982.
[11] Krista Drushku, Julien Aligon, Nicolas Labroche, Patrick Marcel, and Verónika Peralta.
Interest-based recommendations for business intelligence users.
Inf. Syst., 86:79–93, 2019.
[12] Kaizhong Zhang and Dennis E. Shasha.
Simple fast algorithms for the editing distance between trees and related problems.
SIAM J. Comput., 18(6):1245–1262, 1989.
Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 2 / 2

More Related Content

Similar to [DOLAP2020] Towards Conversational OLAP

LIAO TSEN YUNG Cover Letter
LIAO TSEN YUNG Cover LetterLIAO TSEN YUNG Cover Letter
LIAO TSEN YUNG Cover Letter
Tsen Yung Liao
 
Designing C++ portable SIMD support
Designing C++ portable SIMD supportDesigning C++ portable SIMD support
Designing C++ portable SIMD support
Joel Falcou
 

Similar to [DOLAP2020] Towards Conversational OLAP (20)

ISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary path
ISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary pathISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary path
ISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary path
 
LIAO TSEN YUNG Cover Letter
LIAO TSEN YUNG Cover LetterLIAO TSEN YUNG Cover Letter
LIAO TSEN YUNG Cover Letter
 
A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...A hybrid sine cosine optimization algorithm for solving global optimization p...
A hybrid sine cosine optimization algorithm for solving global optimization p...
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 
TensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksTensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and Tricks
 
Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)
Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)
Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
 
Automated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform DesignsAutomated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform Designs
 
Spatial Clustering to Uncluttering Map Visualization in SOLAP
Spatial Clustering to Uncluttering Map Visualization in SOLAPSpatial Clustering to Uncluttering Map Visualization in SOLAP
Spatial Clustering to Uncluttering Map Visualization in SOLAP
 
churn prediction in telecom
churn prediction in telecom churn prediction in telecom
churn prediction in telecom
 
6. Implementation
6. Implementation6. Implementation
6. Implementation
 
Log Message Anomaly Detection with Oversampling
Log Message Anomaly Detection with Oversampling Log Message Anomaly Detection with Oversampling
Log Message Anomaly Detection with Oversampling
 
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLINGLOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
 
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLINGLOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
 
All projects
All projectsAll projects
All projects
 
Lean six sigma executive overview (case study) templates
Lean six sigma executive overview (case study) templatesLean six sigma executive overview (case study) templates
Lean six sigma executive overview (case study) templates
 
Lec11 object-re-id
Lec11 object-re-idLec11 object-re-id
Lec11 object-re-id
 
Hierarchical free monads and software design in fp
Hierarchical free monads and software design in fpHierarchical free monads and software design in fp
Hierarchical free monads and software design in fp
 
Designing C++ portable SIMD support
Designing C++ portable SIMD supportDesigning C++ portable SIMD support
Designing C++ portable SIMD support
 
Online advertising and large scale model fitting
Online advertising and large scale model fittingOnline advertising and large scale model fitting
Online advertising and large scale model fitting
 

More from University of Bologna

Data models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analyticsData models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analytics
University of Bologna
 

More from University of Bologna (8)

Data models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analyticsData models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analytics
 
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
 
[DataPlat2023] Opening
[DataPlat2023] Opening[DataPlat2023] Opening
[DataPlat2023] Opening
 
[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes
 
[ADBIS2022] Insight-based Vocalization of OLAP Sessions
[ADBIS2022] Insight-based Vocalization of OLAP Sessions[ADBIS2022] Insight-based Vocalization of OLAP Sessions
[ADBIS2022] Insight-based Vocalization of OLAP Sessions
 
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
 
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
 
[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence
 

Recently uploaded

Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 

Recently uploaded (20)

GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 

[DOLAP2020] Towards Conversational OLAP

  • 1. Towards Conversational OLAP Matteo Francia1 , Enrico Gallinucci1 , Matteo Golfarelli1 m.francia@unibo.it 1 University of Bologna DOLAP2020
  • 2. Data access democratization Smart assistants are in companies’ agendas [1, 2] Goal: perform conversational OLAP sessions Existing OLAP interfaces: point-and-click metaphor to avoid SQL Translate NL into Generalized Projection, Selection and Join (GPSJ) query [3] Differences with state-of-art approaches [4, 5, 6, 7] 1. End-to-end dialog-driven framework for OLAP sessions 2. Plug-and-play: no impact on DW 3. No mandatory external knowledge Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 2 / 18
  • 3. Functional architecture Speech -to-Text OLAP operator Full query Disambiguation Execution & Visualization Automatic KB feeding KB enrichment KB DW Raw text Annotated parse forest Parse tree Results Metadata & values Synonyms Log Parse tree Interpretation Offline Online Synonyms Ontology SQL generation SQL Sales by Customer and Month Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 3 / 18
  • 4. Full query: Tokenization and mapping Match raw text n-grams with known DW entities in KB Product Name Type Category Family Customer C.City C.Region Gender Store S.City S.Region Date Quarter Month YearStoreSales StoreCost UnitSales Sales M1 =  avg, UnitSales, where, Product, New York, group by, Region  M2 =  avg, UnitSales, where, Product, New York, group by, Regin  NL = “medium sales for product New York by the region” T =  medium, sales, for, product, New, York, by, region  average, UnitSales, where, Product, New York, group by, Region Regin KB Build mappings (i.e., combinations of entities) M1 =  avg, UnitSales, where, Product, New York, group by, Region  M2 =  avg, UnitSales, where, Product, New York, group by, Regin  … … avg, group by New York, Product, Regin Region UnitSales Where … KB Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 4 / 18
  • 5. Full query: Parsing I Grammar-based translation (effective for narrow lexicon [8, 9]) Capture complex syntax structures (i.e., query clauses) LL(1) [10] not-ambiguous grammar: one PTM per mapping M GPSJ ::= MC GC SC | MC SC | MC GC | MC |... MC ::= ( Agg Mea | Mea | Cnt Fct | ...)+ GC ::= Gby Attr + SC ::= Whr SCO SCO ::= SCA “or” SCO | SCA SCA ::= SCN “and” SCA | SCN SCN ::= “not” SSC | SSC SSC ::= Attr Cop Val | Val Attr | Val | ... Category Entity Synonym samples Int select return, show, get Whr where in, such that Gby group by by, for each, per Cop =, <>, >, <, ≥, ≤ equal to, greater than Agg sum, avg total, medium Cnt count, count distinct number, amount Fct Facts Domain specific Mea Measures Domain specific Att Attributes Domain specific Val Categorical values Domain specific Dates and numbers - Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 5 / 18
  • 6. Full query: Parsing II M1 =  avg, UnitSales, where, Product, New York, group by, Region   Mea  Agg   Whr   MC   SC   GPSJ   SCO   SCA   SCN   SSC   Val  Attr   Gby   Attr   GC  M2 =  avg, UnitSales, where, Product, Ne  Mea  Agg   Whr   MC   SC   GPSJ   SCO  SCA  SCN  SSC  Attr  Fully parsed: PTM includes all entities as leaves Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 6 / 18
  • 7. Full query: Parsing III e, Product, New York, group by, Region   SC   GPSJ   SCO   SCA   SCN   SSC   Val  Attr   Gby   Attr   GC  M2 =  avg, UnitSales, where, Product, New York, group by, Regin   Mea  Agg   Whr   Gby   MC   SC   GPSJ   SCO   SCA   SCN   SSC   Val   SCO   SCA   SCN   SSC   Val   SC   Attr  PTM Partially parsed: some entities are not included in PTM (parse forest PFM) ..., group by, Regin : cannot group by on a value If fully parsed PFM = PTM Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 7 / 18
  • 8. Full query: Checking & Enhancement Parsing verifies syntax adherence to grammar, more issues arise Tag problematic subtrees in PFM with annotations Score(PFM) M =  avg, UnitSales, where, Product, =, New York, group by, Regin   Mea  Agg   Whr   Gby   MC   SC   GPSJ   SCO   SCA   SCN   SSC   Val   SCO   SCA   SCN   SSC   Val   SC   Cop  Attr  AVM unparsed Score(M) M2 =  avg, UnitSales, where, Product, New York, group by, Regin   Mea  Agg   Whr   Gby   MC   SC   GPSJ   SCO   SCA   SCN   SSC   Val   SCO   SCA   SCN   SSC   Val   SC   Attr  AVM unparsed Annotation type Gen. derivation sample Ambiguous Attribute SSC ::= Val Ambiguous Agg. Operator MC ::= Mea Attribute-Value Mismatch SSC ::= Attr Cop Val MD-Meas Violation MC ::= Agg Mea MD-GBY Violation GC ::= Gby Attr + Unparsed clause – Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 8 / 18
  • 9. Disambiguation Handle annotations Add implicit information to PFM Ask question for each annotation to reduce PFM to PTM SQL generation: translate PTM into SQL code M =  avg, UnitSales, where, Product, =, New York, group by, Regin   Mea  Agg   Whr   Gby   MC   SC   GPSJ   SCO   SCA   SCN   SSC   Val   SCO   SCA   SCN   SSC   Val   SC   Cop  Attr  AVM unparsed New York is not a valid Product, possible Products are… Dangling clause, do you want to add it or drop it? Annotation type Description Ambiguous Attribute Val is member of these attributes [...] Ambiguous Agg. Operator Mea allows these operators [...] Attribute-Value Mismatch Attr and Val domains mismatch, values are [...] MD-Meas Violation Mea does not allow Agg , operators are [...] MD-GBY Violation It is not allowed to group by on Attr without Attr Unparsed GC clause There is a dangling grouping clause GC Unparsed MC clause There is a dangling measure clause MC Unparsed SC clause There is a dangling predicate clause SC Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 9 / 18
  • 10. Robustness vs Complexity I Multiple mappings ensures robustness... M1 =  avg, UnitSales, where, Product, New York, group by, Regin  M2 =  avg, UnitSales, where, Product, New York, group by, Region  … NL = “medium sales for product New York by the regin” T =  medium, sales, for, product, New, York, by, regin  average, UnitSales, where, Product, New York, group by, Regin Region KB More mappings, more interpretations User experience: return only most promising query Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 10 / 18
  • 11. Robustness vs Complexity II Optimistic-pessimistic score function Score(M) = |M| i=1 Sim(T , Ei) Score(PFM) = Score(M ) where M is sub-sequence of M belonging to PTM Score(PFM) M =  avg, UnitSales, where, Product, =, New York, group by, Region   Mea  Agg   Whr   MC   SC   GPSJ   SCO   SCA   SCN   SSC   Val  Cop  Attr   Gby   Attr   GC  AVM Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 11 / 18
  • 12. Robustness vs Complexity III Optimistic: annotations in PTM are likely to be solved Score(PFM) M =  avg, UnitSales, where, Product, =, New York, group by, Region   Mea  Agg   Whr   SSC   Val  Cop  Attr   Gby   Attr  AVM Score(PFM) M =  avg, UnitSales, where, Product, =, New York, group by, Region   Mea  Agg   Whr   MC   SC   GPSJ   SCO   SCA   SCN   SSC   Val  Cop  Attr   Gby   Attr   GC  AVM M =  avg, UnitSales  Mea  Agg   MC   Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 12 / 18
  • 13. Robustness vs Complexity IV Pessimistic: unparsed clauses are likely to be dropped roup by, Region  Gby   Attr  group by, Region   Gby   Attr   GC  Score(PFM) M =  avg, UnitSales, where, Product, =, New York, group by, Regin   Mea  Agg   Whr   Gby   MC   SC   GPSJ   SCO   SCA   SCN   SSC   Val   SCO   SCA   SCN   SSC   Val   SC   Cop  Attr  AVM unparsed Ranking by Score(PFM) allows pruning of parsed mappings Matteo Francia (UniBO) DOLAP: Towards Conversational OLAP 13 / 18
  • 14. Evaluation I. Dataset: real-world analytics queries [11] mapped to the Foodmart schema; 75% of queries are valid GPSJ queries; 110 manually annotated queries. Automatic feeding: 1 fact, 39 attributes, 12 500 entities. Manual feeding: only 50 synonyms (e.g., “for each” is a synonym of group by). Parameters: n-grams with n ∈ [1..4], each mapped to the top N entities with similarity ≥ α; only mappings covering at least 70% of T are considered.
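The parameters on slide 14 can be illustrated with a short Python sketch. It assumes a generic string-similarity ratio (`difflib.SequenceMatcher`), since the slides do not name the similarity measure; the function names and the tiny entity list are hypothetical.

```python
from difflib import SequenceMatcher


def ngrams(tokens, max_n=4):
    """Yield (start, end, text) for every n-gram of tokens with 1 <= n <= max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, i + n, " ".join(tokens[i:i + n])


def similarity(a, b):
    # Assumption: a generic ratio in [0, 1]; the slides do not specify Sim.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def top_matches(text, entities, top_n=6, alpha=0.4):
    """Keep at most top_n entities whose similarity to text is >= alpha."""
    scored = [(similarity(text, e), e) for e in entities]
    scored = [(s, e) for s, e in scored if s >= alpha]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return scored[:top_n]


def coverage(spans, num_tokens):
    """Fraction of token positions covered by the (start, end) spans of a mapping."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    return len(covered) / num_tokens


tokens = ["medium", "sales", "for", "product", "New", "York", "by", "regin"]
all_ngrams = list(ngrams(tokens))
matches = top_matches("regin", ["Region", "Regin", "UnitSales"], top_n=2)
# A mapping that uses every token except "for" still passes the 70% threshold.
ok = coverage([(0, 2), (3, 4), (4, 6), (6, 8)], len(tokens)) >= 0.7
```

Raising α or lowering N shrinks the set of candidate entities per n-gram, which is the trade-off between robustness and number of interpretations discussed in the preceding slides.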
  • 15. Evaluation II. Accuracy: tree similarity TSim(PT, PT∗) [12] between the produced $PT_M$ and the correct $PT^*_M$. [Figure: TSim (0.0 to 1.0) vs. number k of queries returned (1 to 5); (a) varying N ∈ {2, 4, 6} with α = 0.4; (b) varying α ∈ {0.4, 0.5, 0.6} with N = 6.] Accuracy depends on vocabulary (i.e., matched entities). Proposing one query slightly impacts accuracy. Accuracy in [0.85, 0.9].
  • 16. Evaluation III. [Figure: TSim (0.0 to 1.0) vs. disambiguation step (0 to 3), with k = 1, N = 6 and α = 0.4.] Disambiguation increases accuracy up to 0.94. State-of-the-art accuracy [4, 5, 6].
  • 17. Conclusion & further enhancements. So far: provided an architecture for conversational OLAP with the desiderata (automated and portable: no impact on DW; plug and play: no heavy manual lexicon definition; robustness: adapts to spoken and syntactic inaccuracies); translated NL to a well-formed GPSJ query. What’s next? 1. Support a conversational OLAP session: extend the grammar with dialog primitives (i.e., OLAP operators); manage and refine previous parse trees. 2. Learn frequent disambiguations to minimize user interaction. 3. Design a metaphor to support interaction: visual metaphor based on DFM. 4. Test with real users to verify perceived effectiveness: usability, immediacy, memorability.
  • 18. End
  • 19. References I [1] Ramanathan V. Guha, Vineet Gupta, Vivek Raghunathan, and Ramakrishnan Srikant. User modeling for a personal assistant. In WSDM, pages 275–284. ACM, 2015. [2] Hype cycle for artificial intelligence, 2018. http://www.gartner.com/en/documents/3883863/hype-cycle-for-artificial-intelligence-2018. Accessed: 2019-06-21. [3] Ashish Gupta, Venky Harinarayan, and Dallan Quass. Aggregate-query processing in data warehousing environments. In VLDB, pages 358–369. Morgan Kaufmann, 1995. [4] Fei Li and H. V. Jagadish. Understanding natural language queries over relational databases. SIGMOD Record, 45(1):6–13, 2016. [5] Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. Sqlizer: query synthesis from natural language. PACMPL, 1(OOPSLA):63:1–63:26, 2017. [6] Diptikalyan Saha, Avrilia Floratou, Karthik Sankaranarayanan, Umar Farooq Minhas, Ashish R. Mittal, and Fatma Özcan. ATHENA: an ontology-driven system for natural language querying over relational data stores. PVLDB, 9(12):1209–1220, 2016. [7] Nicolas Kuchmann-Beauger, Falk Brauer, and Marie-Aude Aufaure. QUASL: A framework for question answering and its application to business intelligence. In RCIS, pages 1–12. IEEE, 2013.
  • 20. References II [8] Kedar Dhamdhere, Kevin S. McCurley, Ralfi Nahmias, Mukund Sundararajan, and Qiqi Yan. Analyza: Exploring data with conversation. In IUI, pages 493–504. ACM, 2017. [9] Katrin Affolter, Kurt Stockinger, and Abraham Bernstein. A comparative survey of recent natural language interfaces for databases. The VLDB Journal, 28(5):793–819, 2019. [10] John C. Beatty. On the relationship between LL(1) and LR(1) grammars. J. ACM, 29(4):1007–1022, 1982. [11] Krista Drushku, Julien Aligon, Nicolas Labroche, Patrick Marcel, and Verónika Peralta. Interest-based recommendations for business intelligence users. Inf. Syst., 86:79–93, 2019. [12] Kaizhong Zhang and Dennis E. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput., 18(6):1245–1262, 1989.