The volume, variety, and high availability of the data backing decision support systems have impacted business intelligence, the discipline providing strategies to transform raw data into decision-making insights. Such transformation is usually abstracted as the “knowledge pyramid,” in which data collected from the real world are processed into meaningful patterns. In this context, volume, variety, and availability have opened up new challenges in augmenting the knowledge pyramid. On the one hand, the volume and variety of unconventional data (i.e., unstructured, non-relational data generated by heterogeneous sources such as sensor networks) demand novel, type-specific data management, integration, and analysis techniques. On the other hand, the high availability of unconventional data is increasingly attracting data scientists with high competence in the business domain but low competence in computer science and data engineering; enabling their effective participation requires new paradigms to drive and ease knowledge extraction. The goal of this thesis is to augment the knowledge pyramid from two points of view: by including unconventional data and by providing advanced analytics. As to unconventional data, we focus on mobility data and the related privacy issues, providing (de-)anonymization models. As to analytics, we introduce a higher abstraction level than writing formal queries. Specifically, we design advanced techniques that allow data scientists to explore data either by expressing intentions or by interacting with smart assistants in hands-free scenarios.
Augmented reality allows users to superimpose digital information (typically of an operational type) upon real-world entities. The synergy of analytical frameworks and augmented reality opens the door to a new wave of situated OLAP, in which users within a physical environment are provided with immersive analyses of local contextual data. In this paper we propose an approach that, based on the sensed augmented context (provided by wearable and smart devices), suggests a set of relevant analytical queries to the user. This is done by relying on a mapping between the entities that can be recognized by the devices and the elements of the enterprise data, while also taking into account the queries preferred by users during previous interactions in similar contexts. A set of experimental tests evaluates the proposed approach in terms of efficiency and effectiveness.
http://ceur-ws.org/Vol-2324/Paper02-MGolfarelli.pdf
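The context-to-query mapping described above can be sketched as a toy ranking function; the entity-to-cube mapping, candidate queries, preference scores, and weights below are all invented for illustration, not the paper's actual model.

```python
# Hypothetical sketch: entities recognized in the augmented context are
# mapped to enterprise-data elements, and candidate analytical queries
# are ranked by overlap with the current context plus past preferences.

ENTITY_TO_CUBE = {"shelf": "Product", "forklift": "Warehouse", "pallet": "Shipment"}

CANDIDATE_QUERIES = {
    "stock by product":        {"Product"},
    "shipments per warehouse": {"Warehouse", "Shipment"},
    "sales by month":          {"Product", "Date"},
}

# scores learned from previous interactions in similar contexts (made up)
PAST_PREFERENCE = {"stock by product": 0.9, "sales by month": 0.2,
                   "shipments per warehouse": 0.5}

def rank_queries(sensed_entities):
    context = {ENTITY_TO_CUBE[e] for e in sensed_entities if e in ENTITY_TO_CUBE}
    def score(query):
        overlap = len(CANDIDATE_QUERIES[query] & context) / len(CANDIDATE_QUERIES[query])
        return 0.7 * overlap + 0.3 * PAST_PREFERENCE[query]
    return sorted(CANDIDATE_QUERIES, key=score, reverse=True)

print(rank_queries(["shelf", "pallet"]))
```

With a shelf and a pallet in view, queries touching Product and Shipment rise to the top of the suggestion list.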
[ADBIS 2021] - Optimizing Execution Plans in a Multistore (Chiara Forresi)
Multistores are data management systems that enable query processing across different database management systems (DBMSs); besides the distribution of data, complexity factors like schema heterogeneity and data replication must be resolved through integration and data fusion activities. In a recent work [2], we have proposed a multistore solution that relies on a dataspace to provide the user with an integrated view of the available data and enables the formulation and execution of GPSJ (generalized projection, selection and join) queries. In this paper, we propose a technique to optimize the execution of GPSJ queries by finding the most efficient execution plan on the multistore. In particular, we devise three different strategies to carry out joins and data fusion, and we build a cost model to enable the evaluation of different execution plans. Through the experimental evaluation, we are able to profile the suitability of each strategy to different multistore configurations, thus validating our multi-strategy approach and motivating further research on this topic.
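A minimal sketch of the cost-based choice among alternative join strategies; the three strategies and their cost formulas below are invented stand-ins, not the paper's actual cost model.

```python
# Toy cost model: estimate the cost of each join strategy from input
# cardinalities and pick the cheapest plan (all formulas illustrative).
import math

def cost_merge_join(left_rows, right_rows):
    # sort both inputs, then one linear merge pass
    return (left_rows * math.log2(max(left_rows, 2))
            + right_rows * math.log2(max(right_rows, 2))
            + left_rows + right_rows)

def cost_hash_join(left_rows, right_rows):
    # build a hash table on the smaller input, probe with the larger
    return min(left_rows, right_rows) * 1.5 + max(left_rows, right_rows)

def cost_broadcast_join(left_rows, right_rows, nodes=4):
    # ship the smaller input to every node holding the larger one
    return min(left_rows, right_rows) * nodes + max(left_rows, right_rows)

STRATEGIES = {
    "merge": cost_merge_join,
    "hash": cost_hash_join,
    "broadcast": cost_broadcast_join,
}

def cheapest_plan(left_rows, right_rows):
    return min(STRATEGIES, key=lambda s: STRATEGIES[s](left_rows, right_rows))

print(cheapest_plan(1_000_000, 500))
```

Profiling each strategy against different input sizes, as the paper does against different multistore configurations, is what justifies keeping several strategies around.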
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin... (Christoph Lange)
The Linked Data paradigm has emerged as a powerful enabler for data and knowledge interlinking and exchange using standardised Web technologies.
In this article, we discuss our vision of how the Linked Data paradigm can be employed to evolve the intranets of large organisations -- be they enterprises, research organisations or governmental and public administrations -- into networks of internal data and knowledge.
Data integration is still a key challenge, in particular for large enterprises, and the Linked Data paradigm seems a promising approach to integrating enterprise data. Like the Web of Data, which now complements the original document-centred Web, data intranets may help to enhance and add flexibility to the intranets and service-oriented architectures that exist in large organisations. Furthermore, using Linked Data gives enterprises access to 50+ billion facts from the growing Linked Open Data (LOD) cloud. As a result, a data intranet can help to bridge the gap between structured data management (in ERP, CRM or SCM systems) and semi-structured or unstructured information in documents, wikis or web portals, and make all of these sources searchable in a coherent way.
Keynote at Baltic DB&IS 2014, 9 June 2014, Tallinn, Estonia
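The triple-based data model underlying Linked Data can be illustrated in a few lines; the URIs and the `owl:sameAs` link into the LOD cloud below are made up for the example.

```python
# Facts as (subject, predicate, object) triples; internal identifiers
# are interlinked with external ones, so a single pattern query spans
# both enterprise data and the Linked Open Data cloud.

triples = {
    ("ex:ACME",  "rdf:type",      "ex:Enterprise"),
    ("ex:ACME",  "ex:usesSystem", "ex:CRM-1"),
    ("ex:CRM-1", "rdf:type",      "ex:CRMSystem"),
    ("ex:ACME",  "owl:sameAs",    "dbpedia:ACME_Corp"),  # link into the LOD cloud
}

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# everything known about ex:ACME, across internal and external sources
for triple in sorted(match(s="ex:ACME")):
    print(triple)
```

A real deployment would use an RDF store and SPARQL rather than in-memory tuples, but the queryable triple graph is the same idea.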
PREDICTING STOCK PRICE MOVEMENTS BASED ON NEWSPAPER ARTICLES USING A NOVEL DE... (webwinkelvakdag)
QUESTION: What is worth millions of dollars today, but not worth anything tomorrow?
ANSWER: Today's newspaper.
For my master's thesis in the Business Analytics program at the Vrije Universiteit Amsterdam, commissioned by Deloitte, I developed an algorithm that learned to predict stock price movements based on newspaper articles. It took over twenty-six thousand articles to train the model, but in the end it was capable of classifying over 56% of articles correctly.
This presentation walks through the algorithm's pipeline from beginning to end in an intuitive, yet technical manner. After the presentation, visitors should have a global understanding of the components of the algorithm as well as why and how they work. The components discussed include Word2Vec, Long Short-Term Memory (LSTM) networks, and a Convolutional Neural Network.
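A heavily simplified, stdlib-only sketch of the pipeline's shape (the real system uses learned Word2Vec embeddings with LSTM and CNN layers): a toy embedding table, mean pooling, and a hand-set linear classifier stand in for the learned components.

```python
# Toy version of the article -> word vectors -> pooled vector -> class
# pipeline. Embeddings and classifier weights are made up, not trained.

TOY_EMBEDDINGS = {          # 2-d "word vectors" (invented)
    "profit":  ( 1.0,  0.2),
    "growth":  ( 0.8,  0.1),
    "loss":    (-0.9,  0.3),
    "lawsuit": (-0.7,  0.4),
}

def embed(article_words):
    # look up known words and mean-pool their vectors
    vecs = [TOY_EMBEDDINGS[w] for w in article_words if w in TOY_EMBEDDINGS]
    n = len(vecs) or 1
    return tuple(sum(v[i] for v in vecs) / n for i in range(2))

def predict(article_words, weights=(1.0, 0.0), bias=0.0):
    x = embed(article_words)
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "up" if score > 0 else "down"

print(predict("profit growth expected".split()))
print(predict("lawsuit causes loss".split()))
```

The real model replaces mean pooling with sequence-aware LSTM/CNN layers, which is what lets word order influence the prediction.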
Presentation about new data, methods and outputs to create knowledge for innovation policy. Presented at the OECD Blue Sky Conference, 20 September 2016.
Presentation for NEC Lab Europe.
Knowledge graphs are increasingly built using complex, multifaceted, machine-learning-based systems relying on a wide range of data sources. To be effective, these graphs must constantly evolve and thus be maintained. I present work on combining knowledge graph construction (e.g., information extraction) and refinement (e.g., link prediction) in end-to-end systems. In particular, I will discuss recent work on using inductive representations for link prediction. I then discuss the challenges of ongoing system maintenance, knowledge graph quality, and traceability.
Prov-O-Viz is a visualisation service for provenance graphs expressed using the W3C PROV vocabulary. It uses the Sankey-style visualisation from D3js.
See http://provoviz.org
Single view vs. multiple views scatterplots (IJECEIAES)
Among all available visualization tools, the scatterplot has been deeply analyzed through the years, and many researchers have investigated how to improve it to face new challenges. The scatterplot is considered one of the most functional visual representations of data, due to its relative simplicity compared to other multivariable visualization techniques. Even so, one of the most significant and still unsolved challenges in data visualization consists in effectively displaying datasets with many attributes or dimensions, such as multidimensional or multivariate ones. The focus of this research is to compare the single view and the multiple views visualization paradigms for displaying multivariable datasets using scatterplots. A multivariable scatterplot has been developed as a web application to provide the single view tool, whereas for the multiple views visualization, the ScatterDice web app has been slightly modified and adopted as a traditional, yet interactive, scatterplot matrix. Finally, a taxonomy of tasks for visualization tools has been chosen to define the use case and the tests comparing the two paradigms.
Graph Databases Lifecycle Methodology and Tool to Support Index/Store Versio... (Paolo Nesi)
Abstract— Graph databases are finding a place in many different applications: smart city, smart cloud, smart education, etc. In most cases, these applications imply the creation of ontologies and the integration of a large body of knowledge to build a knowledge base as an RDF KB store, with ontologies, static data, historical data, and real-time data. Most RDF stores are endowed with inferential engines that materialize some knowledge as triples during indexing or querying. In these cases, deleting concepts may imply the removal and change of many triples, especially if those triples model the ontological part of the knowledge base or are referred to by many other concepts. For these solutions, graph database versioning is not provided at the level of the RDF store tool, and it is quite complex and time consuming to address with a black-box approach. In most cases indexing is a time-consuming process, and rebuilding the KB may require long, manually edited scripts that are error prone. Therefore, in order to solve these problems, this paper proposes a lifecycle methodology and a tool supporting the versioning of indexes for RDF KB stores. The proposed solution has been developed on the basis of a number of knowledge-oriented projects such as Sii-Mobility (smart city), RESOLUTE (smart city risk assessment), and ICARO (smart cloud). Results are reported in terms of time saving and reliability.
Keywords — RDF Knowledge base versioning, graph stores versioning, RDF store management, knowledge base life cycle.
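The versioning idea the paper motivates, i.e. snapshotting the store before a destructive change so the KB can be restored without a full re-indexing, can be sketched as follows; this is a pure-Python toy, not the proposed tool.

```python
# Toy versioned triple store: a checkpoint saves the current state so a
# cascading concept deletion can be rolled back without rebuilding.
import copy

class VersionedStore:
    def __init__(self):
        self.triples = set()
        self.versions = []              # stack of snapshots

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def checkpoint(self):
        self.versions.append(copy.deepcopy(self.triples))

    def delete_concept(self, concept):
        # deleting a concept cascades to every triple mentioning it
        self.triples = {t for t in self.triples if concept not in t}

    def rollback(self):
        self.triples = self.versions.pop()

store = VersionedStore()
store.add("City", "hasSensor", "Sensor1")
store.add("Sensor1", "reports", "Temperature")
store.checkpoint()
store.delete_concept("Sensor1")        # removes both triples
print(len(store.triples))
store.rollback()
print(len(store.triples))
```

A real RDF store must also track materialized (inferred) triples, which is precisely what makes deletion and versioning hard in practice.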
DISIT Lab overview: smart city, big data, semantic computing, cloud (Paolo Nesi)
Smart City
• Projects: http://www.disit.org/5501
– Sii-Mobility, http://www.sii-mobility.org
– Service Map: http://servicemap.disit.org
– Social Innovation: Coll@bora http://www.disit.org/5479
– Navigation Indoor/outdoor: Mobile Emergency http://www.disit.org/5404
– Mobility and Transport: TRACE-IT, RAISSS, TESYSRAIL
• Tools: http://www.disit.org/5489
– Data gathering, data mining and reconciliation
– Data reasoning, deduction, prediction
– Smart city ontology and reasoning tools
– Service analysis and recommendations
– Autonomous train operator, train signaling
– Risk analysis, decision support systems
– Mobile Applications
Data Analytics - Big data
• Projects: http://www.disit.org/5501
– Linked Open Graph: http://LOG.disit.org
– Sii-Mobility, http://www.sii-mobility.org
– Service on a number of projects
• Tools: http://www.disit.org/5489
– Open data and Linked Open Data
– LOG LOD service and tools
– Data mining and reconciliation
– Data reasoning, deduction, prediction, decision support
– SN Analysis and recommendations
– User behavior monitoring and analysis
Smart Cloud - Computing
• Projects: http://www.disit.org/5501
– ICARO: http://www.disit.org/5482
– Cloud ontology: http://www.disit.org/5604
– Cloud simulator:
– Smart Cloud: http://www.disit.org/6544
• Tools: http://www.disit.org/5489
– Cloud Monitoring
– Smart Cloud Engine and reasoner,
– Service Level Analyzer and control
– Configuration analysis and checker
– Cloud Simulation
Text and Web Mining
• Projects: http://www.disit.org/5501
– OSIM: http://www.disit.org/5482
– SACVAR: http://www.disit.org/5604
– Blog/Twitter Vigilance
• Tools: http://www.disit.org/5489
– Text and web mining, Natural Language Processing
– Service localization
– Web Crawling
– Competence analysis
– Blog Vigilance, sentiment analysis
Social Media and e-Learning
• Projects: http://www.disit.org/5501
– ECLAP, http://www.eclap.eu
– ApreToscana: http://www.apretoscana.org
– Others: AXMEDIS, VARIAZIONI, SMNET, etc.
– Samsung Smart TV: http://www.disit.org/6534
• Tools: http://www.disit.org/5489
– XLMS, Cross Media Learning System
– IPR and content protection and distribution
– Mobile and SmartTv Applications
– Suggestions and recommendations
– Matchmaking solutions
– Media Tools for cross media content
Mobile Computing
• Projects:
– ECLAP: http://www.eclap.eu
– Mobile Medicine: http://mobmed.axmedis.org
– Mobile Emergency: http://www.disit.org/5500
– Smart City, FODD 2015: http://www.disit.org/6593
– Resolute: Mobiles as sensors
• Tools and support:
– Content distribution: e-learning
– Integrated Indoor/outdoor navigation
– User networking and collaboration
– Service localization
– Smart city and services
– OS: iOS, Android, Windows Phone, etc.
– Tech: IoT, iBeacons, NFC, QR, ….
A STUDY - KNOWLEDGE DISCOVERY APPROACHES AND ITS IMPACT WITH REFERENCE TO COGNI... (ijistjournal)
As we all know, the Internet of Things (IoT) is currently booming in the technology market, and everyone is talking about smart cities, especially in India; the smart city keyword goes hand in hand with IoT. IoT is a small word, but it places a big responsibility on the shoulders of technical people to work with it and extract data from it. IoT interconnects multiple things, both living and non-living, and this communication generates a huge amount of data; in this paper we discuss the tools and techniques used for knowledge discovery over such data.
The Internet of Things and knowledge discovery are two sides of the same coin and go together; in the absence of one, there is no use for the other. This paper also focuses on the types of data and data-generating sources, knowledge discovery from that data, the tools useful for discovering knowledge, and the techniques to be followed to discover meaningful information from huge amounts of data, along with their impact.
Data enrichment is vital for leveraging heterogeneous data sources in various business analyses, AI applications, and data-driven services. Knowledge Graphs (KGs) support the enrichment of heterogeneous data sources by making entities first-class citizens: links to entities help interconnect heterogeneous data pieces or even ease access to external data sources to eventually augment the original data. Data annotation algorithms to find and link entities in reference KGs, as well as to identify out-of-KG entities, have been proposed and applied to different types of data, such as tables and texts. However, despite recent progress in annotation algorithms, their output does not always meet the quality requirements that make the enriched data valuable in downstream applications. As a result, semantic data enrichment remains an effort-consuming and error-prone task. In this seminar, we discuss the relationships between annotation algorithms, data enrichment, and KG construction, highlighting challenges and open problems. In addition, we advocate for a native human-in-the-loop perspective that enables users to control the outcome of the enrichment and, eventually, improve the quality of the enriched data. We focus in particular on the annotation and enrichment of tabular data and briefly discuss the application of a similar paradigm to the enrichment of textual data in the legal domain, e.g., court decisions and criminal investigation documents.
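A toy sketch of the table-annotation step with out-of-KG detection: exact label matching links cells to KG entities, while unmatched cells are flagged for human review (the human-in-the-loop step). The KG content and labels are invented.

```python
# Minimal table-column annotator against a tiny reference "KG":
# matched cells get an entity link; the rest become out-of-KG
# candidates that a user should confirm or correct.

KG_LABELS = {"milan": "kg:Milan", "rome": "kg:Rome", "turin": "kg:Turin"}

def annotate_column(cells):
    annotations, out_of_kg = {}, []
    for cell in cells:
        key = cell.strip().lower()
        if key in KG_LABELS:
            annotations[cell] = KG_LABELS[key]
        else:
            out_of_kg.append(cell)       # needs user confirmation
    return annotations, out_of_kg

ann, unknown = annotate_column(["Milan", "Rome", "Bitonto"])
print(ann)
print(unknown)
```

Real annotators rank many candidate entities per cell and score them by context; the out-of-KG list is where the human-in-the-loop control advocated above pays off.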
Fundamental Areas Of Study In Data Science.pdf (BPBOnline)
Data Science is a broad term that encompasses multiple disciplines. It is a rapidly growing field of study that uses scientific methods to extract meaningful insights from input data. This rapid growth has encouraged researchers to explore the multiple disciplines that data science encompasses.
Let's look at a few of the broad areas that are fundamental to mastering data science.
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services (Paolo Nesi)
Presently, a very large number of public and private data sets are available to local governments. In most cases they are not semantically interoperable, and a huge human effort is needed to create integrated ontologies and a knowledge base for the smart city. The smart city ontology is not yet standardized, and much research work is needed to identify models that can easily support data reconciliation, the management of complexity, and reasoning. In this paper, a system for the ingestion and reconciliation of smart-city data, such as the road graph, the services available on the roads, and traffic sensors, is proposed. The system manages a large volume of data coming from a variety of sources, considering both static and dynamic data. These data are mapped to a smart-city ontology and stored in an RDF store, where they are available to applications via SPARQL queries to provide new services to users. The paper presents the process adopted to produce the ontology and the knowledge base, and the mechanisms adopted for verification, reconciliation, and validation. Some examples of the possible usage of the resulting coherent knowledge base are also offered and are accessible from the RDF store and related services. The article also presents the work performed on reconciliation algorithms and their comparative assessment and selection.
Keywords: smart city, knowledge base construction, reconciliation, validation and verification of knowledge base, smart city ontology, linked open graph.
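The reconciliation step can be illustrated with a toy normalization rule that collapses differently spelled source names onto a single knowledge-base URI; the rule, the records, and the `km4c:` prefix usage are invented for the example.

```python
# Toy reconciliation: the same street arrives from different sources
# with different spellings and must map to one URI in the KB.
import re

def normalize(name):
    name = name.lower().strip()
    name = re.sub(r"\bvia\b", "street", name)   # toy normalization rule
    name = re.sub(r"\s+", " ", name)            # squeeze whitespace
    return name

def reconcile(records):
    uri_of = {}
    for source_name in records:
        key = normalize(source_name)
        uri_of.setdefault(key, f"km4c:{key.replace(' ', '_')}")
    return uri_of

records = ["Via  Roma", "via roma", "VIA ROMA "]
print(reconcile(records))   # all three spellings collapse to one URI
```

Real reconciliation combines several such heuristics (string distance, geo-coordinates, type constraints), which is why the paper compares and selects among algorithms.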
EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B... (EUDAT)
Annotate your research data with B2NOTE (www.eudat.eu):
A note in the margins of a book or a scientific paper, a comment on a manuscript: we are all using annotations to add information to existing physical documents. To offer a similar experience with digital content within the EUDAT Collaborative Data Infrastructure (CDI), we developed a service that allows associating additional information to a file, in a computer-readable format, without changing the file or the data record itself. These digital annotations can thus be searched to organize, retrieve and aggregate files, datasets and documents.
Although B2NOTE is a standalone service, it has been designed to be integrated with the existing EUDAT services. In the first pilot version, B2NOTE allows users to annotate files located in B2SHARE. The service is invoked as a “widget” within the B2SHARE user interface. B2NOTE lets you easily and intuitively create three types of annotations: a semantic tag coming from identified ontology repositories (only BioPortal at the moment, but we are working toward integrating more vocabularies), a free-text keyword that can be used when you cannot find a suitable semantic term, and a free-text comment.
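The three annotation types can be sketched as plain records attached to a file identifier; the field names and values below are illustrative and do not reflect the actual B2NOTE data model.

```python
# Toy model of the three annotation kinds (semantic tag, free-text
# keyword, free-text comment) attached to a file, kept separate from
# the file itself so the record never changes.
from dataclasses import dataclass

@dataclass
class Annotation:
    target_file: str         # e.g. an identifier of a B2SHARE file
    kind: str                # "semantic" | "keyword" | "comment"
    value: str
    ontology_uri: str = ""   # only used by semantic tags

notes = [
    Annotation("rec-42", "semantic", "Homo sapiens",
               ontology_uri="http://example.org/onto/homo_sapiens"),
    Annotation("rec-42", "keyword", "field-campaign-2016"),
    Annotation("rec-42", "comment", "Calibration looks off after day 3."),
]

# annotations can then be searched to retrieve and aggregate files
semantic = [n for n in notes if n.kind == "semantic"]
print(len(semantic))
```

Keeping annotations as separate, searchable records is what enables the organize/retrieve/aggregate use cases described above.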
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI (Big Data Week)
Charles Cai has more than two decades of experience and a track record of global transformational programme deliveries -- from vision and evangelism to end-to-end execution -- in global investment banks and energy trading companies, where he excels at designing and building innovative, large-scale Big Data systems for high-volume, low-latency trading, global Energy Trading & Risk Management, and advanced temporal and geospatial predictive analytics, as Chief Front Office Technical Architect and Head of Data Science. He is also a frequent speaker at Google Campus, Big Data Innovation Summit, Cloud World Forum, Data Science London, QCon London, the MoD CIO Symposium, etc., promoting knowledge and best-practice sharing with audiences ranging from developers and data scientists to CXO-level senior executives from both IT and business backgrounds. He has in-depth knowledge of and experience with the Scala, Python, C# / F#, C++, Node.js, Java, R, and Haskell programming languages in Mobile, Desktop, Hadoop/Spark, Cloud, IoT/MCU and BlockChain settings, and holds TOGAF9, EMC-DS, AWS CNE4 and other certifications.
Open Data Day 2016, Km4City, the university as an aggregator of Open Data of the ... (Paolo Nesi)
Open Data Day, UNIMORE, Modena, 5 March 2016.
Data aggregation, the Florence experience,
Smart City, Km4City,
Smart Decision Support,
Data Ingestion manager,
Data aggregation,
User profiling on demand.
Mobility: inter-modality, integrated ticketing, sustainability, interchange hubs, exploitation of stations, etc.
Services: government (e.g., SUAP), education, tourism, cultural heritage, health, etc.
Energy: energy saving, emission reduction, pollution, etc.
Environment: air quality, rivers, weather, waste, etc.
… commerce, industry, etc.
... Critical infrastructures, resilience
Collection of static, quasi-static and real-time data, streams
Open data: geo-localized data, services, statistics, censuses, etc.
Operators' private data: under restricted licenses so that other operators cannot profit from their data
People's personal data: profiles and behaviors via apps, IoT, sensors, the web, etc.
Data integration to make the data semantically interoperable and to support deductions (time, space, …)
Traditional open data collectors offer statistical views but are not suited to producing integrated services
Integration with unifying semantic models such as Km4City
Control rooms of metropolitan cities must:
supervise multiple domains and the interdependencies among mobility, energy, communication, services, traffic flows, pedestrian flows, tourism, etc.;
improve their resilience, i.e., their capacity to react and to absorb shocks;
reduce the social costs of mobility for people,
allowing fewer inconveniences and greater efficiency,
greater sensitivity to citizens' needs,
lower emissions, and better environmental conditions;
provide informational and educational paths so that citizens change their non-virtuous habits;
reduce transport costs and travel times for users, operators, and administrations through optimization solutions.
Presentation by Ciro Cattuto, ISI Foundation, at the VI Summit País Digital 2018 (PAÍS DIGITAL)
Talk "Data Science for private and public good" by Ciro Cattuto, Scientific Director, ISI Foundation, given at the VI Summit País Digital 2018, held on 4-5 September in Santiago, Chile.
Data models in precision agriculture: from IoT to big data analytics (University of Bologna)
Data models are abstract models that standardize data formats and relationships.
In other words, data models describe the concepts that belong to a certain application domain (e.g., "Device" and "Farm" are concepts that belong to the "Agriculture" domain, and "Device" is also a concept in the domain of "Smart Cities").
Over the years, many data models (and ontologies) have been produced for the precision agriculture domain.
On the one hand, such models provide standards for data transmission and representation.
On the other hand, these models are not suited for (automated) data integration and analysis, which are core tasks in building decision support systems for precision agriculture -- digitalized systems that support farmers and technicians in making data-driven decisions.
Following the advancements in big data technologies and Internet of Things systems, managing such systems is increasingly hard and requires not only standards to transmit and represent the data, but also means to automatically integrate heterogeneous data into a uniform medium and to automate data analysis and consumption.
While this is a well-known issue in the field of precision agriculture, where data models usually fuel data silos for ad-hoc independent applications (e.g., smart watering management, autonomous weeding systems, vegetation index computation), the synergy with computer science and database techniques could both answer these challenges and open novel research directions.
In this poster, we (i) describe some of the state-of-the-art models for precision agriculture and their application (e.g., from the FIWARE ecosystem), (ii) factorize the limitations and issues of such models (e.g., inter-domain ambiguities, intra-domain inconsistency, wrong modeling practices), (iii) show how computer science techniques (e.g., entity resolution, data normalization, data provenance collection) can answer these issues, and (iv) introduce novel data-driven research directions for building unifying decision support systems.
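A toy example for point (iii): resolving the same device across two data models that describe it differently; the field names and records below are invented for illustration.

```python
# Two records describing the same soil-moisture device, one in an
# NGSI-LD-like shape and one in a vendor-specific shape; a canonical
# key lets us resolve them to the same entity.

fiware_like = {"id": "urn:ngsi-ld:Device:soil-01",
               "controlledProperty": "soilMoisture"}
vendor_like = {"device_name": "SOIL-01",
               "measures": "soil_moisture"}

def canonical_device_key(record):
    raw = record.get("id") or record.get("device_name", "")
    return raw.split(":")[-1].lower()   # strip URN prefixes, lowercase

def same_device(a, b):
    return canonical_device_key(a) == canonical_device_key(b)

print(same_device(fiware_like, vendor_like))   # both resolve to 'soil-01'
```

This kind of entity resolution is exactly the database technique that can bridge the per-application data silos described above.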
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo)
The Intentional Analytics Model (IAM) has been envisioned as a way to tightly couple OLAP and analytics by (i) letting users explore multidimensional cubes stating their intentions, and (ii) returning multidimensional data coupled with knowledge insights in the form of annotations of subsets of data. The goal of this demonstration is to showcase the IAM approach using a notebook where the user can create a data exploration session by writing describe and assess statements, whose results are displayed by combining tabular data and charts so as to bring the discovered highlights to the user's attention. The demonstration plan will show the effectiveness of the IAM approach in supporting data exploration and analysis and its added value as compared to a traditional OLAP session by proposing two scenarios with guided interaction and letting users run custom sessions.
The Intentional Analytics Model (IAM) has been devised to couple OLAP and analytics by (i) letting users express their analysis intentions on multidimensional data cubes and (ii) returning enhanced cubes, i.e., multidimensional data annotated with knowledge insights in the form of models (e.g., correlations). Five intention operators were proposed to this end; of these, describe and assess have been investigated in previous papers. In this work we enrich the IAM picture by focusing on the explain operator, whose goal is to provide an answer to the user asking "why does a measure show these values?". Specifically, we propose a syntax for the operator and discuss how enhanced cubes are built by (i) finding the polynomials that best approximate the relationship between a measure and the other cube measures, and (ii) highlighting the most interesting one. Finally, we test the operator implementation in terms of efficiency.
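The core of the explain operator above — fitting polynomials between the target measure and the other cube measures, then highlighting the best one — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dict-of-columns cube layout, the degree penalty, and the toy data are all assumptions.

```python
import numpy as np

def explain(cube, target, max_degree=3):
    """Fit polynomials of increasing degree between each other measure and
    the target measure; return the best fit by a degree-penalized R² score.
    Hypothetical helper, not the operator's actual implementation."""
    y = np.asarray(cube[target], dtype=float)
    best = None
    for measure, values in cube.items():
        if measure == target:
            continue
        x = np.asarray(values, dtype=float)
        for degree in range(1, max_degree + 1):
            coeffs = np.polyfit(x, y, degree)
            ss_res = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
            ss_tot = float(np.sum((y - y.mean()) ** 2))
            r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
            score = r2 - 0.01 * degree  # prefer simpler models on near-ties
            if best is None or score > best[0]:
                best = (score, measure, degree, coeffs, r2)
    return best

# Toy cube: revenue is roughly quadratic in quantity.
cube = {"quantity": [1, 2, 3, 4, 5],
        "revenue": [1.1, 4.2, 8.9, 16.1, 25.2]}
score, measure, degree, coeffs, r2 = explain(cube, "revenue")
```

On this toy data the quadratic fit wins, which matches the intuition of surfacing "the most interesting" approximating polynomial.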
Carrying out OLAP analyses in hands-free scenarios requires lean forms of communication between the users and the system, based for instance on natural language. In this paper we introduce VOOL, a framework specifically devised for vocalizing the insights resulting from OLAP sessions. VOOL is self-configurable, extensible, and is aware of the user's intentions expressed by OLAP operators. To avoid overwhelming the user with very long descriptions, we pursue the vocalization of selected insights automatically extracted from query results. These insights are detected by a set of modules, each returning a set of independent insights that characterize data. After describing and formalizing our approach, we evaluate it in terms of efficiency and effectiveness.
The democratization of data access and the adoption of OLAP in scenarios requiring hand-free interfaces push towards the creation of smart OLAP interfaces. We describe COOL, a framework devised for COnversational OLap applications. COOL interprets and translates a natural language dialog into an OLAP session that starts with a GPSJ (Generalized Projection, Selection, and Join) query and continues with the application of OLAP operators. The interpretation relies on a formal grammar and on a repository storing metadata and values from a multidimensional cube. In case of ambiguous text description, COOL can obtain the correct query either through automatic inference or user interactions to disambiguate the text.
[EDBT2021] Conversational OLAP in Action (Best Demo Award)
Demo Paper presented at EDBT 2021: Conversational OLAP in Action (Best Demo Award)
Link to the paper: https://edbt2021proceedings.github.io/docs/p145.pdf
The democratization of data access and the adoption of OLAP in scenarios requiring hand-free interfaces push towards the creation of smart OLAP interfaces. In this demonstration we present COOL, a tool supporting natural language COnversational OLap sessions. COOL interprets and translates a natural language dialogue into an OLAP session that starts with a GPSJ (Generalized Projection, Selection and Join) query. The interpretation relies on a formal grammar and a knowledge base storing metadata from a multidimensional cube. COOL is portable, robust, and requires minimal user intervention. It adopts an n-gram based model and a string similarity function to match known entities in the natural language description. In case of incomplete text description, COOL can obtain the correct query either through automatic inference or through interactions with the user to disambiguate the text. The goal of the demonstration is to let the audience evaluate the usability of COOL and its capabilities in assisting query formulation and ambiguity/error resolution.
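The n-gram and string-similarity matching step described above can be sketched roughly as follows; the mini knowledge base, the bigram limit, the use of difflib, and the 0.8 threshold are illustrative assumptions, not COOL's actual configuration.

```python
from difflib import SequenceMatcher

# Hypothetical mini knowledge base: entity name -> role in the cube.
KB = {"unit sales": "measure", "product": "attribute",
      "store city": "attribute", "2020": "value"}

def ngrams(tokens, max_n=2):
    """All contiguous word n-grams of length 1..max_n."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def match_entities(text, threshold=0.8):
    """Map each n-gram of the input to its most similar KB entity,
    keeping only matches above a similarity threshold (tolerates typos)."""
    matches = []
    for gram in ngrams(text.lower().split()):
        entity, score = max(((e, SequenceMatcher(None, gram, e).ratio())
                             for e in KB), key=lambda p: p[1])
        if score >= threshold:
            matches.append((gram, entity, KB[entity]))
    return matches

# Misspelled input still resolves to the known entities.
found = match_entities("unit salses by prodct in 2020")
```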
[EDBT2021] Assess Queries for Interactive Analysis of Data Cubes
Paper presented at EDBT 2021: Assess Queries for Interactive Analysis of Data Cubes
Link to the paper: https://edbt2021proceedings.github.io/docs/p41.pdf
Assessment is the process of comparing the actual to the expected behavior of a business phenomenon and judging the outcome of the comparison. In this paper, we propose `assess`, a novel querying operator that supports assessment based on the results of a query on a data cube. This operator requires (1) the specification of an OLAP query over a measure of a data cube, to define the target cube to be assessed; (2) the specification of a reference cube of comparison (benchmark), which represents the expected performance of the measure; (3) the specification of how to perform the comparison between the target cube and the benchmark, and (4) a labeling function that classifies the result of this comparison using a set of labels. After introducing an SQL-like syntax for our operator, we formally define its semantics in terms of a set of logical operators. To support the computation of `assess` we propose a basic plan as well as some optimization strategies, then we experimentally evaluate their performance using a prototype.
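The four ingredients of assess (target cube, benchmark, comparison, labeling function) can be illustrated with a minimal sketch; the dict-based cubes, the ratio comparison, and the three-way thresholds below are assumptions for illustration, not the operator's formal semantics.

```python
# A minimal sketch of the assess workflow on dict-based cubes.

def assess(target, benchmark, compare, label):
    """Join target and benchmark on their coordinates, compare the measure
    values, and classify each outcome with a label."""
    return {coord: label(compare(value, benchmark[coord]))
            for coord, value in target.items() if coord in benchmark}

# Target: 2020 sales by city; benchmark: 2019 sales of the same cities.
sales_2020 = {"Rome": 120.0, "Milan": 80.0, "Turin": 45.0}
sales_2019 = {"Rome": 100.0, "Milan": 100.0, "Turin": 44.0}

ratio = lambda t, b: t / b          # how to compare target vs. benchmark

def three_way(r):                   # labeling function on the comparison
    if r >= 1.1:
        return "good"
    if r <= 0.9:
        return "bad"
    return "fair"

labels = assess(sales_2020, sales_2019, ratio, three_way)
```

Here Rome grows 20% ("good"), Milan drops 20% ("bad"), and Turin is flat ("fair"), mirroring the judge-the-outcome step of assessment.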
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
Paper presented at SEBD 2020
Document stores are preferred to relational ones for storing heterogeneous data due to their schemaless nature. However, the absence of a unique schema adds complexity to analytical applications. In a previous paper we have proposed an original approach to OLAP on document stores; its basic idea was to stop fighting against schema variety and welcome it as an inherent source of information wealth in schemaless sources. In this paper we focus on the querying phase, showing how queries can be directly rewritten on a heterogeneous collection in an inclusive way, i.e., also including the concepts present in a subset of documents only.
Authors: Matteo Francia, Enrico Gallinucci, Matteo Golfarelli, Stefano Rizzi
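The inclusive rewriting idea above — embracing schema variety instead of fighting it — can be illustrated in miniature; the documents, the synonym map, and the explicit "unknown" group are hypothetical, and a real system would push this logic into the document store's query language.

```python
# Illustrative sketch of inclusive querying over a schemaless collection:
# the concept "brand" appears under different local names, or not at all.

docs = [
    {"brand": "Acme", "price": 10},
    {"producer": "Acme", "price": 12},   # same concept, different name
    {"price": 7},                        # concept missing entirely
]

SYNONYMS = {"brand": ["brand", "producer"]}  # concept -> known local names

def group_sum(collection, concept, measure):
    """Group a measure by a concept, resolving local synonyms and keeping
    documents that lack the concept under an explicit 'unknown' group, so
    that partial schemas still contribute to the answer."""
    out = {}
    for doc in collection:
        key = next((doc[name] for name in SYNONYMS.get(concept, [concept])
                    if name in doc), "unknown")
        out[key] = out.get(key, 0) + doc[measure]
    return out

result = group_sum(docs, "brand", "price")
```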
Paper presented at DOLAP 2020: Towards Conversational OLAP
Link to the presentation: https://youtu.be/IfBc1H46s8Y
Abstract: The democratization of data access and the adoption of OLAP in scenarios requiring hand-free interfaces push towards the creation of smart OLAP interfaces. In this paper, we envisage a conversational framework specifically devised for OLAP applications. The system converts natural language text in GPSJ (Generalized Projection, Selection and Join) queries. The approach relies on an ad-hoc grammar and a knowledge base storing multidimensional metadata and cubes values. In case of ambiguous or incomplete query description, the system is able to obtain the correct query either through automatic inference or through interactions with the user to disambiguate the text. Our tests show very promising results both in terms of effectiveness and efficiency.
Authors: Matteo Francia, Enrico Gallinucci, Matteo Golfarelli
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm with a Hidden Markov Model
In urban mobility, map-matching aims to project GPS points generated by moving objects onto the road segments representing the actual object positions. Up to now, map-matching has found interesting applications in traffic analysis, frequent path extraction, and location prediction. However, state-of-the-art implementations of map-matching algorithms are either private, sequential, or inefficient. In this paper, we propose an extension of an existing serial algorithm of known efficiency, reformulating it in a distributed way to achieve scalability in real big data scenarios. Furthermore, we enhance the robustness of the algorithm, which is based on a first-order Hidden Markov Model, by introducing a smart strategy to avoid gaps in the matched road segments; indeed, this problem may occur under sparse GPS sampling or in urban areas with highly fragmented road segments. Our implementation is based on Apache Spark and is publicly available on GitHub. It is tested against a dataset with 7.8 million GPS points in Milan.
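A toy version of the first-order HMM matching can be sketched as follows; the Gaussian-like emission on point-to-segment distance and the flat transition penalty are simplified stand-ins for the paper's road-network formulation, not its actual model.

```python
# Toy first-order HMM map-matcher: states are candidate road segments per
# GPS fix; Viterbi in log space picks the most likely segment sequence.

def viterbi_match(candidates, sigma=4.0, beta=1.0):
    """candidates: per-fix list of (segment_id, distance_to_segment)."""
    def emission(dist):            # closer segment -> higher log-likelihood
        return -0.5 * (dist / sigma) ** 2
    def transition(a, b):          # staying on the same segment is free
        return 0.0 if a == b else -beta
    scores = [{seg: emission(d) for seg, d in candidates[0]}]
    back = []
    for t in range(1, len(candidates)):
        row, ptr = {}, {}
        for seg, d in candidates[t]:
            prev, s = max(((p, scores[-1][p] + transition(p, seg))
                           for p in scores[-1]), key=lambda x: x[1])
            row[seg], ptr[seg] = s + emission(d), prev
        scores.append(row)
        back.append(ptr)
    path = [max(scores[-1], key=scores[-1].get)]
    for ptr in reversed(back):     # backtrack the best sequence
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Three fixes; the noisy middle one is slightly closer to segment B, but
# the transition penalty keeps the matched path on segment A (no gap).
cands = [[("A", 1.0), ("B", 9.0)],
         [("A", 3.0), ("B", 2.5)],
         [("A", 1.0), ("B", 8.0)]]
matched = viterbi_match(cands)
```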
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data and advanced analytics
1. PhD Computer Science and Engineering
(Knowledge pyramid: World, Data, Information, Knowledge, Wisdom)
Augmenting the Knowledge Pyramid
with Unconventional Data & Advanced Analytics
Matteo Francia
Supervisor: Prof. Matteo Golfarelli
Cycle XXXIII
2. PhD Computer Science and Engineering
Outline
The knowledge pyramid
Augmenting the knowledge pyramid
Part I: Unconventional data
Part II: Advanced analytics
Advanced analytics in hand-free scenarios
Conclusion
Matteo Francia – University of Bologna 2
3. PhD Computer Science and Engineering
BI & the knowledge pyramid
Business intelligence
Strategies to transform raw data into decision-making insights
Transformation is usually abstracted in the “knowledge pyramid” [1, 2]
Data: symbols representing real-world objects (e.g., store product sales)
Information: processed data (e.g., query the product with the highest profit)
Knowledge: understanding (e.g., mine products often sold together)
Wisdom: knowledge in action (e.g., discount products to optimize profits)
Contribution: augmenting the knowledge pyramid
PART I: unconventional data to improve decision-making
PART II: advanced analytics to climb the pyramid
[1] Jennifer E. Rowley: The wisdom hierarchy: representations of the DIKW hierarchy. J. Inf. Sci. 33(2): 163-180 (2007)
[2] Martin Frické: The knowledge pyramid: a critique of the DIKW hierarchy. J. Inf. Sci. 35(2): 131-142 (2009)
(Knowledge pyramid with BI mappings: World → Data (Operational DB, OLTP) → Information (Data Warehouse, OLAP) → Knowledge (Data Mining) → Wisdom (Decisions))
4. PhD Computer Science and Engineering
Part I: unconventional data
Sensing provides data to support contextual decisions
“World” and “Data” levels
New challenges on unconventional data
Unstructured and non-relational
Transformation requires type-aware techniques
5. PhD Computer Science and Engineering
Contribution: mobility data
Mobility data are at the core of location-based systems
Trajectory: temporal sequence of spatial locations
- Uncertainty: positioning errors
- E.g., GPS (~m) vs GSM (~km)
- Sensitivity: 4 points can identify 95% of individuals [1, 2]
- De-anonymize through raw signatures [3]
- De-anonymize through personal gazetteers [4]
Big data applications
- Map matching [5]: project GPS locations to the most-likely road segments
- Profiling [6]: estimate user profiles and income by frequented places
- Precision farming [7]: monitor and coordinate cropping robots
[1] Yves-Alexandre De Montjoye, et al.: Unique in the crowd: The privacy bounds of human mobility. Scientific reports 3 (2013): 1376.
[2] Fengmei Jin, Wen Hua, Matteo Francia, Pingfu Chao, Maria E. Orlowska, Xiaofang Zhou: A Survey and Experimental Study on Privacy-Preserving Trajectory Data Publishing. (Under review, TKDE)
[3] Fengmei Jin, Wen Hua, Thomas Zhou, Jiajie Xu, Matteo Francia, Maria E. Orlowska, Xiaofang Zhou: Trajectory-Based Spatiotemporal Entity Linking. IEEE Trans. on Know. and Data Eng. (2020).
[4] Matteo Francia, Enrico Gallinucci, Matteo Golfarelli, Nicola Santolini: DART: De-Anonymization of personal gazetteers through social trajectories. J. Inf. Secur. Appl. 55: 102634 (2020)
[5] Matteo Francia, Enrico Gallinucci, Federico Vitali: Map-Matching on Big Data: a Distributed and Efficient Algorithm with a Hidden Markov Model. MIPRO 2019: 1238-1243
[6] Matteo Francia, Matteo Golfarelli, Stefano Rizzi: Summarization and visualization of multi-level and multi-dimensional itemsets. Inf. Sci. 520: 63-85 (2020)
[7] Giuliano Vitali, Matteo Francia, Matteo Golfarelli, Maurizio Canavari: Crop Management with the IoT: An Interdisciplinary Survey. Agronomy 11.1 (2021): 181.
(Figure: example trajectories Tb, Tg, and Tr on a grid with columns A–D and rows 1–4.)
6. PhD Computer Science and Engineering
Part II: advanced analytics
High availability and accessibility attract new data scientists
High competence in business domain
Low competence in computer science
Since the ’70s, data has been retrieved through relational queries
Requiring comprehension of formal languages and DBMSs
Advanced analytics (semi-automatic transformation)
- “Information” and “Knowledge” levels
Advanced analytics
Intention
Hand-free scenarios
Data summaries
7. PhD Computer Science and Engineering
Contribution: advanced analytics
Hand-free scenarios
Augmented OLAP [1]: recommendation in augmented reality
Conversational OLAP [2, 3]: interpret natural language queries
Express high-level analytic abstractions, not queries
E.g., describe [4, 5] interesting patterns of sales
E.g., assess [6] Italian sales against French sales
Data summaries
Summarization based on multidimensional similarity [7]
Conceptual model for data narratives [8, 9]
[1] Matteo Francia, Matteo Golfarelli, Stefano Rizzi: A-BI+: A framework for Augmented Business Intelligence. Inf. Syst. 92: 101520 (2020)
[2] Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: COOL: A framework for conversational OLAP. Inf. Syst. 101752. (2021)
[3] Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: Conversational OLAP in Action. EDBT 2021: 646-649
[4] Antoine Chédin, Matteo Francia, Patrick Marcel, Veronika Peralta, and Stefano Rizzi. The tell-tale cube. ADBIS, 2020.
[5] Matteo Francia, Patrick Marcel, Verónika Peralta, Stefano Rizzi: Enhancing Cubes with Models to Describe Multidimensional Data. Information Systems Frontiers (2021)
[6] Matteo Francia, Matteo Golfarelli, Patrick Marcel, Stefano Rizzi, Panos Vassiliadis: Assess Queries for Interactive Analysis of Data Cubes. EDBT 2021: 121-132
[7] Matteo Francia, Matteo Golfarelli, Stefano Rizzi: Summarization and visualization of multi-level and multi-dimensional itemsets. Inf. Sci. 520: 63-85 (2020)
[8] Faten El Outa, Matteo Francia, Patrick Marcel, Verónika Peralta, Panos Vassiliadis: Towards a Conceptual Model for Data Narratives. ER 2020: 261-270
[9] Faten El Outa, Matteo Francia, Patrick Marcel, Verónika Peralta, Panos Vassiliadis: Supporting the Generation of Data Narratives. ER Forum/Posters/Demos 2020: 168-172
8. PhD Computer Science and Engineering
Advanced analytics
Augmented OLAP
Matteo Francia, Matteo Golfarelli, Stefano Rizzi: A-BI+: A framework for Augmented Business Intelligence. Inf. Syst. 92: 101520 (2020)
Matteo Francia, Matteo Golfarelli, Stefano Rizzi: Augmented Business Intelligence. DOLAP 2019
9. PhD Computer Science and Engineering
Application scope
Enable analytics on augmented reality
E.g., an inspector analyzing production rates
Sense the context through augmented devices
E.g., smart glasses
Detect interaction and engagement [1]
Produce analytical reports
Relevant to the sensed context
Cardinality constraint
Near real-time
[1] Yu-Chuan Su, Kristen Grauman: Detecting Engagement in Egocentric Video. ECCV (5) 2016: 454-471
10. PhD Computer Science and Engineering
Data Mart: repository of multidimensional cubes
Cubes representing business facts
Data dictionary
What we can recognize (i.e., md-elements)
Context: subset of md-elements
Mappings to sets of md-elements
A-priori interest
What can we sense?
(Figure: data dictionary covering two cubes — Sales, with measures Quantity and Revenues and hierarchies over Date (Month, Year), Product (Type, Category, Family), and Store (City); and Assembly, with measures AssembledItems and AssemblyTime and dimensions Part, Device, and Date.
Example sensed context: <Object, Seat> dist = 1m; <Object, BikeExcite> dist = 2m; <Location, RoomA.1>; <Date, 16/10/2018>; <Role, Controller>.)
11. PhD Computer Science and Engineering
Recommendation
Context interpretation
Given context T over the data dictionary
Project T to an image of fragments I through mappings
- Fragment: intuitively a “small” query
Add the log
Get queries with positive feedback from similar contexts
- Enrich I to I* with unperceived elements from T
Each fragment has contextual and log relevance
Query generation
Cannot directly translate I* into a well-formed query
High cardinality I* = hardly interpretable “monster query”
(Figure: recommendation pipeline — the sensed context (<Object, Seat> dist = 1m; <Object, BikeExcite> dist = 2m; <Location, RoomA.1>; <Date, 16/10/2018>; <Role, Controller>) and the log feed query generation; query selection picks the context-relevant queries to recommend, returned as analytical reports; the user's feedback on them updates the log.)
12. PhD Computer Science and Engineering
Query generation
Generate queries from image I* of fragments
Each fragment is a query
Depth-first exploration with pruning rules
- Query cardinality can only increase
- Some queries are redundant
(Figure: depth-first exploration of the fragments in I*, derived from μ(T). Each node is a candidate query ⟨group-by set, selection, measures⟩; base fragments such as ⟨{Month}, {}, {AssembledItems}⟩, ⟨{Year}, {}, {AssembledItems}⟩, ⟨{Product}, {Product=BikeExcite}, {Quantity}⟩, ⟨{Part,Type}, {Type=Bike}, {}⟩, and ⟨{Part,Product}, {Product=BikeExcite}, {Quantity}⟩ are merged into larger queries such as ⟨{Month,Part,Product}, {Product=BikeExcite}, {Quantity,AssembledItems}⟩ and ⟨{Year,Part,Type}, {Type=Bike}, {AssembledItems}⟩.)
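The depth-first generation with pruning described above can be sketched as follows; the fragment encoding as ⟨group-by, selection, measures⟩ triples, the domain sizes, and the product-based cardinality estimate are illustrative assumptions.

```python
# Simplified DFS query generation: fragments are (group-by, selection,
# measures) triples, merging is component-wise union, and the cardinality
# estimate (product of domain sizes) is a stand-in for the real bound.

DOMAIN = {"Month": 12, "Year": 3, "Product": 50, "Part": 20}

def cardinality(groupby):
    est = 1
    for attr in groupby:
        est *= DOMAIN[attr]
    return est

def merge(q1, q2):
    return tuple(a | b for a, b in zip(q1, q2))

def generate(fragments, max_card):
    """DFS over unions of fragments; prune when the (monotonically
    increasing) cardinality bound is exceeded, and skip duplicates."""
    seen, out = set(), []
    def dfs(query, i):
        if query in seen or cardinality(query[0]) > max_card:
            return                      # pruning: cardinality only grows
        seen.add(query)
        out.append(query)
        for j in range(i, len(fragments)):
            dfs(merge(query, fragments[j]), j + 1)
    for i, frag in enumerate(fragments):
        dfs(frag, i + 1)
    return out

frags = [(frozenset({"Month"}), frozenset(), frozenset({"AssembledItems"})),
         (frozenset({"Product"}), frozenset({("Product", "BikeExcite")}),
          frozenset({"Quantity"})),
         (frozenset({"Year"}), frozenset(), frozenset({"AssembledItems"}))]
queries = generate(frags, max_card=600)
```

With this bound, merging all three fragments (Month × Product × Year) is pruned, while all six cheaper combinations survive.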
13. PhD Computer Science and Engineering
Query selection
Given rq, the number of queries to recommend, maximize the covered fragments and minimize their overlap
E.g., given two queries q and q'
rel(q) + rel(q') − sim(q, q') · (rel(q) + rel(q')) / 2
Weighted Maximum Coverage Problem (NP-hard)
Greedy: iteratively pick the query maximizing relT
- Only a few queries are retrieved, so this is not expensive
(Figure: two queries q and q' covering overlapping subsets of the fragments in I* and μ(T).)
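The greedy selection under the overlap-discounted relevance above can be sketched as follows; the rel values and the use of Jaccard similarity over covered fragments are illustrative assumptions.

```python
# Greedy pick for the weighted maximum coverage formulation: each new
# query's relevance is discounted by its overlap with queries already
# picked, following rel(q) + rel(q') - sim(q, q') * (rel(q) + rel(q')) / 2.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def select(queries, rq):
    """queries: {name: (rel, covered_fragments)}. Greedily pick rq queries
    maximizing relevance minus overlap with the current picks."""
    picked = []
    while len(picked) < rq and len(picked) < len(queries):
        def gain(name):
            rel, frags = queries[name]
            g = rel
            for p in picked:
                p_rel, p_frags = queries[p]
                g -= jaccard(frags, p_frags) * (rel + p_rel) / 2
            return g
        best = max((n for n in queries if n not in picked), key=gain)
        picked.append(best)
    return picked

queries = {
    "q1": (0.9, frozenset({"f1", "f2", "f3"})),
    "q2": (0.8, frozenset({"f1", "f2"})),   # overlaps q1 heavily
    "q3": (0.5, frozenset({"f4"})),         # disjoint, lower relevance
}
chosen = select(queries, rq=2)
```

Although q2 is more relevant than q3 in isolation, its overlap with q1 discounts it, so the disjoint q3 is picked second — covering more fragments with less redundancy.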
14. PhD Computer Science and Engineering
Test set up
Cube with 109 md-elements
Simulate user moving inside a factory
Given a fixed context and a target query
Assess similarity of the proposed query in similar contexts
𝛽: context similarity
sim: proposed/target query similarity
Effectiveness
(Plot: proposed/target query similarity sim vs. context similarity β, with |T| = 12 and rq = 4, for the target context and for similar contexts; with user experience the best query reaches similarity 0.95 after 2 visits and 0.98 after 4, outperforming the best query without user experience.)
15. PhD Computer Science and Engineering
Research directions
OLAP in augmented reality
Support analytical queries in hand-free scenarios
Recommend relevant data facts from a real-world context
Research directions
Provide (fast) query previews
- Estimate the execution time of each query
- Address query caching and multi-query optimization issues
Correlate context-awareness to data quality [3]
- Relevance, amount, and completeness [4]
[3] Stephanie Watts, Ganesan Shankaranarayanan, Adir Even: Data quality assessment in context: A cognitive perspective. Decis. Support Syst. 48(1): 202-211 (2009)
[4] Diane M. Strong, Yang W. Lee, Richard Y. Wang: Data Quality in Context. Commun. ACM 40(5): 103-110 (1997)
16. PhD Computer Science and Engineering
Information
World
Data
Knowledge
Wisdom
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: COOL: A framework for conversational OLAP. Inf. Syst. 101752. (2021)
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: Conversational OLAP in Action. EDBT 2021 (best demo award): 646-649
Advanced analytics
Conversational OLAP
17. PhD Computer Science and Engineering
Motivation
Enable analytics through natural language
OLAP provides low-level operators [1]
Users need to have knowledge of the multidimensional model…
… or even programming skills
We introduce COOL (COnversational OLap) [3]
Translate natural language into formal queries
[1] Panos Vassiliadis, Patrick Marcel, Stefano Rizzi: Beyond roll-up's and drill-down's: An intentional analytics model to reinvent OLAP. Information Systems. (2019)
[2] Matteo Francia, Matteo Golfarelli, Stefano Rizzi: A-BI+: A framework for Augmented Business Intelligence. Information Systems. (2020)
[3] Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: COOL: A Framework for Conversational OLAP. Information Systems. (2021)
18. PhD Computer Science and Engineering
COOL: architecture
(Figure: COOL architecture, offline part — automatic KB feeding loads metadata & values from the DW into the KB; manual KB enrichment adds synonyms and an ontology.)
19. PhD Computer Science and Engineering
COOL: architecture
(Figure: COOL architecture, online part — speech-to-text turns the user's utterance, e.g. "Sales by Customer and Month", into raw text; interpretation, backed by the KB (metadata & values, synonyms, ontology) and a log of statistics, produces an annotated parse forest; disambiguation & enhancement yield a parse tree for either a full query or an OLAP operator; SQL generation and execution & visualization run the query on the DW.)
21. PhD Computer Science and Engineering
Effectiveness
40 users with heterogeneous OLAP skills
Asked to translate (Italian) analytic goals into English
Users provided good feedback on the interface...
... as well as on the interpretation accuracy
OLAP Familiarity | Full Query: Accuracy, Time (s) | OLAP operator: Accuracy, Time (s)
Low              | 0.91, 141                      | 0.86, 102
High             | 0.91, 97                       | 0.92, 71
22. PhD Computer Science and Engineering
COOL in Action!
[3] Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: Conversational OLAP in Action. EDBT (best demo award) 2021: 646-649
23. PhD Computer Science and Engineering
Research directions
COOL (Conversational OLAP)
Support the translation of a natural language conversation into an OLAP session
Analyze data without requiring technological skills
- Add conversational capabilities to Augmented OLAP
Towards an end-to-end conversational solution
Create query summaries that can be returned as short vocal messages
Identify insights out of a large amount of data
Identify the “right” storytelling and user-system interaction
24. PhD Computer Science and Engineering
Conclusion
Data scientists have heterogeneous backgrounds
The need for high-level analytic abstractions and interfaces is well-understood
Advanced analytics work towards (semi-)autonomous data transformation
Data management should be (semi-)automated as well
- Orchestrate data platforms, maintain data lineage, profile data
Unconventional mobility data
Handling trajectory variety and semantics is troublesome
- Difference in sampling rates, speed, accuracy, transportation means
- We need a unifying framework for storage and analysis
Privacy of spatio-temporal data is a concern
- Besides protection, we need scalable solutions
25. PhD Computer Science and Engineering
Publications
Journal articles
1. Matteo Francia, Patrick Marcel, Verónika Peralta, Stefano Rizzi: Enhancing Cubes with
Models to Describe Multidimensional Data. Information Systems Frontiers (2021)
2. Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: COOL: A framework for
conversational OLAP. Information Systems (2021)
3. Giuliano Vitali, Matteo Francia, Matteo Golfarelli, Maurizio Canavari: Crop Management
with the IoT: An Interdisciplinary Survey. Agronomy (2021)
4. Fengmei Jin, Wen Hua, Thomas Zhou, Jiajie Xu, Matteo Francia, Maria E. Orlowska,
Xiaofang Zhou: Trajectory-Based Spatiotemporal Entity Linking. IEEE Transactions on
Knowledge and Data Engineering (2020).
5. Matteo Francia, Enrico Gallinucci, Matteo Golfarelli, Nicola Santolini: DART: De-
Anonymization of personal gazetteers through social trajectories. Journal of
Information Security and Applications. 55: 102634 (2020)
6. Matteo Francia, Matteo Golfarelli, Stefano Rizzi: A-BI+: A framework for Augmented
Business Intelligence. Information Systems 92: 101520 (2020)
7. Matteo Francia, Matteo Golfarelli, Stefano Rizzi: Summarization and visualization of
multi-level and multi-dimensional itemsets. Information Sciences 520: 63-85 (2020)
8. Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: Social BI to understand the debate
on vaccines on the Web and social media: unraveling the anti-, free, and pro-vax
communities in Italy. Social Network Analysis and Mining 9(1): 46:1-46:16 (2019)
Conference papers
1. Matteo Francia, Matteo Golfarelli, Patrick Marcel, Stefano Rizzi, Panos Vassiliadis:
Assess Queries for Interactive Analysis of Data Cubes. EDBT 2021: 121-132
2. Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: Conversational OLAP in Action.
EDBT 2021: 646-649 (best demo award)
3. Antoine Chédin, Matteo Francia, Patrick Marcel, Verónika Peralta, Stefano Rizzi: The
Tell-Tale Cube. ADBIS 2020: 204-218
4. Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: Towards Conversational OLAP.
DOLAP 2020: 6-15
5. Faten El Outa, Matteo Francia, Patrick Marcel, Verónika Peralta, Panos Vassiliadis:
Supporting the Generation of Data Narratives. ER Forum/Posters/Demos 2020: 168-172
6. Faten El Outa, Matteo Francia, Patrick Marcel, Verónika Peralta, Panos Vassiliadis:
Towards a Conceptual Model for Data Narratives. ER 2020: 261-270
7. Matteo Francia, Enrico Gallinucci, Matteo Golfarelli, Stefano Rizzi: OLAP Querying of
Document Stores in the Presence of Schema Variety. SEBD 2020: 128-135
8. Matteo Francia, Matteo Golfarelli, Stefano Rizzi: Augmented Business Intelligence.
DOLAP 2019
9. Matteo Francia, Enrico Gallinucci, Federico Vitali: Map-Matching on Big Data: a
Distributed and Efficient Algorithm with a Hidden Markov Model. MIPRO 2019: 1238-1243
10. Matteo Francia, Matteo Golfarelli, Stefano Rizzi: A Similarity Function for Multi-Level and
Multi-Dimensional Itemsets. SEBD 2018
11. Matteo Francia, Danilo Pianini, Jacob Beal, Mirko Viroli: Towards a Foundational API for
Resilient Distributed Systems Design. FAS*W@SASO/ICCAC 2017: 27-32
26. PhD Computer Science and Engineering
Thank you.
Information
World
Data
Knowledge
Wisdom
Questions?
Editor's Notes
Yves-Alexandre De Montjoye, et al.: Unique in the crowd: The privacy bounds of human mobility. Scientific reports 3 (2013): 1376.
We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals.
Systems like Amazon's recommender can use contextual data (e.g., location); however, there are differences both in "method"/"framework" and in "recommendation".
"Method":
- Amazon relies on "more historical" ground truth; we interpret and "mix" a real-time context made up of several interesting objects detected (and/or engaged) by the system.
- Our system is "end-to-end", i.e., it also covers the management and linking of the data used to build the queries.
"Recommendation": formally, we use a hybrid approach (whereas the classic ones are item-based or collaborative):
- Mix of real-time and historical knowledge: we are not strictly log-based (i.e., the context addresses the cold-start problem), whereas Amazon's suggestion is "other users also bought/viewed…".
- Result cardinality, to fit an augmented device.
- Diversification across different queries, not within a single query.
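The hybrid idea in the notes can be sketched as a score that blends real-time context overlap with historical log frequency; the function and entity names below are hypothetical, not the system's actual API.

```python
def score_query(query_entities: set, context_entities: set,
                log_count: int, max_log: int, alpha: float = 0.5) -> float:
    """Blend context relevance (real-time) with log popularity (historical)."""
    overlap = len(query_entities & context_entities) / max(len(query_entities), 1)
    history = log_count / max(max_log, 1)
    return alpha * overlap + (1 - alpha) * history

context = {"machine", "sensor"}              # entities sensed by the device
candidates = {                               # query -> (entities used, past executions)
    "q_downtime": ({"machine"}, 40),
    "q_energy":   ({"sensor", "machine"}, 5),
    "q_sales":    ({"customer"}, 90),
}
max_log = max(n for _, n in candidates.values())
ranked = sorted(candidates,
                key=lambda q: score_query(candidates[q][0], context,
                                          candidates[q][1], max_log),
                reverse=True)
print(ranked)  # context lifts q_downtime and q_energy above the popular q_sales
```

Note how a popular but contextually irrelevant query ranks last, addressing the cold-start bias of purely log-based recommendation.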
DIFF [17] returns the tuples that maximize the difference between the cells of a cube given as input.
Profile the user's exploration to recommend which unvisited parts of the cube to visit next.
The RELAX operator allows to verify whether a pattern observed at a certain level of detail is present at a coarser level of detail too [19].
Alternative operators have also been proposed in the Cinecubes method [7,8]. The goal of this effort is to facilitate automated reporting, given an original OLAP query as input. To achieve this purpose two operators (expressed as acts) are proposed, namely, (a) put-in-context, i.e., compare the result of the original query to query results over similar, sibling values; and (b) give-details, where drill-downs of the original query's groupers are performed.
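The two Cinecubes acts can be sketched as SQL-string generation over a hypothetical sales(city, year, month, amount) table; the function names and schema are illustrative, not the method's actual interface.

```python
def put_in_context(measure: str, grouper: str, sel_attr: str,
                   value: str, siblings: list) -> str:
    """Act (a): compare the original selection to its sibling values."""
    members = ", ".join(f"'{v}'" for v in [value] + siblings)
    return (f"SELECT {sel_attr}, {grouper}, SUM({measure}) FROM sales "
            f"WHERE {sel_attr} IN ({members}) GROUP BY {sel_attr}, {grouper}")

def give_details(measure: str, grouper: str, finer: str,
                 sel_attr: str, value: str) -> str:
    """Act (b): drill down the original query's grouper one level."""
    return (f"SELECT {grouper}, {finer}, SUM({measure}) FROM sales "
            f"WHERE {sel_attr} = '{value}' GROUP BY {grouper}, {finer}")

# Original query: yearly sales in Bologna
print(put_in_context("amount", "year", "city", "Bologna", ["Modena", "Ferrara"]))
print(give_details("amount", "year", "month", "city", "Bologna"))
```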
Jagadish: The linguistic parse trees in our system are dependency parse trees, in which each node is a word/phrase specified by the user while each edge is a linguistic dependency relationship between two words/phrases.
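The tree structure in this note can be sketched with a minimal, hand-written parse (the parse below is fabricated for illustration, not produced by an actual dependency parser): each node is a word/phrase, each edge carries the dependency relation to its parent.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    dep: str = "root"                 # dependency relation to the parent
    children: list = field(default_factory=list)

    def attach(self, word: str, dep: str) -> "Node":
        child = Node(word, dep)
        self.children.append(child)
        return child

# Hand-written parse of "show sales by city": a root verb with an object
# that in turn carries a nominal modifier.
root = Node("show")
sales = root.attach("sales", "obj")
sales.attach("by city", "nmod")

def walk(node: Node, depth: int = 0):
    """Print the tree with indentation reflecting depth."""
    print("  " * depth + f"{node.word} ({node.dep})")
    for c in node.children:
        walk(c, depth + 1)

walk(root)
```

Translating natural language to a query then amounts to mapping such nodes onto schema elements (measures, dimensions) guided by the edges.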