1. Predicting Query Performance and
Explaining Results to Assist Linked Data
Consumption
Candidate: Rakebul Hasan
Jury:
President: Johan Montagnat, CNRS (I3S)
Director: Fabien Gandon, INRIA
Co-director: Pierre-Antoine Champin, LIRIS, UCBL, Lyon
Reviewers:
Pascal Molli, University of Nantes
Philippe Cudré-Mauroux, University of Fribourg, Switzerland
2.
Accessing Linked Data
Dereferencing URIs: default
SPARQL Endpoints
68% data sets, 2011
98% data (triples), 2014
Consuming Linked Data
Query federation
On-the-fly dereferencing
Crawling
Integrating disparate data to support intelligent applications
Attribution: “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
3.
Results
Workload management tasks:
configuration, organization, inspection, and optimization
“How long will the query take to execute?”
History of data lifecycle:
make trust judgments, validate or invalidate results
“Why this result?”
4.
Linked Data Access/HTTP
“Why this result?”
“Show me the flow of information in the result derivation”
“Show me the summary of what happened in the result derivation process”
5.
Linked Data Access
Assistance in Querying
Query Performance
Assistance in Result Understanding
Query Results
Results Produced by Applications
6. Query Performance Prediction
Explaining SPARQL Query Results
Linked Explanations: Explanations for
Linked Data Applications
Summarizing Explanations
Outline
7.
Predictions
Statistics about data?
- Published statistics
- Few datasets
- Not detailed
How to predict query performance without using data statistics?
8.
Previously executed queries
Query Q1 took 100 ms
Query Q2 took 120 ms
Query Q3 took 200 ms
Query Q4 took 190 ms
...
Query Qm took 300 ms
Predictions
“How long will the query take to execute?”
9. Unseen query q
y is the performance metric
Query Q1 took 100 ms
Query Q2 took 120 ms
Query Q3 took 200 ms
Query Q4 took 190 ms
...
Query Qm took 300 ms
Learn
Regression
f(q) = ?
Predictions
How to model SPARQL query characteristics for machine learning algorithms?
Feature extraction
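The learning setup above can be sketched as follows. The operator vocabulary, operator counts, and timings are illustrative stand-ins (a real system would extract them from the SPARQL algebra tree of each logged query); the k-NN regression itself is the technique named on the slide.

```python
from collections import Counter
import math

# Hypothetical algebra-operator vocabulary (assumption: a real system derives
# these from the SPARQL algebra expression tree of each query).
OPS = ["bgp", "join", "leftjoin", "union", "filter", "project", "distinct", "triple"]

def features(op_counts):
    """Map a query's algebra-operator counts to a fixed-length feature vector."""
    c = Counter(op_counts)
    return [c.get(op, 0) for op in OPS]

def knn_predict(train, query_vec, k=3):
    """k-NN regression: average execution time of the k nearest training queries."""
    nearest = sorted(train, key=lambda xy: math.dist(xy[0], query_vec))[:k]
    return sum(t for _, t in nearest) / k

# Toy training log: (algebra-operator counts, observed execution time in ms).
log = [
    ({"bgp": 1, "triple": 2, "project": 1}, 100.0),
    ({"bgp": 1, "triple": 2, "project": 1, "filter": 1}, 120.0),
    ({"bgp": 2, "triple": 4, "join": 1, "project": 1}, 200.0),
    ({"bgp": 2, "triple": 4, "join": 1, "project": 1, "filter": 1}, 190.0),
]
train = [(features(ops), t) for ops, t in log]

# Unseen query: compute f(q) as the k-NN average.
unseen = features({"bgp": 2, "triple": 4, "join": 1, "project": 1})
print(round(knn_predict(train, unseen), 1))  # → 163.3
```

The same vectors could be fed to any regressor; the experiments later in the talk use k-NN and SVM (nu-SVR).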
11.
Graph pattern features:
Landmarks in training queries
Similarities between the landmark queries and the query in examination
Inverting approximated graph edit distance
Clustering: k-medoids with approximated graph edit distance
[Riesen et al. 2009]
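A minimal sketch of the landmark idea, under heavy assumptions: a query pattern is modelled as a set of edges, and the symmetric-difference size stands in for the approximated graph edit distance (the thesis uses the far more refined assignment-based approximation of Riesen et al.). k-medoids is done exhaustively, which only works for tiny inputs.

```python
import itertools

# Toy stand-in for approximated graph edit distance: a query's graph pattern is
# a frozenset of (subject, predicate, object) edges, and the distance is the
# size of the symmetric difference of the two edge sets.
def approx_ged(g1, g2):
    return len(g1 ^ g2)

def similarity(g1, g2):
    # "Inverting" the distance to obtain a similarity value.
    return 1.0 / (1.0 + approx_ged(g1, g2))

def k_medoids(graphs, k):
    """Exhaustive k-medoids for tiny inputs: pick the k medoids minimising the
    total distance of every graph to its closest medoid."""
    best, best_cost = None, float("inf")
    for medoids in itertools.combinations(range(len(graphs)), k):
        cost = sum(min(approx_ged(g, graphs[m]) for m in medoids) for g in graphs)
        if cost < best_cost:
            best, best_cost = medoids, cost
    return [graphs[m] for m in best]

queries = [
    frozenset({("?p", "type", "Person")}),
    frozenset({("?p", "type", "Person"), ("?p", "near", "?l")}),
    frozenset({("?f", "type", "Film"), ("?f", "starring", "?a")}),
]
landmarks = k_medoids(queries, k=2)

# Graph-pattern feature vector of a query: similarity to each landmark.
unseen = frozenset({("?p", "type", "Person"), ("?p", "near", "?l")})
print([round(similarity(unseen, lm), 2) for lm in landmarks])  # → [0.5, 0.2]
```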
12.
Queries:
1260 training, 420 validation and 420 test queries generated from DBPSB
benchmark [Morsey et al. 2011] query templates
RDF dataset:
DBpedia
Learning models:
k-NN regression with k-D tree
SVM with nu-SVR for regression
Triple store:
Jena TDB 1.0.0
16 GB allocated memory
Commodity hardware:
Intel Xeon 2.53 GHz
48 GB RAM
Linux 2.6.32
Experiments
15.
Model                            Training Time             Avg. Prediction Time per Query
k-NN + algebra                   7.14 sec                  3.42 ms
SVM + algebra                    26.26 sec                 3.53 ms
k-NN + algebra + graph pattern   3300.33 sec (55.01 min)   47.25 ms
SVM + algebra + graph pattern    3390.71 sec (56.5 min)    98.1 ms
16. Query Performance Prediction
Explaining SPARQL Query Results
Linked Explanations: Explanations for
Linked Data Applications
Summarizing Explanations
Outline
17.
Results
History of data lifecycle:
make trust judgments, validate or invalidate results
“Why this result?”
Provenance-based query result explanation
18. • Explanations
– Provenance models (e.g. PML, W3C PROV-O)
– Presentation/UI
– Justifications
• Provenance for query results
– Relational databases
• Why, where, how provenance
– Annotation approach
– Non-annotation approach
– RDF and SPARQL
• Transform RDF and SPARQL to relational models [Theoharis et al. 2011,
Damásio et al. 2012]
• Annotation approaches: Corese/KGRAM [Corby 2012], TripleProv [Wylot 2014]
Previous Work
19. Query Result Provenance
Triple
:person1 rdf:type foaf:Person. t1
:person1 foaf:based_near "Paris". t2
:person2 rdf:type foaf:Person. t3
:person2 foaf:based_near "Paris". t4
:person3 rdf:type foaf:Person. t5
:person3 foaf:based_near "Paris". t6
:person4 rdf:type foaf:Person. t7
:person4 foaf:based_near "London". t8
select ?location
where
{
?person rdf:type foaf:Person.
?person foaf:based_near ?location.
}
location
London
Paris   (How? Why?)
Provenance for the result tuple (location=“Paris”):
How-provenance: (t1 ⊗ t2) ⊕ (t3 ⊗ t4) ⊕ (t5 ⊗ t6)
Why-provenance: {{t1, t2}, {t3, t4}, {t5, t6}}
Lineage: {t1, t2, t3, t4, t5, t6}
Geerts et al. Algebraic structures for capturing the provenance of SPARQL queries. ICDT 2013.
Green, Karvounarakis, and Tannen. Provenance semirings. PODS 2007.
Buneman, Khanna, and Tan. Why and Where: A Characterization of Data Provenance. ICDT 2001.
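For this toy dataset, why-provenance can be computed directly by enumerating the join. The sketch below hard-codes the two triple patterns of the example query (a real engine would of course evaluate arbitrary patterns):

```python
# Why-provenance for the example: every distinct derivation (witness) of the
# result tuple (location="Paris") from the two triple patterns of the query.
triples = {
    "t1": (":person1", "rdf:type", "foaf:Person"),
    "t2": (":person1", "foaf:based_near", "Paris"),
    "t3": (":person2", "rdf:type", "foaf:Person"),
    "t4": (":person2", "foaf:based_near", "Paris"),
    "t5": (":person3", "rdf:type", "foaf:Person"),
    "t6": (":person3", "foaf:based_near", "Paris"),
    "t7": (":person4", "rdf:type", "foaf:Person"),
    "t8": (":person4", "foaf:based_near", "London"),
}

def why_provenance(location):
    witnesses = []
    for id1, (s1, p1, o1) in triples.items():
        if p1 != "rdf:type" or o1 != "foaf:Person":
            continue
        for id2, (s2, p2, o2) in triples.items():
            # Join on ?person and bind ?location to the requested value.
            if p2 == "foaf:based_near" and s2 == s1 and o2 == location:
                witnesses.append({id1, id2})
    return witnesses

print(why_provenance("Paris"))  # three witnesses: {t1,t2}, {t3,t4}, {t5,t6}
```

The lineage is the union of all witnesses, and how-provenance additionally records how the witnesses combine.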
20. Non-Annotation-based Algorithm
to Compute Why-Provenance
Triple
:person1 rdf:type foaf:Person. t1
:person1 foaf:based_near "Paris". t2
:person2 rdf:type foaf:Person. t3
:person2 foaf:based_near "Paris". t4
:person3 rdf:type foaf:Person. t5
:person3 foaf:based_near "Paris". t6
:person4 rdf:type foaf:Person. t7
:person4 foaf:based_near "London". t8
select ?location
where
{
?person rdf:type foaf:Person.
?person foaf:based_near ?location.
}
location
London
Paris   (Why?)

Bind the values from the result tuple to the original query and project all variables:

SELECT *
{
  ?person rdf:type foaf:Person.
  ?person foaf:based_near ?location.
  VALUES ?location {"Paris"}
}

Results of the rewritten query:
person     location
:person1   Paris
:person2   Paris
:person3   Paris
:person1 rdf:type foaf:Person.
:person1 foaf:based_near “Paris”.
Provenance for the result tuple (location=“Paris”):
Why-provenance: {{t1, t2}, {t3, t4}, {t5, t6}}
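The rewriting step can be sketched at the string level. This is an assumption-laden simplification: the actual algorithm works on the parsed query, handles operators such as UNION, and checks the resulting triples with SPARQL ASK.

```python
# Minimal sketch of the query-rewriting step of the non-annotation algorithm:
# bind a result tuple back into the query via VALUES and project all variables.
def rewrite_for_provenance(triple_patterns, binding):
    values = " ".join(
        f'VALUES ?{var} {{"{val}"}}' for var, val in binding.items()
    )
    body = " ".join(tp + "." for tp in triple_patterns)
    return f"SELECT * {{ {body} {values} }}"

patterns = [
    "?person rdf:type foaf:Person",
    "?person foaf:based_near ?location",
]
print(rewrite_for_provenance(patterns, {"location": "Paris"}))
# → SELECT * { ?person rdf:type foaf:Person. ?person foaf:based_near ?location. VALUES ?location {"Paris"} }
```

Each result tuple of the rewritten query then yields one derivation path; iterating over all of them gives the why-provenance.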
21.
Query execution time
Number of result tuples
Provenance generation time for all result tuples
Provenance generation overhead for all result tuples
Provenance generation time per result tuple
22. Source selection: SPARQL ASK [Schwarte et al. 2011]
Nested loop join: evaluate iteratively
Exclusive grouping and bound join [Schwarte et al. 2011]
Virtual integration model (graph) [Gaignard 2013]
FedQ-LD: Federated Query Processor
[Architecture diagram: the federated query processor sends sub-queries to several data sources and combines their results to answer the query; an Explanation Facility on top of the basic query federation features answers "explain tuple" requests through a why-provenance-based explanation UI.]
Explanation-Aware Federated Query Processor Prototype
25. • Explanations for the Semantic Web
– Assumptions:
• improve users’ understanding
• improved understanding leads to improved trust
• Evaluating Explanations
– Recommender systems [Tintarev et al. 2012]
– Context-aware applications [Lim et al. 2009]
Not evaluated yet
26. • H1. Query result explanations improve user experience over having no
explanations
– User experience: understanding and trust
• User study to test our hypothesis
– Scenario: explanation-aware federated query processing
– Participants: with explanation and without explanation
– Learning: how the system works, example query with or without
explanation
– Reasoning: solve a federated query
– Survey: feeling about the system
• Setup
– Data sources/Data sets: DBpedia and LinkedMDB
– Query: British movies with American actors
– Participants: 11 total, 6 with and 5 without explanations; 8 male and 3 female; ages 22-66; all familiar with RDF and SPARQL
27.
Response about data source selection and source triple selection
[Bar charts: percentages of fully correct, partially correct, and incorrect answers, with and without explanation, for data source selection and for source triple selection.]
Participants with explanation understood the system better
28.
Confidence level of the participants about their answers
[Bar chart: percentages of participants reporting very high, high, medium, low, and very low confidence, with and without explanation.]
Participants with explanation were more confident about their answers
29.
How users feel about the system: helpful ("yes") or unhelpful ("no")
[Bar charts: percentages of "yes" and "no" answers, with and without explanation, for understanding and for making trust judgments.]
Participants felt that explanations are helpful
30. Query Performance Prediction
Explaining SPARQL Query Results
Linked Explanations: Explanations for
Linked Data Applications
Summarizing Explanations
Outline
32. Publish explanation metadata as Linked Data
Named graphs for reification and bundling metadata
Dereferenceable named graph URIs
- Statements inside the named graph
- Related statements
Linked Explanations
33. Previous approaches
Linked Data incompatibility: blank nodes
No support for data interchange standards (W3C PROV)
Ratio4TA ontology http://ns.inria.fr/ratio4ta
An extension of W3C PROV-O
33
Representing Explanation Metadata
39.
Entry point to the full explanation
Salient, abstract, and coherent information
Provide a means to filter information in large explanations
Filtering: a set of classes used in the reasoning
Inspired by text summarization [Eduard 2005] and ontology summarization [Zhang et al. 2007]
40.
Ranking Measures
Salient RDF Statements
Degree centrality of subject and object
Abstract Statements
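Degree-centrality ranking can be sketched as follows. The statements come from the evaluation example later in the talk; summing the subject and object degrees is an assumption about how the two centralities are combined.

```python
from collections import Counter

# Rank explanation statements by the degree centrality of subject and object.
statements = [
    ("Bob", "type", "ComputerScientist"),
    ("ComputerScientist", "subClassOf", "Scientist"),
    ("Bob", "bornIn", "London"),
    ("London", "partOf", "England"),
    ("England", "partOf", "UnitedKingdom"),
]

# Degree of each resource: number of statements it appears in as subject/object.
degree = Counter()
for s, _, o in statements:
    degree[s] += 1
    degree[o] += 1

def salience(stmt):
    s, _, o = stmt
    return degree[s] + degree[o]

# Most salient statements first.
for stmt in sorted(statements, key=salience, reverse=True):
    print(salience(stmt), stmt)
```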
41.
Ranking Measures
Similarity of RDF Statements
Similarity between the filtering classes and the statement
subject, predicate, object
[Corby et al 2006]
42.
Re-Ranking Measures
Subtree Weight in Proof Tree
Salience of a statement w.r.t. its position in the proof tree
considering the weights of all the statements in the current branch
[Proof-tree figure: per-statement weights (e.g. leaf weights of 1 summing to subtree weights of 3) and the resulting re-ranked salience scores in the 0.4-0.6 range.]
Coherence
Iteratively selecting the statement with the best potential contribution to the total coherence of the summary
[Figure: a coherent versus a not coherent summary.]
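The greedy coherence step can be sketched as follows. The overlap measure here (shared subject/object resources with the summary so far) is a simplified stand-in for the coherence measure of the thesis, and the statements come from the evaluation example in this talk.

```python
# Greedy coherence-driven summarization: starting from the top-ranked
# statement, repeatedly add the statement sharing the most resources with the
# summary built so far.
statements = [
    ("Bob", "type", "ComputerScientist"),
    ("ComputerScientist", "subClassOf", "Scientist"),
    ("Bob", "bornIn", "London"),
    ("London", "partOf", "England"),
    ("England", "partOf", "UnitedKingdom"),
]

def overlap(stmt, summary):
    """Number of resources the candidate shares with the current summary."""
    terms = {t for s in summary for t in (s[0], s[2])}
    return len({stmt[0], stmt[2]} & terms)

def summarize(stmts, size):
    summary = [stmts[0]]          # seed with the top-ranked statement
    rest = list(stmts[1:])
    while len(summary) < size and rest:
        best = max(rest, key=lambda st: overlap(st, summary))
        summary.append(best)
        rest.remove(best)
    return summary

for s in summarize(statements, 3):
    print(s)
```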
43. • A query
– Scientists born in United Kingdom
• Query result with explanation
– Bob because
– Bob is a Computer Scientist
– Computer Scientists are Scientists
– Bob was born in London
– London is part of England
– England is part of United Kingdom
• Rating the necessity of each explanation statement on a scale of 1 to 5
Evaluation
Inferences: RDFS type propagation,
owl:sameAs, transitivity of
gn:parentFeature
For “with filtering”: query + filtering classes (e.g. Computer Scientist,
Place) + result + explanation
44.
[Participant demographics: gender (male/female), knowledge of RDF (yes/no), backgrounds (journalism, psychology, computer science, business administration, biology, chemist, mathematician, social scientist), and an age histogram spanning 20-59.]
Analysis of ground truths (Cosine similarity)
                     Avg.    Std. dev.
Without Filtering    0.836   0.048
With Filtering       0.835   0.065
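The agreement measure used in the ground-truth analysis is plain cosine similarity between participants' rating vectors (1-5 necessity of each explanation statement). A small sketch with illustrative ratings:

```python
import math

# Cosine similarity between two participants' rating vectors.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Illustrative ratings (1-5) of five explanation statements by two participants.
a = [5, 4, 3, 5, 2]
b = [4, 4, 3, 5, 1]
print(round(cosine(a, b), 3))
```

Averaging such pairwise similarities (and their standard deviation) over all participants yields the figures in the table above.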
48. Summary of Contributions
Query Performance Prediction
Explaining SPARQL Query Results
Linked Explanations
Summarizing Explanations
Non-annotation approach to why-provenance
Evaluating the impact of explanations
49. Perspectives
Query Performance Prediction
Query optimization
Training query generation
Explaining performance
Explaining SPARQL Query Results
How-provenance
More participants
Re-evaluating
Linked Explanations
Named graphs
Large amount of metadata
Summarizing Explanations
Effectively using the rankings for presentation
Personalized explanations: classifying users based on their usage logs
50. • Rakebul Hasan and Fabien Gandon. A Machine Learning Approach to SPARQL
Query Performance Prediction. WI 2014
• Rakebul Hasan. Generating and Summarizing Explanations for Linked Data. ESWC
2014
• Rakebul Hasan. Predicting SPARQL Query Performance and Explaining Linked
Data. PhD Symposium, ESWC 2014
• Rakebul Hasan and Fabien Gandon. Predicting SPARQL Query Performance.
Poster, ESWC 2014
• Rakebul Hasan, Kemele M. Endris and Fabien Gandon. SPARQL Query Result
Explanation for Linked Data. SWCS 2014, ISWC 2014
• Rakebul Hasan and Fabien Gandon. A Brief Review of Explanation in the Semantic
Web. ExaCt 2012, ECAI 2012
• Rakebul Hasan and Fabien Gandon. Linking Justifications in the Collaborative
Semantic Web Applications. SWCS 2012, WWW 2012
Thank You
Editor's Notes
Good afternoon everyone,
Welcome to my phd thesis defense.
I will talk about predicting query performance and explaining results to assist linked data consumption.
In recent years, we have seen a sharp growth in linked data publishing, thanks to the W3C Linking Open Data initiative.
There are two basic ways to access this data: first, dereferencing URIs, and then a large portion of this data is also available via SPARQL endpoints.
There are various approaches for integrating this disparate data to support a new generation of intelligent applications.
In this context of integrating linked data, users such as knowledge engineers can have resource intensive workloads. They may need support for workload management tasks, for example: inspecting, organizing, configuring. They may want to ask questions like “how long will the query take to execute?”
When they get the results back, they may want to know about the history of data used in the result derivation, to make more informed trust judgments, to validate or invalidate results. They may want to ask questions like “why this result?”
Additionally, in the context of applications which consume this linked data and produce their results, users can ask why an application produced a result. They may want to know the flow of the information in the result derivation.
Furthermore, they may also want to have a summarized overview of what happened in the whole derivation process.
In this thesis, our focus is twofold:
first, assisting users in querying by providing predicted query performance to have an understanding of query behaviors prior to executing the queries
second, assisting users in result understanding. Two types of results: query results, and results produced by applications that consume linked data.
In the rest of the presentation, I am going to talk about the major contributions of my thesis. First, I’ll talk about query performance prediction. Then, I’ll talk about explaining SPARQL query results. Next, I’ll talk about linked explanations. And finally I’ll talk about summarizing explanations
Traditional approaches for query cost estimation are based on data statistics. First, statistics about the underlying data are generated or extracted. Then, based on those statistics, prediction models predict the cost of the queries.
In the context of linked data, statistics about the data are often missing.
There are some approaches to publish statistics about the data. But there are very few datasets that follow these approaches. In addition, those statistics are very basic, not detailed enough for prediction models.
So the challenge is how to predict query performance without using data statistics.
------------------------
Only 32.20 % (95 out of 295) data sources provide a voiD description.
Again the scenario for us is that the users will ask how long will the query take to execute.
Our approach is to learn query performance metrics from query logs of already executed queries. We apply machine learning on those query logs to predict query performance metrics.
We do this on the querying side without using any data statistics, which makes our approach suitable for the linked data scenario.
The first step in our approach is to represent queries as vectors so that they can be used by machine learning algorithms. Here x is a vector representing a query and x will have various features. y here is the performance metric.
Then we learn a mapping function f(x)=y using regression.
And finally when we have an unseen query, we compute the value of this function f to compute our prediction.
So the challenge here for us is how to model SPARQL query characteristics for machine learning algorithms, namely feature extraction.
First we extract SPARQL algebra features for a query. We transform a SPARQL query to a SPARQL algebra expression tree. Then from the operators of this tree, we construct a vector. The frequencies and cardinalities of these operators become values for different dimensions of the vector.
Next we have the graph pattern features, which is a relative representation of the query pattern in a query, relative to the training query.
First we find some landmark queries by clustering the training queries. Cluster centers are the landmark queries. Then we compute similarity values between the landmark queries and the query in examination.
To cluster the training queries, we use k-medoids, which allows us to use an arbitrary distance function. The distance function for us is the approximated graph edit distance. A query pattern is represented by a graph. To compute the distance between two such graphs, we use approximated graph edit distance. We use an approximated solution because the exact solution for graph edit distance has exponential time complexity. The solution we use has a cubic time complexity.
To compute the similarity between two graphs, we just invert their approximated graph edit distance.
------------------------------
Graph pattern features
Landmarks in training queries: clustering using k-medoids with approximate edit distance.
Similarity values between landmark queries and the query in examination form the graph pattern feature vector. Similarity is computed by inverting approximate edit distance.
We did some experiments with this representation. We generated our training, validation, and test queries from the DBPSB benchmark query templates. DBPSB templates cover the most commonly used SPARQL query features in the queries sent to DBpedia. As learning models, we used k-NN regression with a k-d tree and SVM with nu-SVR. We used commodity hardware with the Jena triple store. We use query execution time as the performance metric.
-------------------------
DBPSB templates cover most commonly used SPARQL query features in the queries sent to DBPedia
DBpedia as RDF data set
Predicting query execution time
k-NN regression with k-D tree
SVM with nu-SVR for regression
4-core Intel Xeon 2.53 GHz CPU, 48 GB system RAM, and Linux 2.6.32 operating system.
First the predictions using only the algebra features. The figure in the top left shows the comparison between predicted and actual values using k-nn. On the x axis we have the predicted execution times. On the y axis we have the actual query execution times. On the right side, we have DBPSB templates on the x axis, and root mean squared errors on the y axis. The R-squared value for k-nn is .96645. An R-squared value close to one means the model predicts well. As you can see, some queries have large errors in this model.
Next, we used the algebra features with support vector machine. Our prediction accuracy improved a bit. Some errors went down. The R-squared value went up.
When we use both the algebraic and graph pattern features, the accuracy of k-nn worsens a bit. But SVM gives us the most accurate predictions among all the experiments. R-squared value of 0.985 and most of the predictions are close to the perfect prediction line. The errors are low for most queries.
Time required for prediction and training: when we use both types of features, the training time is very high because we need to compute the distance matrix for the training queries. However, this is an offline process, so it does not affect the amount of time required to predict.
For prediction time, again when we use both types of features, the avg. time required to predict is higher, but they are reasonable, less than 100 ms. The reason for higher time is that we have to compute the approximated graph edit distance for the query in examination.
So this was everything about query performance prediction.
Next, I am going to talk about explaining query results.
When the users get the results back, they may want to know about the history of data used in the result derivation.
We provide this kind of information by means of explanations. We provide provenance based query result explanations.
The challenge here is: how to generate provenance for SPARQL query results on SPARQL endpoints without modifying the query language, the underlying data model, or the query processing engine? Previous approaches for computing SPARQL query result provenance are annotation-based approaches, which require modifying the query language, the underlying data model, or the query processing engine to keep track of what happened during query processing, and use this meta-information to generate provenance.
But in the context of linked data, it is not possible to make these modifications.
Previous works on explanations in the semantic web literature focus on provenance models, presentation of explanations, and justification based explanations.
For provenance-related works, the relational databases field has a rich literature. Major types of provenance include why, where, and how provenance. In addition, there are two approaches to compute provenance: annotation and non-annotation. Annotation approaches keep track of the provenance-related metadata during query processing. Non-annotation approaches compute the provenance only when it is needed, by means of querying the data again.
In the RDF and SPARQL literature, there are few approaches for computing query result provenance. Some approaches are based on transforming RDF and SPARQL to relational models, and applying the provenance computation approaches of relational databases.
There are few approaches for native SPARQL query processing, but these approaches are annotation based approaches.
We propose a non-annotation based approach to compute query result provenance.
---------------
Annotation/eager approach
Extra annotations added during the query processing
Keeping traces of the source data for results
A bit about query result provenance: what we mean by query result provenance. Imagine we have this data – these triples, this query, and these results. Let's say we want to know the provenance of the result tuple Paris – why this result tuple was derived, or how this result tuple was derived. Why-provenance will give you the different derivations of the result tuple with explicit information for each derivation – each inner set is a derivation path. How-provenance will give you the derivations along with the performed operations. The theoretical foundations of these notions are described in previous works on provenance for SPARQL and relational database queries.
--------------------------
Why-provenance: all different derivations of the result tuple with explicit information for each derivation.
Intuitively, for a result tuple t for query Q on RDF graph G, lineage is the set of triples G’ which contribute to the result t
We adopt the notion of why-provenance to provide query result explanations. We propose a non-annotation based algorithm, as it’s suitable for the linked data scenario. I will show you a simplified simulation of our proposed algorithm. Let’s say for the same example as before, we want to compute why-provenance for the result tuple Paris.
In the first step, we bind the result tuple to the original query, and then we project all the variables in the new re-written query.
The results of the re-written query intuitively give us all the relevant variable bindings for the result tuple in examination.
Then we replace the variables in the triple patterns in the original query, by the corresponding values from the result tuples of the re-written query to generate provenance. Each tuple in the result tuples of the re-written query represents a derivation path. To get all the derivation paths, we iterate through all the result tuples of the re-written query.
In this example, these two triples are t1 and t2, which is the first derivation in the why-provenance.
So this is the general idea. But when there are complex operators like UNION, things get a bit more complex.
At the moment, we support
SELECT Queries without subqueries.
We do not support: FILTER (NOT) EXISTS, MINUS, property paths, aggregates.
-----------------------
A witness: replace the variables in the BGPs by the corresponding values in a result tuple of the rewritten query results. Check the existence of the resulting triples using SPARQL ASK. Why-provenance: do this for all the result tuples in the results of the rewritten query and record them. The result for this example would be: {{t1, t2}, {t3, t4}, {t5, t6}}
We did a performance evaluation of our approach with the DBPSB benchmark queries and the same setup as our query performance prediction experiments.
The first column is the query template number, then #RES is the number of result tuples, then QET is the query execution time, PGT is the provenance generation time for all result tuples, then PGO is the provenance generation overhead for all result tuples, finally PGTPR is the provenance generation time per result tuple.
So when we generate provenance for all the query result tuples, our algorithm is very costly in terms of time, the worst overhead in our case is 61,587%. But this is understandable because we have to solve a query for each of the result tuples to compute provenance. So non-annotation based algorithms are not good for generating provenance for all the query result tuples. For us the interesting measure here is the PGTPR, provenance generation time per result tuple. In our explanation scenario, we will only generate why-provenance for a result tuple when we need the provenance. As you can see here, all the PGTPR is really low. And that is why our approach is suitable for the explanation scenario.
We implemented our approach in a federated query processor prototype. We built a basic federated query processor with common query federation features like source selection, nested loop join, exclusive grouping and bound join, and virtual integration model.
Then on top of our federated query processor, we implemented our explanation facility as a plug-in. When you get the results, you can ask explanations for each of the tuples in the result. The system will respond with a why-provenance based explanation user interface.
----------------------------
Why-provenance-based explanations in the context of querying and data integration over Linked Data
Query federation is a prominent approach to consume, process, and integrate Linked Data
To give you an example, here we have the query user interface. The query here is solved from DBpedia and LinkedMDB. You have some results here, and when you click on the explain button for a result tuple, the explanation user interface will appear.
The user interface shows a derivation from the why-provenance of the selected result tuple.
The oval shapes here represent data sets, for example DBpedia and LinkedMDB. The first rectangle contains triple patterns that matched against its corresponding data set, and the second rectangle contains provenance triples in the corresponding data set.
So that was everything about how we generate and present query result explanations.
Now I am going to talk about the impact of query result explanations on users. It’s good that we have explanations, but we also need to understand whether the explanations are useful.
In the previous works in the explanations for the semantic web literature, the assumptions were that explanations improve users' understanding, and that improved understanding leads to improved trust.
But these assumptions were not evaluated to see whether they are true or not.
In other areas, for example recommender systems and context-aware applications, researchers have proposed methodologies for evaluating explanations. Our work is based on the methodologies proposed by Lim and others for evaluating explanations.
------------
What are the impacts of query result explanations?
These assumptions, however, are not evaluated
Based on the previous works on explanations for the semantic web, we hypothesize that “query result explanations improve user experience over having no explanations”. We define user experience as users understanding of the system and their perception of trust on the results.
We develop a user study to test our hypothesis. The scenario is explanation-aware federated query processing. There were two groups of participants: with explanation and without explanation. There are three sections in the study.
First, in the learning section, we gave a high level description of the system and how it works. We also gave them an example query and a result tuple for the query. The participants in the with explanation group additionally received an explanation for the result tuple they received.
The next section is the reasoning section, where we ask the participants to manually solve a federated query. The goal of the reasoning section is to examine whether the participants can apply the knowledge of how the system works learned in the learning section, so that we can examine whether there was an impact of providing explanations in the learning section.
Finally, in the survey section, we ask the participants how they feel about the system.
We used DBpedia and LinkedMDB as datasets. We used a very simple query, "British movies with American actors". 11 participants took part in the study; 6 were provided explanations and 5 were without explanations. There were 8 males and 3 females. All the participants had knowledge of RDF and SPARQL. The ages of the participants ranged from 22 to 66.
------------
users’ understanding of the system and their perception of trust on the results
Two groups of participants: with explanations and without explanations (randomly selected)
Learning section
Provided with a high level description of the system
An example of federated query and a result with or without explanation
Reasoning section
Participant were given a federated query solving task
Given a query and a result tuple
Select data sources
Select triples which contribute to the result
Confidence level on their answer choices
Survey section
How they feel about the system
Understanding
Making trust judgments
The response about data source/data set selection and source triple (i.e. provenance triple) selection for the task of solving the given federated query with a result tuple.
In the data set selection part, we can not really come to a conclusion whether there is an impact of explanations. Because the answers for both with explanation and without explanation groups are very similar.
For the source triple selection part, most participants with explanations answered correctly, but many participants without explanations answered incorrectly.
So we can say that the participants with explanation understood the system better, because most participants with explanation correctly selected the data sources and the source triples.
The confidence level of participants about their answers:
When the participants were given explanations, their confidence level about the answers was very high or high. When they were not given explanations, they said that their confidence level was only high. So here we see that the participants with explanations were more confident in their answers.
Finally, how the users feel about the system, whether they feel explanations were helpful or unhelpful.
Irrespective of whether we provided the participants explanations or not, majority of them said explanations were helpful to understand the system.
Also, irrespective of whether we provided them explanations or not, majority of them said explanations were helpful for making trust judgments.
The participants felt that the explanations helped them to better understand the system and to make better trust judgments on the results.
Overall, these results validate our hypothesis
So that was everything about explaining query results.
Now I am going to talk about Linked Explanations: explanations for linked data applications.
The scenario here is that applications consume and produce Linked Data, and applications also consume the Linked Data produced by other applications.
In this context, the scenario is really about explaining distributed reasoning. The existing approach for explaining distributed reasoning is centralized: there is a centralized metadata registry. In contrast, our approach decentralizes explanations for distributed reasoning.
We propose linked explanations for this. We provide proof-tree based explanations.
So what do we mean by linked explanations?
We publish the explanation metadata as Linked Data. The key idea here is using dereferenceable named graphs for reification and for bundling metadata.
When we dereference a named graph URI, we return the statements inside the named graph, along with the related statements: the RDF statements where the named graph URI appears as subject or object.
Linked explanations enable explanation for distributed data. To generate proof-tree-based explanations, we can follow the links and retrieve explanation metadata for the source data recursively.
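This recursive link-following can be sketched as follows. The bundle URIs and graph contents are hypothetical, and an in-memory dictionary stands in for an actual HTTP dereference of a named graph URI:

```python
# Sketch of recursively retrieving explanation metadata by following links.
# DEREF simulates dereferenceable named-graph URIs: each URI maps to the
# statements of its explanation bundle plus links to the bundles of its
# source data. All URIs and statements below are made up for illustration.
DEREF = {
    "http://example.org/expl/derived": {
        "statements": ["<:x :derivedFrom :a>", "<:x :derivedFrom :b>"],
        "sources": ["http://example.org/expl/srcA", "http://example.org/expl/srcB"],
    },
    "http://example.org/expl/srcA": {"statements": ["<:a :type :Fact>"], "sources": []},
    "http://example.org/expl/srcB": {"statements": ["<:b :type :Fact>"], "sources": []},
}

def collect_explanation(uri, seen=None):
    """Follow links from an explanation bundle to its sources, recursively."""
    if seen is None:
        seen = set()
    if uri in seen:          # guard against cycles in the link structure
        return []
    seen.add(uri)
    bundle = DEREF[uri]      # stands in for an HTTP dereference of the URI
    statements = list(bundle["statements"])
    for src in bundle["sources"]:
        statements += collect_explanation(src, seen)
    return statements
```

The collected statements can then be assembled into a proof tree for presentation.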
To publish explanation metadata as linked data, first we need a vocabulary to describe them.
Previous approaches for representing explanation metadata have some incompatibility with Linked Data with regard to blank nodes. Blank nodes are usually avoided in Linked Data because they are not globally referenceable. Previous approaches also do not use data interchange standards such as W3C PROV-O.
We propose Ratio4TA, which extends W3C PROV-O.
Extending PROV-O promotes interoperability by enabling data consumers to process explanation metadata according to the W3C PROV standards.
Ratio4TA allows describing data derivations, dependencies between input data and output data, the reasoning process, results, rules, and software applications. We can then bundle these metadata using an explanation bundle named graph.
To give you an example, applications can produce and publish the data and metadata as Linked Data as shown here.
Then consumers of the data and metadata can follow the links and retrieve explanation metadata for source data recursively.
In this way we can generate proof-tree-based explanations for data which was derived from distributed source data.
You can build some nice user interfaces on top of this. Here we generate natural-language proof trees by using the RDF labels.
This was everything about linked explanations.
Now I am going to talk about summarizing these explanations.
This is an example of a proof-tree-based explanation, with minimal information, for a derivation using data from DBpedia and GeoNames.
However, as you can see here, explanations can get large and can be overwhelming.
How can we summarize this kind of explanation to provide an entry point into the full explanation? We also want to provide a feature for filtering the information in explanations.
Our approach is inspired by text summarization and ontology summarization. We define some measures for summarizing explanations.
We rank the RDF statements in an explanation by three measures to provide summarized explanations.
First, the salience measure: the salience of an RDF statement indicates its importance.
We take the weighted average of the normalized degree centrality of the subject and the object of an RDF statement in the proof tree.
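A minimal sketch of this salience computation, with hypothetical triples and assumed equal weights for the subject and object centralities:

```python
from collections import Counter

# Salience sketch: weighted average of the normalized degree centrality of a
# statement's subject and object in the proof tree's RDF graph. The triples
# and the equal weights below are made up; predicates are ignored.
triples = [
    ("Paris", "locatedIn", "France"),
    ("France", "locatedIn", "Europe"),
    ("Paris", "type", "City"),
]

degree = Counter()
for s, _, o in triples:
    degree[s] += 1
    degree[o] += 1

n = len(degree)  # number of distinct subject/object nodes

def salience(triple, w_subj=0.5, w_obj=0.5):
    s, _, o = triple
    # normalized degree centrality of a node: degree / (n - 1)
    return w_subj * degree[s] / (n - 1) + w_obj * degree[o] / (n - 1)
```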
The next measure is abstractness.
We consider a statement that is close to the root of the corresponding proof tree to be more abstract than a statement that is far from the root.
We compute the abstractness of a statement as the inverse of the proof-tree level to which the statement belongs.
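As a one-line sketch, assuming levels start at 1 at the root:

```python
def abstractness(level):
    # Abstractness as the inverse of the statement's proof-tree level,
    # assuming the root statement is at level 1 (deeper means less abstract).
    return 1.0 / level
```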
The third measure is similarity.
The consumers of our explanations can specify a set of classes as their filtering criteria.
We rank the more similar statements to the classes given in the filtering criteria higher.
We compute the similarity between the set of filtering classes and a statement by combining the similarity scores between the classes of its subject, predicate, and object and the filtering-criteria classes.
We use the approximated query solving feature of the Corese SPARQL engine to compute the similarity between two classes.
The approximated query solving feature uses a semantic-distance-based similarity measure to compute conceptual similarity between two classes in a schema.
------
extra
We did not use the centrality of the predicate of a statement while computing salience, because the centrality values of predicates in an RDF graph often do not change: they are used directly from the schemata. In contrast, every new RDF statement changes the centrality values of its subject and object.
We use two more measures to improve the rankings produced by the combinations of three measures we presented so far.
First, Subtree Weight in Proof Tree.
This measure helps us to measure the salience of a statement w.r.t. its position in the proof tree considering the weights of all the statements in the current branch.
For example, the picture here shows the subtree-weight computation using only the salience measure to compute the ranking score.
First, the number of statements in the subtrees
The weight of the statements here is computed using the salience measure, but it can be computed by combinations of the measures I presented before.
We compute the subtree weight in the proof tree by taking the average weight of all the statements in that subtree.
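The subtree-weight computation can be sketched as follows. The tree shape and statement weights are made up; in practice the weights come from salience or a combination of the earlier measures:

```python
# Subtree-weight sketch: a proof-tree node is (statement_weight, children),
# and the subtree weight of a node is the average weight over all statements
# in the subtree rooted at that node.
def subtree_statement_weights(node):
    weight, children = node
    weights = [weight]
    for child in children:
        weights += subtree_statement_weights(child)
    return weights

def subtree_weight(node):
    weights = subtree_statement_weights(node)
    return sum(weights) / len(weights)

# Hypothetical tree: a root statement with two leaf statements below it.
leaf1 = (0.2, [])
leaf2 = (0.4, [])
root = (0.6, [leaf1, leaf2])
```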
Finally, the coherence measure for re-ranking.
The idea is to provide more coherent information in the summary.
We re-rank the explanation statements by iteratively selecting the statement with best potential contribution to the total coherence of the summary.
We consider an RDF statement x to be coherent with an RDF statement y if x is directly derived from y. For example, these two statements here are coherent, and these two statements are not.
---------------------
Previous work in text summarization [10] and ontology summarization [27] shows that coherent information is desirable in summaries.
For re-ranking a ranked list of statements, we repeatedly select the next RDF statement, n times, where n is the number of statements.
Here RL is the ranked list of RDF statements;
S is the list of RDF statements already selected into the summary;
i is the next RDF statement to be selected into S.
We re-rank RL by repeatedly selecting the next i.
As you can see, as the next statement we select the best statement considering salience and its potential contribution to the total coherence of the summary.
Again, the score(j) for a statement j can be computed by combinations of the measures I presented before.
The reward score of a statement j is its potential contribution, ranging from 0.0 to 1.0, to the total coherence of the summary if j is added to S.
The function coherent(S) returns the number of coherent statements in the summary S.
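A sketch of this greedy re-ranking. The statements, base scores, and derivation links are hypothetical, and the reward here is one way to realize a 0.0-1.0 contribution: the fraction of already-selected statements coherent with the candidate:

```python
# Greedy coherence re-ranking sketch. score(j) would in practice combine the
# measures above; here it is a fixed hypothetical base score per statement.
scores = {"s1": 0.9, "s2": 0.5, "s3": 0.6}
# derived_from[x] = set of statements x is directly derived from (coherence).
derived_from = {"s1": {"s2"}, "s2": {"s3"}, "s3": set()}

def reward(j, selected):
    # Potential contribution of j to the summary's coherence: the fraction
    # of already-selected statements coherent with j (a value in [0, 1]).
    if not selected:
        return 0.0
    coherent = sum(1 for s in selected
                   if s in derived_from[j] or j in derived_from[s])
    return coherent / len(selected)

def rerank(ranked):
    remaining, selected = list(ranked), []
    while remaining:
        # pick the statement with the best base score plus coherence reward
        best = max(remaining, key=lambda j: scores[j] + reward(j, selected))
        remaining.remove(best)
        selected.append(best)
    return selected
```

Here "s2" is pulled ahead of "s3" despite its lower base score, because "s1" is directly derived from it.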
To evaluate our summarization approach, we again did a user study.
We gave each participant a query with a result and an explanation for the result. The reasoning for the query result involves RDFS type propagation, owl:sameAs inferences, and inferences with respect to the transitivity of gn:parentFeature.
We used data from DBpedia and GeoNames.
We asked the users to rank the statements in the explanation.
To evaluate the filtering feature, we gave each participant a randomly selected class along with a query, a result, and the explanation for the result, and asked them to rank the statements in the explanation.
Of the 24 survey participants with different backgrounds, 18 had knowledge of RDF and 6 did not have any knowledge of RDF.
The ages of the participants range from 22 to 59.
20 participants were male and 4 were female.
The table at the bottom shows the total average agreement between rating vectors, measured by cosine similarity, and the standard deviations for the two scenarios: without filtering criteria and with filtering criteria.
The average agreements for both scenarios are above 0.8, which is considerably high. However, the standard deviation is higher for the scenario with filtering criteria. The reason is that the participants had to consider the highly subjective factor of similarity and therefore had more chances to disagree.
We use normalized discounted cumulative gain to evaluate ranking quality.
An nDCGp value of 1.0 means that the ranking is perfect at position p with respect to the ideal ranking.
In our study, the grade for a statement is the average of its ratings by all survey participants, which gives us the ideal ranking.
The figures show the average nDCG values of the three test cases for different rankings by different measure combinations.
The x-axis represents ranks and the y-axis represents nDCG .
For the scenario without filtering criteria (the figure on the left), three of the measure combinations produce very similar rankings to the ground truth rankings.
For the scenario with filtering criteria (the figure on the right), the same three measure combinations with added similarity measure have the best nDCG values.
This means that the participants consider central, abstract, and coherent information as necessary information in explanation summaries for the scenario without filtering criteria.
This also holds for the scenario with filtering criteria with the added observation that the participants also consider similar information as necessary information.
The nDCG values for these measure combinations are higher than 0.9 for all ranks. This means that the rankings by these measure combinations are highly similar to the ground truth rankings.
In contrast, the sentence-graph summarization ranking has low nDCG values compared to all the other rankings for the scenario without filtering criteria.
This shows that our explanation summarization algorithms produce much higher-quality rankings than the sentence-graph summarization algorithm.
-----------
Drop if no time: Discounted Cumulative Gain measures the quality of the results of an Information Retrieval (IR) system in a ranked list.
Drop if no time: It uses the assigned ratings/grades to measure the usefulness, or gain, of a ranked list of results. It penalizes high-quality results appearing lower in the list.
Drop if no time: Normalized Discounted Cumulative Gain (nDCG) allows calculating and comparing discounted cumulative gain across multiple lists of results, where each list might have a different length. nDCG values are in the interval 0.0 to 1.0.
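One common formulation of DCG and nDCG, sketched below; the log2 position discount is an assumption, and other variants exist:

```python
import math

def dcg(grades):
    # DCG with a log2 position discount (rank 1 gets discount log2(2) = 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(ranked_grades):
    # Normalize by the DCG of the ideal (descending) ordering of the grades,
    # so a perfect ranking scores 1.0.
    ideal = sorted(ranked_grades, reverse=True)
    return dcg(ranked_grades) / dcg(ideal)
```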
We evaluate the summarized explanations produced by different measure combinations by comparing them to human-generated summarized explanations (i.e. ground-truth summarized explanations) using F-score.
To generate the ground-truth summarized explanation for an explanation, we include a statement in it
if its rating is greater than or equal to the average rating of all the statements in the original explanation.
The Figures show the average F-scores for different measure combinations for summaries with different sizes for the three test cases.
The x-axis represents compression ratio CR . The y-axis represents F-scores .
For the scenario without filtering criteria (the figure on the left), the best F-score is 0.72 at CR value 0.33, achieved by the measure combinations salience + abstractness + subtree and salience + abstractness + subtree + coherence.
This is a desirable situation with a high F-score and low CR .
The sentence graph summarization performs poorly with a best F-score value of 0.34 in the CR interval 0.05 to 0.3.
For the scenario with filtering criteria (the figure on the right), the best F-score is 0.66 at CR value 0.53.
However, the F-score of 0.6 at CR value 0.3 by the measure combination salience + abstractness + similarity + coherence is more desirable because the size of the summary is smaller.
As expected, our summarization approach performs worse in the scenario with filtering criteria, where we use the similarity measure. This is due to the fact that the survey participants had to consider the highly subjective factor of similarity.
Recall reflects how many good statements the algorithm has not missed.
Precision reflects how many of the algorithm's selected statements are good.
F-score is a composite measure of recall and precision.
Gold standard summary: a statement is included if its rating is greater than or equal to the average rating of all the statements in the original explanation.
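The gold-standard construction and F-score computation can be sketched as follows, with hypothetical ratings and a hypothetical algorithm-selected summary:

```python
# Evaluation sketch: a statement enters the gold-standard summary if its
# average rating is >= the mean rating over the whole explanation; the
# algorithm's summary is then scored with precision, recall, and F-score.
ratings = {"s1": 4.2, "s2": 2.0, "s3": 3.8, "s4": 1.5}   # made-up ratings
algorithm_summary = {"s1", "s2"}                          # made-up selection

mean = sum(ratings.values()) / len(ratings)
gold = {s for s, r in ratings.items() if r >= mean}

tp = len(algorithm_summary & gold)          # correctly selected statements
precision = tp / len(algorithm_summary)
recall = tp / len(gold)
f_score = 2 * precision * recall / (precision + recall)
```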
So that was everything about summarizing explanations.
Now the conclusions.
Summary of the contributions.
I first spoke about query performance prediction. The goal was to provide predicted query-performance information prior to executing the queries, to assist users in workload-management tasks.
Then I spoke about explaining query results. The goal was to assist users in understanding SPARQL query results. I presented a non-annotation approach to generate why-provenance for SPARQL query results. I also presented a user study to evaluate the impact of query result explanations.
Next I spoke about linked explanations, which basically allow explaining distributed reasoning in the context of Linked Data.
Finally I spoke about summarizing those explanations.
Perspectives,
For query performance prediction, we would like to see how we can use our approach in query optimization, especially in scenarios where we query Linked Data, for example federated query processing over Linked Data.
There is also the question of how to generate training data. The idea is to mine query logs to extract dominant features of the queries, and then use those features to synthetically generate training queries with good coverage of the possible queries.
Finally, explaining performance: at the moment we can say that a query may take this or that amount of time, but we cannot say why it takes that amount of time. So it will be interesting to explore how to explain that aspect, which will include explaining machine learning algorithms.
Next, explaining SPARQL query results: we would like to extend our algorithm to how-provenance. We would also like to run our study with more participants; it was hard to find people motivated to solve the tasks. It may be interesting to recruit people on crowdsourcing platforms and reward them for solving the tasks. We would also like to re-design the study so we can go back to participants and ask why they answered one way and not another.
Next, linked explanations: at the moment it is not clear what the community consensus is with respect to dereferencing named graph URIs. It will be interesting to see how this develops, especially now that RDF 1.1 has adopted the notion of named graphs.
Then, describing explanations using our vocabulary produces a large amount of metadata, so we need scalable RDF data storage and processing techniques.
Finally, summarizing explanations: we would like to use our rankings for effective presentation of explanations, for example deciding whether or not to expand a branch of a proof tree while presenting it.
We would also like to provide personalized explanations. For example, we can classify users based on their usage logs and provide personalized explanations to target different types of users.
With that I will finish my presentation.
Thank you for your attention.