Agenda
• What is a microbenchmark?
• Why microbenchmarks for QA?
• QaldGenData dataset for microbenchmarking
• QaldGen microbenchmarking framework
11/7/2019 QaldGen 2
What is a Microbenchmark?
• Specialized
• Focused
• Use-case specific
• Useful for component-level testing
• E.g., a single-relation QA benchmark
Why Microbenchmarks for QA?
• Multiple QA-related factors influence the F-scores
• Fine-grained testing of the systems
• Pinpoints more detailed limitations
• Some systems are designed for a special purpose or use case
• Avoids comparing apples with oranges
Effect of Question Type
[Figure: bar chart of average F-score (y-axis, 0.3 to 0.6) by question type: When, In which, Who, What, Which, Where, Give me, How many, arranged along an axis from Easy to Difficult. From QALD-6 results.]
Effect of #Triple Patterns
[Figure: bar chart of average F-score (y-axis, 0.2 to 0.5) for #TP = 1, #TP = 2, #TP = 3.]
The number of triple patterns has an inverse relationship with the F-score, i.e., the more triple patterns, the harder the question is to answer.
Comparison of SPARQL query forms
SPARQL ASK queries are much more difficult to answer than SPARQL SELECT queries.
[Figure: bar chart of average F-score (y-axis, 0.3 to 0.46) for SELECT and ASK.]
Effect of #Answers
The number of answers has a direct relationship with the F-score, i.e., the more answers, the easier the question is to answer.
[Figure: bar chart of average F-score (y-axis, 0.35 to 0.51) for #A = 1, #A = 2, #A = 3.]
Effect of Answer Types
Questions are easier to answer when the answer is of type Date and more difficult when it is of type String.
[Figure: bar chart of average F-score (y-axis, 0.35 to 0.51) by answer type: String, Boolean, Resource, Date.]
Effect of Aggregate Functions
If the SPARQL query corresponding to a question uses aggregate functions, the question is difficult to answer.
[Figure: bar chart of average F-score (y-axis, 0.1 to 0.5); the only surviving x-axis label is "Aggregates = true".]
Why Microbenchmarking for QA?
• QALD-6 winner: CANALI QA system
• But for questions starting with “Give me”, UTQA is the winner
• The NLIWOD SPARQL query builder, with an F-score of 0.48, was reported as the baseline over LC-QuAD
• But SINA, on the same dataset, has an F-score of 0.80 for SPARQL queries with two triple patterns
• But SINA reports an F-score of 0.0 for SPARQL queries with four triple patterns
The definition of a “baseline” is subject to the type of the questions and their specific features
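The point above, that the best system overall need not be the best for a given question type, can be sketched in a few lines of Python. The system names and F-scores below are made up purely for illustration; they are not results from the slides.

```python
from collections import defaultdict
from statistics import mean

# Illustrative per-question results: (system, question type, F-score).
# All names and numbers are hypothetical, chosen only to show the effect.
RESULTS = [
    ("system_a", "Who", 0.8),
    ("system_a", "Who", 0.7),
    ("system_a", "Give me", 0.2),
    ("system_b", "Who", 0.5),
    ("system_b", "Who", 0.5),
    ("system_b", "Give me", 0.6),
]

def mean_f_overall(results):
    """Average F-score per system over all questions."""
    buckets = defaultdict(list)
    for system, _, f_score in results:
        buckets[system].append(f_score)
    return {system: mean(scores) for system, scores in buckets.items()}

def mean_f_by_type(results):
    """Average F-score per system and question type."""
    buckets = defaultdict(lambda: defaultdict(list))
    for system, question_type, f_score in results:
        buckets[system][question_type].append(f_score)
    return {
        system: {qt: mean(scores) for qt, scores in by_type.items()}
        for system, by_type in buckets.items()
    }

overall = mean_f_overall(RESULTS)
by_type = mean_f_by_type(RESULTS)
best_overall = max(overall, key=overall.get)
best_give_me = max(by_type, key=lambda s: by_type[s]["Give me"])
print(best_overall, best_give_me)  # system_a system_b
```

In this toy data the overall winner loses on "Give me" questions, which is the situation a microbenchmark is meant to expose.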
What is QaldGen?
• Microbenchmark selection framework
• For QA over Linked Data
• Personalized benchmarking
• Reuses LC-QuAD and QALD
• Based on graph clustering
QaldGen Architecture
[Figure: pipeline diagram. QaldGenData feeds QaldGen, which runs four steps (Questions Selection, Vectors Representation, Clusters Formation, Final Questions Selection) and outputs an RDF benchmark. The two parts are labelled Contribution 1 and Contribution 2.]
QaldGenData
• RDF dataset for QA over Linked Data
• NER-related testing
• Relation linking testing
• QA systems testing
• Constructed from LC-QuAD and QALD-9
• 408 questions from QALD-9
• 5000 questions from LC-QuAD
• Each question annotated with 51 features
QaldGenData: Annotations
[Figure: diagram of the annotations attached to each question. Recoverable labels:
• Question: Text; L(Q); Total Entities, Relations, Classes; Entity1, Relation1, Class1; L(Entity1,2,3), L(Relation1,2,3), L(Class1,2,3); Avg. words of Entities, Relations, Classes
• POS tags: IN, JJ, JJR, JJS, NN, NNS, NNP, NNPS, PRP, RB, RBR, RBS, VB, VBD, VBG, VBN, VBP, VBZ, WDT, WP, SYM
• Origin: QALD-9 or LC-QuAD
• Answer: type; Boolean?; Number?; Answers; Result size
• SPARQL: Text; Total TPs; Total BGPs; SPARQL features; Proj. Vars]
QaldGen: Microbenchmark Selection
[Figure: the QaldGen pipeline diagram again: QaldGenData, then Questions Selection, Vectors Representation, Clusters Formation, and Final Questions Selection, producing the RDF benchmark.]
Step1: Questions Selection
• From QaldGenData
• Using a single SPARQL query
• Select desired features as projections
• Options for personalization
• Use-case specific benchmarking
• Using SPARQL constructs, e.g., Filters
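As a rough illustration of Step 1, the sketch below assembles a selection query in Python: the desired features become projections and the personalization constraints become FILTER clauses. The `ex:` namespace and the property names (`totalWords`, `triplePatterns`) are hypothetical placeholders; the slides do not give the actual QaldGenData vocabulary.

```python
# Hypothetical namespace: the real QaldGenData vocabulary is not given in the slides.
PREFIX = "PREFIX ex: <http://example.org/qaldgen#>"

def build_selection_query(features, filters=()):
    """Build a SPARQL SELECT that projects the chosen features of each question."""
    projection = " ".join(f"?{name}" for name in features)
    lines = [PREFIX, f"SELECT ?question {projection}", "WHERE {"]
    # One triple pattern per requested feature (property names are placeholders).
    for name in features:
        lines.append(f"  ?question ex:{name} ?{name} .")
    # Personalization: each condition becomes a FILTER clause.
    for condition in filters:
        lines.append(f"  FILTER ({condition})")
    lines.append("}")
    return "\n".join(lines)

query = build_selection_query(
    features=["totalWords", "triplePatterns"],
    filters=["?triplePatterns >= 2"],
)
print(query)
```

Restricting the candidate questions this way (here, to queries with at least two triple patterns) is what makes the resulting benchmark use-case specific.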
Step2: Vectors Representation
Question: "What is the time zone of Salt Lake City?"
Golden Answer:
SELECT DISTINCT ?uri
WHERE
{
  <http://dbpedia.org/resource/Salt_Lake_City>
  <http://dbpedia.org/ontology/timeZone> ?uri
}
Annotated features as projections
Question Related:
• Total Words: 9
• Total Entities: 1
• Total Relations: 1
• Total Classes: 0
• Avg. Entities Words: 1
Answer Related:
• #Triple patterns: 1
• #Results: 1
• #BGPs: 1
• #Projection Vars: 1
Feature Vector: 9 1 1 0 1 1 1 1 1
Normalized Feature Vector: 0.3 0.14 0.2 0 0.36 0.33 0.06 0.5 0.33
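The slides do not state how the feature vector is normalized. One plausible reading, sketched below as an assumption, is to divide each feature by its maximum over all selected questions so that every value falls in [0, 1]. The first vector is the one from the slide; the other two are made up for illustration, so the output is not expected to match the slide's normalized values.

```python
def normalize_by_max(vectors):
    """Scale each feature to [0, 1] by dividing by its maximum over all vectors.

    Assumption: max-normalization; the slides do not name the method used.
    """
    maxima = [max(column) for column in zip(*vectors)]
    return [
        [value / m if m else 0.0 for value, m in zip(vector, maxima)]
        for vector in vectors
    ]

vectors = [
    [9, 1, 1, 0, 1, 1, 1, 1, 1],     # feature vector from the slide
    [18, 2, 2, 0, 2, 3, 4, 2, 1],    # made-up vector, for illustration only
    [12, 1, 2, 0, 1, 2, 10, 1, 2],   # made-up vector, for illustration only
]
normalized = normalize_by_max(vectors)
print(normalized[0])
```

Normalizing first keeps features with large ranges, such as the result size, from dominating the Euclidean distances used in the clustering step.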
Step3: Clusters Formation
• We cluster the questions to get sufficient diversity in the selected benchmark
• Plot the vectors in a multidimensional space and form clusters
• QaldGen supports
  • KMeans++
  • FEASIBLE
  • Agglomerative
  • Random
  • FEASIBLE-Exemplars
  • DBSCAN+KMeans++
• Other clustering techniques can easily be integrated
Step3: Clusters Formation
Plot feature vectors in a multidimensional space
Q     F1     F2
Q1    0.2    0.2
Q2    0.5    0.3
Q3    0.8    0.3
Q4    0.9    0.1
Q5    0.5    0.5
Q6    0.2    0.7
Q7    0.1    0.8
Q8    0.13   0.65
Q9    0.9    0.5
Q10   0.1    0.5
Suppose we need a benchmark of 3 questions
[Figure: scatter plot of Q1 to Q10 in the F1/F2 plane.]
Calculate the average point
[Figure: the scatter plot of Q1 to Q10 with the average point (Avg.) marked.]
Select the point with the minimum Euclidean distance to the average point
[Figure: the scatter plot of Q1 to Q10 with the selected point highlighted in red.]
*Red is our first exemplar
Select the point that is farthest from the exemplars
[Figure: the scatter plot of Q1 to Q10 with the second exemplar highlighted.]
Select the point that is farthest from the exemplars
[Figure: the scatter plot of Q1 to Q10 with the third exemplar highlighted.]
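The exemplar selection can be sketched in Python on the ten example points. This is a minimal sketch, not the QaldGen implementation: it assumes that "farthest from the exemplars" means the largest distance to the nearest already-chosen exemplar, which the slides do not spell out.

```python
import math

# Feature vectors (F1, F2) from the ten-question example.
POINTS = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}

def centroid(vectors):
    """Component-wise average of a collection of vectors."""
    vectors = list(vectors)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))

def select_exemplars(points, k):
    """Pick k exemplars: the point nearest the average, then farthest points."""
    avg = centroid(points.values())
    # First exemplar: minimum Euclidean distance to the average point.
    exemplars = [min(points, key=lambda q: math.dist(points[q], avg))]
    while len(exemplars) < k:
        candidates = [q for q in points if q not in exemplars]
        # Assumption: "farthest" = largest distance to the nearest exemplar.
        farthest = max(
            candidates,
            key=lambda q: min(math.dist(points[q], points[e]) for e in exemplars),
        )
        exemplars.append(farthest)
    return exemplars

exemplars = select_exemplars(POINTS, 3)
print(exemplars)  # ['Q5', 'Q4', 'Q7']
```

Under this reading the exemplars are Q5, Q4, and Q7, which is consistent with the remaining slides assigning only the other seven questions to clusters.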
Calculate the distance from Q1 to each exemplar
Assign Q1 to the exemplar at minimum distance
Repeat the process for Q2, Q3, Q6, Q8, Q9, and Q10
[Figure: a sequence of scatter plots of Q1 to Q10 showing each remaining point being assigned to its nearest exemplar.]
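The assignment step can be sketched as follows, assuming the three exemplars are Q5, Q4, and Q7 (the three questions the walk-through does not assign). Coordinates are scaled to integer hundredths so squared distances are exact. Q9 is equidistant from Q5 and Q4; the slide text does not say which cluster it joins, so the tie rule used here (prefer the exemplar listed last) is an arbitrary choice for this sketch.

```python
# Example points scaled by 100 so that squared distances are exact integers.
POINTS = {
    "Q1": (20, 20), "Q2": (50, 30), "Q3": (80, 30), "Q4": (90, 10),
    "Q5": (50, 50), "Q6": (20, 70), "Q7": (10, 80), "Q8": (13, 65),
    "Q9": (90, 50), "Q10": (10, 50),
}
EXEMPLARS = ["Q5", "Q4", "Q7"]  # assumption, see the note above

def squared_distance(a, b):
    """Squared Euclidean distance; preserves the ordering of distances."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_to_exemplars(points, exemplars):
    """Put every non-exemplar point into the cluster of its nearest exemplar."""
    clusters = {e: [e] for e in exemplars}
    for question, vector in points.items():
        if question in clusters:
            continue
        # min() keeps the first minimum, so iterating in reverse order
        # breaks ties in favour of the exemplar listed last (arbitrary rule).
        nearest = min(
            reversed(exemplars),
            key=lambda e: squared_distance(vector, points[e]),
        )
        clusters[nearest].append(question)
    return clusters

clusters = assign_to_exemplars(POINTS, EXEMPLARS)
print(clusters)
```

With this tie rule the clusters are {Q5, Q1, Q2}, {Q4, Q3, Q9}, and {Q7, Q6, Q8, Q10}.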
Step4: Final Questions Selection
Calculate the average of each cluster
Calculate the distance of each point in a cluster to that cluster's average
Select the minimum-distance question as the final benchmark question from each cluster
• Purple, i.e., Q2, is the final selected question from the yellow cluster
• Purple, i.e., Q3, is the final selected question from the green cluster
• Purple, i.e., Q8, is the final selected question from the brown cluster
Our benchmark questions are Q2, Q3, and Q8
[Figure: scatter plots of Q1 to Q10 with the three cluster averages (Avg.) marked and the selected question of each cluster highlighted in purple.]
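A minimal Python sketch of the final selection: average each cluster and keep the question closest to that average. The cluster memberships below are an assumption (the slides show them only by colour), chosen as nearest-exemplar clusters around Q5, Q4, and Q7 with the Q9 tie resolved toward Q4.

```python
import math

# Feature vectors (F1, F2) from the ten-question example.
POINTS = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}
# Assumed cluster memberships; the slide text does not list them.
CLUSTERS = [
    ["Q1", "Q2", "Q5"],
    ["Q3", "Q4", "Q9"],
    ["Q6", "Q7", "Q8", "Q10"],
]

def centroid(vectors):
    """Component-wise average of a collection of vectors."""
    vectors = list(vectors)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))

def representative(cluster, points):
    """Question in the cluster with minimum distance to the cluster average."""
    avg = centroid(points[q] for q in cluster)
    return min(cluster, key=lambda q: math.dist(points[q], avg))

benchmark = [representative(cluster, POINTS) for cluster in CLUSTERS]
print(benchmark)  # ['Q2', 'Q3', 'Q8']
```

Under these assumptions the sketch returns Q2, Q3, and Q8, the same three questions the walk-through selects.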
Evaluation Setup
• The GERBIL and Frankenstein frameworks are used to provide common experiment settings
• A personalised 200-question sample is generated using QaldGen for evaluating end-to-end QA systems
• Personalised 100-question samples are generated using QaldGen for evaluating NED and relation linking tools
• Precision, recall, and F-score are measured to report final performance
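For reference, a minimal sketch of per-question precision, recall, and F-score over answer sets. The handling of empty answer sets (scoring them as 0.0) is an assumption of this sketch; evaluation frameworks such as GERBIL may treat those cases differently.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall, and F1 of a predicted answer set against the gold set."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Assumption: an empty prediction or an empty gold set scores 0.0.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Placeholder answers: one of the two predicted answers is correct,
# and one of the two gold answers is found.
scores = precision_recall_f1(gold={"a", "b"}, predicted={"a", "c"})
print(scores)  # (0.5, 0.5, 0.5)
```

Averaging these per-question scores over a benchmark gives one common way of reporting system-level figures.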
Results
System                Dataset   Task        P      R      F
Baseline (gAnswer2)   QALD-9    QA system   0.29   0.33   0.30
gAnswer2              QaldGen   QA system   0.24   0.24   0.24
QUEPY                 QaldGen   QA system   0.27   0.27   0.27
Baseline (TagMe)      LC-QuAD   NED         0.69   0.67   0.68
TagMe                 QaldGen   NED         0.44   0.40   0.41
DBpedia Spotlight     QaldGen   NED         0.37   0.38   0.36
Baseline (RNLIWOD)    LC-QuAD   RL          0.25   0.22   0.23
RNLIWOD               QaldGen   RL          0.23   0.19   0.20
EARL                  QaldGen   RL          0.38   0.39   0.37
QaldGen Demo (http://qaldgen.aksw.org/)
Conclusions
• The definition of a “baseline” is subject to the type of questions considered
• The overall baseline on a dataset may not be a baseline for a particular type of question
• Reason: a QA system, or a component of a modular framework, may not be implemented to target some types of questions
• Microbenchmarking helps to understand the exact pitfalls of a system
• Future work
  • Integrate LC-QuAD 2.0
  • Google QA dataset
Thanks
• QaldGen is open source https://github.com/dice-group/qald-generator
• ISWC 2019 Demo available from http://qaldgen.aksw.org/

 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 

QaldGen: Towards Microbenchmarking of Question Answering Systems Over Knowledge Graphs

  • 1. Agenda • What is a microbenchmark? • Why microbenchmarks for QA? • QaldGenData dataset for microbenchmarking • QaldGen microbenchmarking framework 11/7/2019 QaldGen 2
  • 2. What is a Microbenchmark? • Specialized • Focused • Use-case specific • Useful for component-level testing • E.g., a single-relation QA benchmark
  • 3. Why Microbenchmarks for QA? • Multiple QA-related factors influence the F-scores • Fine-grained testing of the systems • Pinpoints more detailed limitations • Some systems are designed for a special purpose or use case • Avoids comparing apples with oranges
  • 4. Effect of Question Type (from QALD-6 results) [Bar chart: Avg. F-score, axis 0.3 to 0.6, by question type: When, In which, Who, What, Which, Where, Give me, How many; annotated from Easy to Difficult]
  • 5. Effect of #Triple Patterns • The number of triple patterns is inversely related to the F-score: the more triple patterns, the harder the question is to answer [Bar chart: Avg. F-score, axis 0.2 to 0.5, for #TP = 1, #TP = 2, #TP = 3]
  • 6. Comparison on SPARQL query forms • SPARQL ASK queries are much more difficult to answer than SPARQL SELECT queries [Bar chart: Avg. F-score, axis 0.3 to 0.46, for SELECT and ASK]
  • 7. Effect of #Answers • The number of answers is directly related to the F-score: the more answers, the easier the question is to answer [Bar chart: Avg. F-score, axis 0.35 to 0.51, for #A = 1, #A = 2, #A = 3]
  • 8. Effect of Answer Types • Questions whose answer is a Date are easier to answer; questions whose answer is a String are more difficult [Bar chart: Avg. F-score, axis 0.35 to 0.51, for String, Boolean, Resource, Date]
  • 9. Effect of Aggregate Functions • If the SPARQL query corresponding to a question uses aggregate functions, the question is difficult to answer [Bar chart: Avg. F-score, axis 0.1 to 0.5, for Aggregates = true]
  • 10. Why Microbenchmarking for QA? • QALD-6 winner: CANALI QA system • But for questions starting with “Give me”, UTQA is the winner • NLIWOD SPARQL Query Builder has an F-score of 0.48, reported as the baseline over LC-QuAD • But SINA, on the same dataset, has an F-score of 0.80 for SPARQL queries with two triple patterns • But SINA reports an F-score of 0.0 for SPARQL queries with four triple patterns • The definition of “baseline” depends on the type of the questions and on specific features of the questions
  • 11. What is QaldGen? • Microbenchmark selector framework • For QA over Linked Data • Personalized benchmarking • Reuses LC-QuAD and QALD • Based on graph clustering
  • 12. QaldGen Architecture [Architecture diagram, labels in order: QaldGenData, Questions Selection, Vectors Representation, Clusters Formation, Final Questions Selection, QaldGen RDF Benchmark; two parts are marked Contribution 1 and Contribution 2]
  • 13. QaldGenData • RDF dataset for QA over Linked Data • NER-related testing • Relation linking testing • QA systems testing • Constructed from LC-QuAD and QALD-9 • 408 questions from QALD-9 • 5000 questions from LC-QuAD • Each question is annotated with 51 features
  • 14. QaldGenData: Annotations [Diagram of the annotations attached to each question; labels in extraction order] • L(Q) • Answer: Entities, Relations, Classes • Question • Total: Entities, Relations, Classes • Entity1, Relation1, Class1 • L(Entity1,2,3), L(Relation1,2,3), L(Class1,2,3) • Avg. words: Entities, Relations, Classes • POS tags: IN, JJ, JJR, JJS, NN, NNS, NNP, NNPS, PRP, RB, RBR, RBS, VB, VBD, VBG, VBN, VBP, VBZ, WDT, WP, SYM • Origin: QALD9 or LC-QuAD • type: Boolean? Number? • SPARQL • Text • Total TPs • Total BGPs • SPARQL features • Answers • Result size • Proj. Vars • Text
  • 15. QaldGen: Microbenchmark Selection [Pipeline diagram, labels in order: QaldGenData, Questions Selection, Vectors Representation, Clusters Formation, Final Questions Selection, QaldGen RDF Benchmark]
  • 16. QaldGen: Microbenchmark Selection [Same pipeline diagram as the previous slide]
  • 17. Step 1: Questions Selection • From QaldGenData • Using a single SPARQL query • Select desired features as projections • Options for personalization • Use-case specific benchmarking • Using SPARQL constructs, e.g., FILTER
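The selection step above could be sketched as a small query builder. This is only an illustration: the `qg:` prefix, its IRI, and the feature names (`totalWords`, `totalEntities`, `triplePatterns`) are hypothetical placeholders, not the actual QaldGenData vocabulary, and `build_selection_query` is not a QaldGen function.

```python
from typing import List

# Placeholder namespace: NOT the real QaldGenData vocabulary.
PREFIX = "PREFIX qg: <http://example.org/qaldgen#>"


def build_selection_query(features: List[str], filters: List[str]) -> str:
    """Build one SPARQL SELECT that projects the chosen features.

    Each feature becomes a projected variable; each filter string is
    wrapped in a FILTER clause to personalize the benchmark.
    """
    projections = " ".join(f"?{name}" for name in features)
    patterns = "\n".join(
        f"  ?question qg:{name} ?{name} ." for name in features
    )
    filter_lines = "\n".join(f"  FILTER ({cond})" for cond in filters)
    body = patterns + ("\n" + filter_lines if filter_lines else "")
    return (
        f"{PREFIX}\n"
        f"SELECT DISTINCT ?question {projections}\n"
        f"WHERE {{\n{body}\n}}"
    )


# Hypothetical use case: single-triple-pattern questions with an entity.
query = build_selection_query(
    ["totalWords", "totalEntities", "triplePatterns"],
    ["?triplePatterns = 1", "?totalEntities >= 1"],
)
print(query)
```

The projected variables are what later become the dimensions of each question's feature vector, so changing the projection list changes what the clustering considers.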
  • 18. Step 2: Vectors Representation • Question: “What is the time zone of Salt Lake City?” • Golden answer: SELECT DISTINCT ?uri WHERE { <http://dbpedia.org/resource/Salt_Lake_City> <http://dbpedia.org/ontology/timeZone> ?uri } • Annotated features as projections • Question related: Total Words: 9, Total Entities: 1, Total Relations: 1, Total Classes: 0, Avg. Entities Words: 1 • Answer related: #Triple patterns: 1, #Results: 1, #BGPs: 1, #Projection Vars: 1 • Feature vector: (9, 1, 1, 0, 1, 1, 1, 1, 1) • Normalized feature vector: (0.3, 0.14, 0.2, 0, 0.36, 0.33, 0.06, 0.5, 0.33)
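The slide does not say how the feature vector is normalized. One common choice, assumed here, is to divide each feature by its maximum over all selected questions. In the sketch below, `max_normalize` is a hypothetical helper and the second vector is invented so that the toy output resembles the slide's numbers; it does not reflect QaldGenData's real maxima.

```python
from typing import List


def max_normalize(vectors: List[List[float]]) -> List[List[float]]:
    """Scale every feature (column) into [0, 1] by its column maximum.

    Assumption: max scaling. Columns whose maximum is 0 are mapped to 0.0
    to avoid division by zero.
    """
    maxima = [max(column) for column in zip(*vectors)]
    return [
        [value / m if m else 0.0 for value, m in zip(row, maxima)]
        for row in vectors
    ]


# Feature vector from the slide (total words, entities, relations, classes,
# avg. entity words, #triple patterns, #results, #BGPs, #projection vars).
slide_vector = [9, 1, 1, 0, 1, 1, 1, 1, 1]

# Hypothetical second question that supplies the per-feature maxima.
other_vector = [30, 7, 5, 2, 2.8, 3, 17, 2, 3]

normalized = max_normalize([slide_vector, other_vector])
print([round(x, 2) for x in normalized[0]])
```

Normalizing first keeps features with large ranges, such as the number of results, from dominating the Euclidean distances used during clustering.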
  • 19. Step 3: Clusters Formation • Clustering ensures sufficient diversity in the selected benchmark • Plot the vectors in a multidimensional space and form clusters • QaldGen supports • KMeans++ • FEASIBLE • Agglomerative • Random • FEASIBLE-Exemplars • DBSCAN+KMeans++ • Other clustering techniques can easily be integrated
  • 20. Step 3: Clusters Formation • Plot feature vectors in a multidimensional space • Suppose we need a benchmark of 3 questions • Feature vectors (F1, F2): Q1 (0.2, 0.2), Q2 (0.5, 0.3), Q3 (0.8, 0.3), Q4 (0.9, 0.1), Q5 (0.5, 0.5), Q6 (0.2, 0.7), Q7 (0.1, 0.8), Q8 (0.13, 0.65), Q9 (0.9, 0.5), Q10 (0.1, 0.5) [Scatter plot of Q1 to Q10]
  • 21. Step 3: Clusters Formation • Calculate the average point [Scatter plot of Q1 to Q10 with the average point marked]
  • 22. Step 3: Clusters Formation • Select the point with the minimum Euclidean distance to the average point; it is the first exemplar (shown in red) [Scatter plot of Q1 to Q10]
  • 23. Step 3: Clusters Formation • Select the point that is farthest from the exemplars [Scatter plot of Q1 to Q10]
  • 24. Step 3: Clusters Formation [Scatter plot of Q1 to Q10]
  • 25. Step 3: Clusters Formation • Select the point that is farthest from the exemplars [Scatter plot of Q1 to Q10]
  • 26. Step 3: Clusters Formation [Scatter plot of Q1 to Q10]
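The exemplar selection walked through above can be sketched on the ten (F1, F2) vectors from the slide. The slides do not define "farthest from the exemplars" precisely; this sketch assumes it means the point whose distance to its closest exemplar is largest. The helper names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

# Feature vectors from the slide's table: question -> (F1, F2).
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}


def centroid(vectors):
    """Component-wise average of a collection of vectors."""
    vectors = list(vectors)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def select_exemplars(points, k):
    """Pick k exemplars: nearest to the average first, then farthest points."""
    avg = centroid(points.values())
    # First exemplar: minimum Euclidean distance to the average point.
    exemplars = [min(points, key=lambda q: math.dist(points[q], avg))]
    while len(exemplars) < k:
        candidates = [q for q in points if q not in exemplars]
        # Assumption: "farthest" = largest distance to the closest exemplar.
        farthest = max(
            candidates,
            key=lambda q: min(
                math.dist(points[q], points[e]) for e in exemplars
            ),
        )
        exemplars.append(farthest)
    return exemplars


exemplars = select_exemplars(points, 3)
print(exemplars)
```

For this table the average point is about (0.433, 0.455), so Q5 is chosen first, followed by Q4 and then Q7.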
  • 27. Step 3: Clusters Formation • Calculate the distance from Q1 to each exemplar [Scatter plot of Q1 to Q10]
  • 28. Step 3: Clusters Formation • Assign Q1 to the exemplar at minimum distance [Scatter plot of Q1 to Q10]
  • 29. Step 3: Clusters Formation • Repeat the process for Q2 [Scatter plot of Q1 to Q10]
  • 30. Step 3: Clusters Formation • Repeat the process for Q3 [Scatter plot of Q1 to Q10]
  • 31. Step 3: Clusters Formation • Repeat the process for Q6 [Scatter plot of Q1 to Q10]
  • 32. Step 3: Clusters Formation • Repeat the process for Q8 [Scatter plot of Q1 to Q10]
  • 33. Step 3: Clusters Formation • Repeat the process for Q9 [Scatter plot of Q1 to Q10]
  • 34. Step 3: Clusters Formation • Repeat the process for Q10 [Scatter plot of Q1 to Q10]
  • 35. Step 4: Final Questions Selection • Calculate the average of each cluster [Scatter plot of Q1 to Q10 with one average point per cluster]
  • 36. Step 4: Final Questions Selection • Calculate the distance of each point in a cluster to that cluster's average [Scatter plot of Q1 to Q10 with one average point per cluster]
  • 37. Step 4: Final Questions Selection • Select the minimum-distance question as the final benchmark question from that cluster • Q2 (shown in purple) is the final selected question from the yellow cluster [Scatter plot of Q1 to Q10 with one average point per cluster]
  • 38. Step 4: Final Questions Selection • Select the minimum-distance question as the final benchmark question from that cluster • Q3 (shown in purple) is the final selected question from the green cluster [Scatter plot of Q1 to Q10 with one average point per cluster]
  • 39. Step 4: Final Questions Selection • Select the minimum-distance question as the final benchmark question from that cluster • Q8 (shown in purple) is the final selected question from the brown cluster • Our benchmark questions are Q2, Q3, and Q8 [Scatter plot of Q1 to Q10 with one average point per cluster]
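The assignment and final selection steps can be sketched on the slide's table. Two things here are assumptions. The exemplar list Q5, Q4, Q7 is not named on the slides; it is what the average-point and farthest-point steps give for this table under Euclidean distance. Q9 is equidistant (0.4) from Q5 and Q4, and the tie is broken arbitrarily toward the later exemplar. The helper names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

# Feature vectors from the slide's table: question -> (F1, F2).
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}
exemplars = ["Q5", "Q4", "Q7"]  # assumed, see the note above


def distance(a, b):
    """Euclidean distance, rounded so that exact ties compare as equal."""
    return round(math.dist(a, b), 9)


def assign_clusters(points, exemplars):
    """Assign every non-exemplar point to its minimum-distance exemplar."""
    clusters = {e: [e] for e in exemplars}
    for question, vector in points.items():
        if question in clusters:
            continue
        # reversed(): on a tie, min() keeps the later exemplar (arbitrary).
        nearest = min(
            reversed(exemplars),
            key=lambda e: distance(vector, points[e]),
        )
        clusters[nearest].append(question)
    return clusters


def pick_representatives(points, clusters):
    """From each cluster, pick the question closest to the cluster average."""
    chosen = []
    for members in clusters.values():
        avg = tuple(
            sum(axis) / len(members)
            for axis in zip(*(points[q] for q in members))
        )
        chosen.append(min(members, key=lambda q: distance(points[q], avg)))
    return chosen


clusters = assign_clusters(points, exemplars)
benchmark = pick_representatives(points, clusters)
print(clusters)
print(benchmark)
```

Under these assumptions the selected questions are Q2, Q3, and Q8, the same three the slides report. Note that the exemplars themselves are not necessarily selected: they only seed the clusters.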
  • 40. Evaluation Setup • The GERBIL and Frankenstein frameworks are used to provide common experiment settings • A personalised 200-question sample is generated with QaldGen to evaluate end-to-end QA systems • Personalised 100-question samples are generated with QaldGen to evaluate NED and relation linking tools • Precision, recall, and F-score are measured to report final performance
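For reference, precision, recall, and F-score for a single question can be computed from the gold and predicted answer sets as below. This is a generic sketch of the standard definitions, not GERBIL's or Frankenstein's implementation (those frameworks also aggregate over questions, which is not shown), and the answer identifiers are made up.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    gold: Iterable[str], predicted: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one question's answers."""
    gold_set, predicted_set = set(gold), set(predicted)
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical answers: 2 of 3 predictions are correct, 2 of 4 gold found.
p, r, f = precision_recall_f1(
    gold=["res:A", "res:B", "res:C", "res:D"],
    predicted=["res:A", "res:B", "res:E"],
)
print(round(p, 2), round(r, 2), round(f, 2))
```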
  • 41. Results
      Systems | Dataset | Task | P | R | F
      Baseline (gAnswer2) | QALD-9 | QA system | 0.29 | 0.33 | 0.30
      gAnswer2 | QaldGen | QA system | 0.24 | 0.24 | 0.24
      QUEPY | QaldGen | QA system | 0.27 | 0.27 | 0.27
      Baseline (TagMe) | LC-QuAD | NED | 0.69 | 0.67 | 0.68
      TagMe | QaldGen | NED | 0.44 | 0.40 | 0.41
      DBpedia Spotlight | QaldGen | NED | 0.37 | 0.38 | 0.36
      Baseline (RNLIWOD) | LC-QuAD | RL | 0.25 | 0.22 | 0.23
      RNLIWOD | QaldGen | RL | 0.23 | 0.19 | 0.20
      EARL | QaldGen | RL | 0.38 | 0.39 | 0.37
  • 43. Conclusions • The definition of a “baseline” depends on the type of questions considered • The overall baseline on a dataset may not be a baseline for a particular type of question • Reason: a QA system, or a component of a modular framework, may not be implemented to target some types of questions • Microbenchmarking helps to understand the exact pitfalls of a system • Future work • Integrate LC-QuAD 2.0 • Google QA dataset
  • 44. Thanks • QaldGen is open source: https://github.com/dice-group/qald-generator • ISWC 2019 demo available from http://qaldgen.aksw.org/