Page 1 / 20
Survey on Challenges of
Question Answering
in the Semantic Web
Semantic Web journal 2016
Höffner et al.
Leipzig University, Institute of Computer Science, AKSW Group
홍동균 (Saltlux Inc.)
2018. 11. 16
Page 3 / 20
Introduction
• Semantic question answering (SQA)
– Asking questions in natural language and receiving answers from an RDF
knowledge base.
• SQA systems
– Since natural language is complex and ambiguous, reliable SQA systems
require many different components.
– Instead of a shared effort, however, many essential components are
redeveloped, which is an inefficient use of researchers’ time and resources.
Page 4 / 20
Introduction
• Contributions
– Surveyed existing work with 72 publications about 62 systems developed
from 2010 to 2015.
– Identified challenges faced by those approaches and collected solutions for
them from the 72 publications.
– Made recommendations on how to develop future SQA systems.
Page 5 / 20
Methodology
• Inclusion criteria
– Candidate 1: First 300 Google Scholar search results for the query
“question answering” AND (“Semantic Web” OR “data web”)
– Candidate 2: All publications in the proceedings of the target venues
Target venues: ISWC, ESWC, WWW, NLDB, QALD challenge
• Exclusion Criteria
– Published before November 2010 or after July 2015
– Not related to SQA
• Result
– 72 publications describing 62 distinct SQA systems.
(39 of them from candidate 1, 33 of them from candidate 2)
Page 6 / 20
7 Challenges
• Lexical Gap
• Ambiguity
• Multilingualism
• Complex Queries
• Distributed Knowledge
• Procedural, Temporal and Spatial Questions
• Templates
[Figure: number of publications per year addressing each challenge]
Page 7 / 20
Lexical Gap
• The vocabulary used in a question is different from the one used in
the labels of the knowledge base. (linking problem)
– Different forms of the same word
Inflection (run <-> running, ran); misspellings (running <-> runnign, runing)
– Different words with a similar meaning
Synonyms (run <-> sprint)
Hypernym-hyponym pairs (chemical process <-> photosynthesis)
– Different phrasings of the same RDF property
“What is the population of A?”, “How many people are there in A?” -> ‘population’
Page 8 / 20
Lexical Gap - Different forms of the same word
• String normalization
– Conversion to lower case or to the base form
Stemming or lemmatizing (running, ran -> run)
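The normalization step above can be sketched as follows; the suffix list and the irregular-form table are toy stand-ins for a real stemmer or lemmatizer such as the Porter stemmer:

```python
# Minimal sketch of string normalization for bridging the lexical gap.
# The suffix list and irregular-form table are illustrative only.
def normalize(word: str) -> str:
    w = word.lower()  # case normalization
    # Irregular inflections need a lookup table (one illustrative entry here).
    irregular = {"ran": "run"}
    if w in irregular:
        return irregular[w]
    # Naive suffix stripping; a real system would use a proper stemmer.
    for suffix in ("ning", "ing", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

print(normalize("Running"))  # → run
print(normalize("ran"))      # → run
```

With this, the surface forms "Running", "ran", and "runs" all map to the same base form and can be matched against a knowledge-base label.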
• Similarity functions
– Quantify similarity with a function and accept matches within a threshold
Jaro-Winkler distance
Edit distance (Levenshtein)
Longest common substring
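A threshold-based similarity check of this kind might look as follows; the Levenshtein edit distance is implemented directly, and the threshold of 2 is an arbitrary illustration:

```python
# Sketch of a similarity function with a threshold (Levenshtein edit distance).
def edit_distance(a: str, b: str) -> int:
    # Standard dynamic-programming formulation, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[len(b)]

def matches(question_word: str, label: str, max_dist: int = 2) -> bool:
    # Accept a knowledge-base label when it lies within the distance threshold.
    return edit_distance(question_word.lower(), label.lower()) <= max_dist

print(matches("runing", "running"))  # → True (one insertion)
```

This catches misspellings such as "runing" vs. "running" that exact string matching would miss.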
Page 9 / 20
Lexical Gap - Different words with a similar meaning
• Automatic Query Expansion
– Using additional labels from lexical databases such as WordNet
– Increases recall, but can match merely related words and thus decrease
precision
[Figure: example WordNet entry with synonyms]
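A minimal sketch of automatic query expansion, with a tiny hand-made synonym table standing in for a lexical database such as WordNet (in practice one would look up WordNet synsets, e.g. via NLTK):

```python
# Toy synonym table standing in for WordNet; entries are illustrative.
SYNONYMS = {
    "run": {"sprint", "jog"},
    "big": {"large", "huge"},
}

def expand(terms: list[str]) -> set[str]:
    # Add all known synonyms of each query term to the query.
    expanded = set(terms)
    for t in terms:
        expanded |= SYNONYMS.get(t, set())
    return expanded

print(sorted(expand(["run", "city"])))  # → ['city', 'jog', 'run', 'sprint']
```

The expanded term set raises recall, but, as noted above, each added term is a new chance for a spurious match, which can lower precision.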
Page 10 / 20
Lexical Gap - Different phrasings of the same RDF property
• Pattern libraries
– BOA [Gerber et al.] generates natural-language patterns for RDF predicates
from a text corpus and a knowledge base
E.g. (:writing, “X wrote Y”), (:writer, “X is written by Y”), (:population, “How many
people are there in X?”)
– PARALEX [Fader et al.] learns a lexicon mapping question phrasings to formal
query parts from paraphrase pairs in the WikiAnswers QA dataset
[Figure: PARALEX examples of paraphrases and learned lexical entries, e.g. the
question “How big is nyc?” mapped to the formal query population(?, new-york)]
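The pattern-library idea can be sketched as a list of (predicate, pattern) pairs matched against the question; the patterns and predicate names below are illustrative inventions, not entries taken from BOA or PARALEX:

```python
import re

# Toy pattern library: each RDF predicate is paired with natural-language
# patterns, and a question is mapped to the first matching predicate.
PATTERNS = [
    (":population", re.compile(r"how many people (are there|live) in (?P<x>.+)\?", re.I)),
    (":population", re.compile(r"what is the population of (?P<x>.+)\?", re.I)),
    (":writer",     re.compile(r"who wrote (?P<x>.+)\?", re.I)),
]

def match_predicate(question: str):
    # Return the predicate and the captured entity phrase, or None.
    for predicate, pattern in PATTERNS:
        m = pattern.search(question)
        if m:
            return predicate, m.group("x")
    return None

print(match_predicate("How many people are there in Berlin?"))
# → (':population', 'Berlin')
```

Both phrasings of the population question map to the same predicate, which is exactly how such libraries close this part of the lexical gap.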
Page 11 / 20
Ambiguity
• The phenomenon of the same phrase having different meanings.
– Homonymy: same string refers to different concepts
(money) bank vs. (river) bank
– Polysemy: same string refers to different but related concepts
bank (as a company) vs. bank (as a building).
E.g., “이동국” (Lee Dong-gook) in the Adam KB: the same name refers to several different people
Page 12 / 20
Ambiguity - Disambiguation
• Resource-based methods
– Rank the candidate RDF resources based on their properties and the
connections between them
– gAnswer [Huang et al.] parses the question into a semantic query graph and
answers it by subgraph matching against the RDF graph
Q: Who was married to an actor that played in Philadelphia?
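A toy sketch of resource-based ranking: here candidates are scored simply by how many triples they participate in, which is a strong simplification of gAnswer's subgraph matching; all resource names and triples are invented for illustration:

```python
# Toy triple store; all resources and triples are invented for illustration.
TRIPLES = {
    (":Melanie_Griffith", ":spouse", ":Antonio_Banderas"),
    (":Antonio_Banderas", ":starredIn", ":Philadelphia_film"),
    (":Philadelphia_film", ":director", ":Jonathan_Demme"),
    (":Philadelphia_city", ":locatedIn", ":Pennsylvania"),
}

def connectivity(resource: str) -> int:
    # Score a candidate by the number of triples it participates in.
    return sum(resource in (s, o) for s, _, o in TRIPLES)

def rank(candidates: list[str]) -> list[str]:
    # Prefer the candidate that is better connected in the knowledge graph.
    return sorted(candidates, key=connectivity, reverse=True)

# "Philadelphia" is ambiguous between the film and the city; in this toy
# graph the film participates in more triples, so it ranks first.
print(rank([":Philadelphia_city", ":Philadelphia_film"]))
# → [':Philadelphia_film', ':Philadelphia_city']
```

A real system would score connections to the other candidate resources of the same question rather than raw graph degree, but the ranking principle is the same.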
Page 13 / 20
Complex Queries
• Complex Queries
– Require multiple facts, certain restrictions, aggregation, or filtered results
E.g., comparisons, yes/no questions, quantifiers, superlatives
– PYTHIA [Unger et al.] constructs formal queries even for complex questions,
using an ontology-based grammar
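PYTHIA's grammar-based construction is beyond a short sketch, but the target of such a system can be illustrated: a superlative question compiles to a SPARQL query with ORDER BY ... LIMIT 1 (the class and property names below are illustrative, not tied to a specific knowledge base):

```python
# Sketch of the formal query a superlative question compiles to.
# E.g., "Which city has the largest population?" → order by the value,
# descending, and keep only the top result.
def superlative_query(cls: str, prop: str) -> str:
    return (
        "SELECT ?x WHERE {\n"
        f"  ?x a {cls} ;\n"
        f"     {prop} ?v .\n"
        "} ORDER BY DESC(?v) LIMIT 1"
    )

print(superlative_query(":City", ":population"))
```

Comparisons, quantifiers, and yes/no questions analogously require FILTER, aggregation, or ASK forms, which is why complex questions need more than simple triple-pattern matching.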
Page 14 / 20
Templates
• (1) Template-based approach
– Map input questions to either manually or automatically created SPARQL
query templates
• (2) Template-free approach
– Build SPARQL queries directly from the syntactic structure of the input
question.
Examples: TBSL [Unger et al.] (template-based), Xser [Xu et al.] (template-free)
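A minimal sketch of the template-based approach: a question pattern is paired with a SPARQL template whose slot is filled from the match. The template and predicate below are illustrative, not TBSL's actual templates:

```python
import re

# Toy template library: question pattern → SPARQL template with a slot.
TEMPLATES = [
    (re.compile(r"what is the population of (?P<ent>.+)\?", re.I),
     "SELECT ?p WHERE {{ :{ent} :population ?p . }}"),
]

def to_sparql(question: str):
    # Fill the first matching template's slot with the captured entity.
    for pattern, template in TEMPLATES:
        m = pattern.search(question)
        if m:
            return template.format(ent=m.group("ent").replace(" ", "_"))
    return None

print(to_sparql("What is the population of New York?"))
# → SELECT ?p WHERE { :New_York :population ?p . }
```

A template-free system would instead derive the query structure from a syntactic parse of the question, avoiding the need to anticipate every question shape in advance.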
Page 15 / 20
Others
• Multilingualism
– SQA systems that can handle multiple input languages, which may even
differ from the language used to encode the knowledge.
• Distributed Knowledge
– Some questions are answerable only by combining multiple knowledge bases
• Procedural Questions
– E.g. How question (step-by-step instructions)
• Temporal Questions
– E.g. temporal questions on clinical narratives
• Spatial Questions
– E.g. Relationship of locations such as crossing, inclusion and nearness.
Page 16 / 20
7 Challenges in Adam QA
• Lexical Gap
– String normalization, similarity function, synonyms -> available
– Patterns for RDF predicates -> unavailable
Current: string matching
• Ambiguity
– Ranking the candidate RDF resources -> Available (but naïve approach)
Current: resources are ranked by the number of triples
Page 17 / 20
7 Challenges in Adam QA
• Complex Queries
– Comparisons, yes/no, superlatives, quantifiers -> partially available
• Templates
– Template-based approach -> available
– Template-free approach -> soon (GBQA?)
Page 18 / 20
7 Challenges in Adam QA
• Multilingualism
– Unavailable
• Distributed Knowledge
– Unavailable
• Procedural, Temporal and Spatial Questions
– Partially available
Page 19 / 20
Conclusion
• Analyzed 62 systems and their contributions to seven challenges for
SQA systems.
• Recommendations for future SQA systems
– Modularization and reuse of existing components
– Benchmarking single algorithmic modules instead of benchmarking a
system as a whole.