Page 1 / 20
Survey on Challenges of
Question Answering
in the Semantic Web
Semantic Web journal 2016
Höffner et al.
Leipzig University, Institute of Computer Science, AKSW Group
홍동균 (Saltlux Inc.)
2018. 11. 16
Page 3 / 20
Introduction
• Semantic question answering (SQA)
– Asking questions in natural language and receiving answers from an RDF
knowledge base.
• SQA systems
– Since natural language is complex and ambiguous, reliable SQA systems
require many different components.
– Instead of a shared effort, however, many essential components are
redeveloped, which is an inefficient use of researchers’ time and resources.
Page 4 / 20
Introduction
• Contributions
– Surveyed existing work with 72 publications about 62 systems developed
from 2010 to 2015.
– Identified challenges faced by those approaches and collected solutions for
them from the 72 publications.
– Made recommendations on how to develop future SQA systems.
Page 5 / 20
Methodology
• Inclusion criteria
– Candidate 1: First 300 Google Scholar search results for the query
“question answering” AND (“Semantic Web” OR “data web”)
– Candidate 2: All publications in the proceedings of the target venues
Target venues: ISWC, ESWC, WWW, NLDB, QALD challenge
• Exclusion Criteria
– Published before November 2010 or after July 2015
– Not related to SQA
• Result
– 72 publications describing 62 distinct SQA systems.
(39 of them from candidate 1, 33 of them from candidate 2)
Page 6 / 20
7 Challenges
• Lexical Gap
• Ambiguity
• Multilingualism
• Complex Queries
• Distributed Knowledge
• Procedural, Temporal and Spatial Questions
• Templates
[Figure: number of publications per year addressing each challenge]
Page 7 / 20
Lexical Gap
• The vocabulary used in a question is different from the one used in
the labels of the knowledge base. (linking problem)
– Different forms of the same word
Inflection (run <-> running, ran); misspellings (running <-> runnign, runing)
– Different words with a similar meaning
Synonyms (run <-> sprint)
Hypernym-hyponym pairs (chemical process <-> photosynthesis)
– Different phrasings of the same RDF property
“What is the population of A?”, “How many people are there in A?” -> ‘population’
Page 8 / 20
Lexical Gap - Different forms of the same word
• String normalization
– Conversion to lower case or to the base form
Stemming or lemmatizing (running, ran -> run)
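The normalization step above can be sketched as follows; the suffix list and the irregular-form table are toy stand-ins for a real stemmer or lemmatizer such as the Porter stemmer:

```python
# Minimal sketch of string normalization for bridging the lexical gap.
# The suffix list and irregular-form table are illustrative only.
def normalize(word: str) -> str:
    w = word.lower()  # case normalization
    # Irregular inflections need a lookup table (one illustrative entry here).
    irregular = {"ran": "run"}
    if w in irregular:
        return irregular[w]
    # Naive suffix stripping; a real system would use a proper stemmer.
    for suffix in ("ning", "ing", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

print(normalize("Running"))  # → run
print(normalize("ran"))      # → run
```

With this, the surface forms "Running", "ran", and "runs" all map to the same base form and can be matched against a knowledge-base label.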
• Similarity functions
– Quantify similarity with a function and accept matches within a threshold
Jaro-Winkler distance
Edit distance (Levenshtein)
Longest common substring
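A threshold-based similarity check of this kind might look as follows; the Levenshtein edit distance is implemented directly, and the threshold of 2 is an arbitrary illustration:

```python
# Sketch of a similarity function with a threshold (Levenshtein edit distance).
def edit_distance(a: str, b: str) -> int:
    # Standard dynamic-programming formulation, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[len(b)]

def matches(question_word: str, label: str, max_dist: int = 2) -> bool:
    # Accept a knowledge-base label when it lies within the distance threshold.
    return edit_distance(question_word.lower(), label.lower()) <= max_dist

print(matches("runing", "running"))  # → True (one insertion)
```

This catches misspellings such as "runing" vs. "running" that exact string matching would miss.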
Page 9 / 20
Lexical Gap - Different words with a similar meaning
• Automatic Query Expansion
– Using additional labels from lexical databases such as WordNet
– Increases recall, but can match merely related words and thus decrease
precision
[Figure: example WordNet entry with synonyms]
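A minimal sketch of automatic query expansion, with a tiny hand-made synonym table standing in for a lexical database such as WordNet (in practice one would look up WordNet synsets, e.g. via NLTK):

```python
# Toy synonym table standing in for WordNet; entries are illustrative.
SYNONYMS = {
    "run": {"sprint", "jog"},
    "big": {"large", "huge"},
}

def expand(terms: list[str]) -> set[str]:
    # Add all known synonyms of each query term to the query.
    expanded = set(terms)
    for t in terms:
        expanded |= SYNONYMS.get(t, set())
    return expanded

print(sorted(expand(["run", "city"])))  # → ['city', 'jog', 'run', 'sprint']
```

The expanded term set raises recall, but, as noted above, each added term is a new chance for a spurious match, which can lower precision.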
Page 10 / 20
Lexical Gap - Different phrasings of the same RDF property
• Pattern libraries
– BOA [Gerber et al.] generates natural-language patterns for RDF predicates
from a text corpus and a knowledge base
E.g. (:writing, “X wrote Y”), (:writer, “X is written by Y”), (:population, “How many
people are there in X?”)
– PARALEX [Fader et al.] learns a lexicon mapping question phrasings to formal
query parts from paraphrase pairs in the WikiAnswers QA dataset
[Figure: PARALEX examples of paraphrases and learned lexical entries, e.g. the
question “How big is nyc?” mapped to the formal query population(?, new-york)]
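The pattern-library idea can be sketched as a list of (predicate, pattern) pairs matched against the question; the patterns and predicate names below are illustrative inventions, not entries taken from BOA or PARALEX:

```python
import re

# Toy pattern library: each RDF predicate is paired with natural-language
# patterns, and a question is mapped to the first matching predicate.
PATTERNS = [
    (":population", re.compile(r"how many people (are there|live) in (?P<x>.+)\?", re.I)),
    (":population", re.compile(r"what is the population of (?P<x>.+)\?", re.I)),
    (":writer",     re.compile(r"who wrote (?P<x>.+)\?", re.I)),
]

def match_predicate(question: str):
    # Return the predicate and the captured entity phrase, or None.
    for predicate, pattern in PATTERNS:
        m = pattern.search(question)
        if m:
            return predicate, m.group("x")
    return None

print(match_predicate("How many people are there in Berlin?"))
# → (':population', 'Berlin')
```

Both phrasings of the population question map to the same predicate, which is exactly how such libraries close this part of the lexical gap.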
Page 11 / 20
Ambiguity
• The phenomenon of the same phrase having different meanings.
– Homonymy: same string refers to different concepts
(money) bank vs. (river) bank
– Polysemy: same string refers to different but related concepts
bank (as a company) vs. bank (as a building).
E.g., “이동국” (Lee Dong-gook) in the Adam KB: the same name refers to several different people
Page 12 / 20
Ambiguity - Disambiguation
• Resource-based methods
– Rank the candidate RDF resources based on their properties and the
connections between them
– gAnswer [Huang et al.] parses the question into a semantic query graph and
answers it by subgraph matching against the RDF graph
Q: Who was married to an actor that played in Philadelphia?
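A toy sketch of resource-based ranking: here candidates are scored simply by how many triples they participate in, which is a strong simplification of gAnswer's subgraph matching; all resource names and triples are invented for illustration:

```python
# Toy triple store; all resources and triples are invented for illustration.
TRIPLES = {
    (":Melanie_Griffith", ":spouse", ":Antonio_Banderas"),
    (":Antonio_Banderas", ":starredIn", ":Philadelphia_film"),
    (":Philadelphia_film", ":director", ":Jonathan_Demme"),
    (":Philadelphia_city", ":locatedIn", ":Pennsylvania"),
}

def connectivity(resource: str) -> int:
    # Score a candidate by the number of triples it participates in.
    return sum(resource in (s, o) for s, _, o in TRIPLES)

def rank(candidates: list[str]) -> list[str]:
    # Prefer the candidate that is better connected in the knowledge graph.
    return sorted(candidates, key=connectivity, reverse=True)

# "Philadelphia" is ambiguous between the film and the city; in this toy
# graph the film participates in more triples, so it ranks first.
print(rank([":Philadelphia_city", ":Philadelphia_film"]))
# → [':Philadelphia_film', ':Philadelphia_city']
```

A real system would score connections to the other candidate resources of the same question rather than raw graph degree, but the ranking principle is the same.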
Page 13 / 20
Complex Queries
• Complex Queries
– Require multiple facts, certain restrictions, aggregation, or filtered results
E.g., comparisons, yes/no questions, quantifiers, superlatives
– PYTHIA [Unger et al.] constructs formal queries even for complex questions,
using an ontology-based grammar
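PYTHIA's grammar-based construction is beyond a short sketch, but the target of such a system can be illustrated: a superlative question compiles to a SPARQL query with ORDER BY ... LIMIT 1 (the class and property names below are illustrative, not tied to a specific knowledge base):

```python
# Sketch of the formal query a superlative question compiles to.
# E.g., "Which city has the largest population?" → order by the value,
# descending, and keep only the top result.
def superlative_query(cls: str, prop: str) -> str:
    return (
        "SELECT ?x WHERE {\n"
        f"  ?x a {cls} ;\n"
        f"     {prop} ?v .\n"
        "} ORDER BY DESC(?v) LIMIT 1"
    )

print(superlative_query(":City", ":population"))
```

Comparisons, quantifiers, and yes/no questions analogously require FILTER, aggregation, or ASK forms, which is why complex questions need more than simple triple-pattern matching.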
Page 14 / 20
Templates
• (1) Template-based approach
– Map input questions to either manually or automatically created SPARQL
query templates
• (2) Template-free approach
– Build SPARQL queries directly from the syntactic structure of the input
question.
Examples: TBSL [Unger et al.] (template-based), Xser [Xu et al.] (template-free)
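A minimal sketch of the template-based approach: a question pattern is paired with a SPARQL template whose slot is filled from the match. The template and predicate below are illustrative, not TBSL's actual templates:

```python
import re

# Toy template library: question pattern → SPARQL template with a slot.
TEMPLATES = [
    (re.compile(r"what is the population of (?P<ent>.+)\?", re.I),
     "SELECT ?p WHERE {{ :{ent} :population ?p . }}"),
]

def to_sparql(question: str):
    # Fill the first matching template's slot with the captured entity.
    for pattern, template in TEMPLATES:
        m = pattern.search(question)
        if m:
            return template.format(ent=m.group("ent").replace(" ", "_"))
    return None

print(to_sparql("What is the population of New York?"))
# → SELECT ?p WHERE { :New_York :population ?p . }
```

A template-free system would instead derive the query structure from a syntactic parse of the question, avoiding the need to anticipate every question shape in advance.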
Page 15 / 20
Others
• Multilingualism
– SQA systems that can handle multiple input languages, which may even
differ from the language used to encode the knowledge.
• Distributed Knowledge
– Some questions are answerable only by combining multiple knowledge bases
• Procedural Questions
– E.g. How question (step-by-step instructions)
• Temporal Questions
– E.g. temporal questions on clinical narratives
• Spatial Questions
– E.g. Relationship of locations such as crossing, inclusion and nearness.
Page 16 / 20
7 Challenges in Adam QA
• Lexical Gap
– String normalization, similarity function, synonyms -> available
– Patterns for RDF predicates -> unavailable
Current: string matching
• Ambiguity
– Ranking the candidate RDF resources -> Available (but naïve approach)
Current: resources are ranked by the number of triples
Page 17 / 20
7 Challenges in Adam QA
• Complex Queries
– Comparisons, yes/no, superlatives, quantifiers -> partially available
• Templates
– Template-based approach -> available
– Template-free approach -> soon (GBQA?)
Page 18 / 20
7 Challenges in Adam QA
• Multilingualism
– Unavailable
• Distributed Knowledge
– Unavailable
• Procedural, Temporal and Spatial Questions
– Partially available
Page 19 / 20
Conclusion
• Analyzed 62 systems and their contributions to seven challenges for
SQA systems.
• Recommendations for future SQA systems
– Modularization and reuse of existing components
– Benchmarking single algorithmic modules instead of benchmarking a
system as a whole.