2. Question / Answer Type Classification
• A popular task in the field of question answering
• Question classification based on Wh-terms
• Who, What, When, Where, Which, Whom, Whose, Why, How many
• Answer type classification
• Predict the type of the expected answer
• Existing answer type classifications (e.g., TREC QA) use coarse-grained types
• 6 types: PERSON, LOCATION, NUMERIC, ENTITY, DESCRIPTION, ABBREVIATION
• 50 fine-grained subtypes, e.g., ENTITY -> animal, plant, product, sport, religion, event, food, currency
• More fine-grained classifications are possible with Semantic Web ontologies.
• DBpedia (~760 classes), Wikidata (~50K classes)
Li, Xin, and Dan Roth. "Learning question classifiers: the role of semantic information."
Natural Language Engineering 12.3 (2006): 229-249.
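As a rough illustration of Wh-term based question classification, the Python sketch below maps a question's leading Wh-term to one of the coarse types listed above. The mapping table and the function name `coarse_type` are hypothetical. "What" and "Which" questions return None because their answer type depends on the rest of the question. This is why learned classifiers such as Li and Roth's are used in practice.

```python
from typing import Optional

# Hypothetical baseline: map a question's leading Wh-term to a coarse answer type.
# "What" and "Which" are deliberately absent: they are ambiguous on their own.
WH_TO_COARSE = {
    "how many": "NUMERIC",
    "who": "PERSON",
    "whom": "PERSON",
    "whose": "PERSON",
    "where": "LOCATION",
    "when": "NUMERIC",  # dates fall under NUMERIC in the TREC-style taxonomy
    "why": "DESCRIPTION",
}


def coarse_type(question: str) -> Optional[str]:
    """Return a coarse answer type guessed from the leading Wh-term, or None."""
    q = question.lower().strip()
    # Try longer terms first so multi-word terms like "how many" match as a unit.
    for term in sorted(WH_TO_COARSE, key=len, reverse=True):
        if q == term or q.startswith(term + " "):
            return WH_TO_COARSE[term]
    return None


print(coarse_type("Who wrote the song Hotel California?"))
print(coarse_type("Which films did Stanley Kubrick direct?"))
```

The rule also shows the limits of coarse types. `coarse_type("Where did he study?")` returns "LOCATION" even when the expected answer is a university, a distinction that only a finer-grained ontology type can express.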
3. Knowledge Base Question Answering (KBQA)
• Given a natural language question, generate the SPARQL query to find the answer.
• Popular datasets for KBQA in the Semantic Web community
• Question Answering over Linked Data (QALD)
• http://qald.aksw.org/
• Large-Scale Complex Question Answering Dataset (LC-QuAD)
• http://lc-quad.sda.tech/
• Most KBQA systems use some kind of question / answer type prediction component.
• However, there is no standard dataset for evaluating this component on its own.
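Question: Which films did Stanley Kubrick direct?
SPARQL:
  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX dbr: <http://dbpedia.org/resource/>
  SELECT ?film WHERE {
    ?film dbo:director dbr:Stanley_Kubrick .
  }
Answers: 2001: A Space Odyssey, Spartacus, Fear and Desire, Paths of Glory, Lolita, …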
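KBQA systems typically rely on a question / answer type prediction step. One way a system might use a predicted type is to add an rdf:type constraint (written `a` in SPARQL) to its candidate query, so that only answers of that class are returned. The sketch below assumes the predictor returned dbo:Film for the Kubrick question. The helper `add_type_constraint` is hypothetical and only handles a query whose WHERE block ends at the last closing brace.

```python
# Hypothetical helper: inject a predicted answer type into a SPARQL query as an
# rdf:type constraint ("a") on the answer variable. Assumes the WHERE clause
# is closed by the last "}" in the query string.
def add_type_constraint(query: str, var: str, type_iri: str) -> str:
    close = query.rfind("}")
    if close == -1:
        raise ValueError("query has no closing brace")
    triple = f"  {var} a {type_iri} .\n"
    return query[:close] + triple + query[close:]


query = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:director dbr:Stanley_Kubrick .
}"""

# Assumed prediction for this question: dbo:Film.
constrained = add_type_constraint(query, "?film", "dbo:Film")
print(constrained)
```

A string splice like this is only a sketch. A real system would more likely build the query from a parsed representation instead of editing text.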
4. SMART Dataset
• A dataset for the answer type prediction task, using the DBpedia and Wikidata ontologies.
• Derived from existing KBQA datasets.
• Three main categories of questions: boolean, literal, and resource.
Boolean Questions
Question: Is Azerbaijan a member of the European Go Federation?
Category: boolean
Question: Is Darth Vader Luke’s father?
Category: boolean
Literal Questions
Question: How many people live in Poland?
Category: literal
Type: number
Question: When did Shakespeare die?
Category: literal
Type: date
Question: What is the birth name of Angela Merkel?
Category: literal
Type: string
Resource Questions
Question: Who is the heaviest player of the Chicago Bulls?
Category: resource
Type: dbo:BasketballPlayer, dbo:Athlete, dbo:Person
Question: Give me video games published by EA?
Category: resource
Type: dbo:VideoGame, dbo:Software, dbo:Work
Question: Who wrote the song Hotel California?
Category: resource
Type: dbo:MusicalArtist, dbo:Artist, dbo:Person
Question: Where did John McCarthy get his PhD from?
Category: resource
Type: dbo:University, dbo:EducationalInstitution, dbo:Organization
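The examples above suggest a simple consistency rule between category and type. Boolean questions need no ontology type, literal questions take exactly one of number, date, or string, and resource questions take a list of ontology classes ordered from most specific to most general. The sketch below checks that rule. It assumes each question is stored as a JSON object with `id`, `question`, `category`, and `type` keys, with boolean questions typed as ["boolean"]. Treat these field names as an assumption, not the guaranteed dataset schema. The `check_entry` helper and the ids are hypothetical.

```python
import json

# Literal answer types seen in the examples above.
LITERAL_TYPES = {"number", "date", "string"}


def check_entry(entry: dict) -> list[str]:
    """Return the problems found in one (assumed-format) entry; empty means valid."""
    problems = []
    category = entry.get("category")
    types = entry.get("type", [])
    if category == "boolean":
        if types not in ([], ["boolean"]):
            problems.append("boolean questions should carry no ontology types")
    elif category == "literal":
        if len(types) != 1 or types[0] not in LITERAL_TYPES:
            problems.append("literal questions need exactly one of number/date/string")
    elif category == "resource":
        if not types:
            problems.append("resource questions need at least one ontology class")
    else:
        problems.append(f"unknown category: {category!r}")
    return problems


# Hypothetical entries in the assumed layout.
raw = '''[
 {"id": "q1", "question": "Is Darth Vader Luke's father?", "category": "boolean", "type": ["boolean"]},
 {"id": "q2", "question": "When did Shakespeare die?", "category": "literal", "type": ["date"]},
 {"id": "q3", "question": "Who wrote the song Hotel California?", "category": "resource",
  "type": ["dbo:MusicalArtist", "dbo:Artist", "dbo:Person"]}
]'''

for entry in json.loads(raw):
    print(entry["id"], check_entry(entry) or "ok")
```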
5. SMART Dataset - II
Dataset split          Questions
Training set              17,571
  Resource answers         9,584
  Literal answers          5,188
  Boolean                  2,799
Test set                   4,393
Total                     21,964

Dataset split          Questions
Training set              19,670
  Resource answers        11,683
  Literal answers          5,188
  Boolean                  2,799
Test set                   4,571
Total                     24,241
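The counts in each table are internally consistent. Resource, literal, and boolean answers sum to the training-set size, and training plus test gives the total. A quick check in Python, with the numbers copied from the two tables above:

```python
# (resource, literal, boolean, test, total), copied from the tables above.
tables = {
    "first table": (9_584, 5_188, 2_799, 4_393, 21_964),
    "second table": (11_683, 5_188, 2_799, 4_571, 24_241),
}

for name, (resource, literal, boolean, test, total) in tables.items():
    train = resource + literal + boolean
    assert train + test == total  # training + test should equal the stated total
    print(f"{name}: train={train:,}, train+test={train + test:,}, total={total:,}")
```

This prints a training size of 17,571 for the first table and 19,670 for the second, matching the stated training-set rows.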
6. Evaluation
• Systems can participate on either or both datasets; each dataset has a separate leaderboard.
• Systems can be rule-based, unsupervised, supervised …
• For each test question, systems should provide:
• the category and a list of types
• Evaluation metric
• Lenient NDCG@5 and NDCG@10 with linear decay (as defined by Balog and Neumayer)
Balog, Krisztian, and Robert Neumayer. "Hierarchical target type identification for entity-oriented queries."
Proceedings of ACM CIKM '12 (2012).
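$$\mathrm{DCG}(\text{type\_list}) = \sum_{i=1}^{k} \frac{\mathrm{relevance}(\text{type\_pred}_i)}{\log_2(i+1)}$$
$$\mathrm{relevance}(\text{type\_pred}) = 1 - \frac{\mathrm{distance}(\text{type\_pred}, \text{type\_gold})}{\text{MaxDepth}}$$
where distance is the path length between the predicted and gold types in the type hierarchy, and MaxDepth is the depth of that hierarchy.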
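To make the metric concrete, here is a minimal Python sketch of lenient NDCG@k with linear decay, where NDCG divides DCG by the DCG of an ideal ranking. Several details are assumptions, not taken from the task definition. A toy four-level fragment of the DBpedia hierarchy stands in for the real ontology. A predicted type that is neither an ancestor nor a descendant of a gold type gets relevance 0, roughly following Balog and Neumayer's lenient setting. Each predicted type is scored against its best-matching gold type, and the gold list itself serves as the ideal ranking. The official evaluation script may differ on these points.

```python
import math
from typing import Optional

# Toy fragment of the DBpedia class hierarchy (child -> parent), assumed for illustration.
PARENT = {
    "dbo:BasketballPlayer": "dbo:Athlete",
    "dbo:Athlete": "dbo:Person",
    "dbo:Person": "dbo:Agent",
    "dbo:Agent": "owl:Thing",
}
MAX_DEPTH = 4  # depth of the toy hierarchy (owl:Thing is at depth 0)


def ancestors(t: str) -> list[str]:
    """Return t followed by its ancestors up to the root."""
    chain = [t]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def distance(a: str, b: str) -> Optional[int]:
    """Number of edges between a and b if one is an ancestor of the other, else None."""
    up_a, up_b = ancestors(a), ancestors(b)
    if b in up_a:
        return up_a.index(b)
    if a in up_b:
        return up_b.index(a)
    return None


def relevance(pred: str, gold: list[str]) -> float:
    """Linear decay: 1 - distance / MAX_DEPTH, best over gold types (an assumption)."""
    best = 0.0
    for g in gold:
        d = distance(pred, g)
        if d is not None:
            best = max(best, 1 - d / MAX_DEPTH)
    return best


def dcg(scores: list[float], k: int) -> float:
    # Rank positions start at 1, so the discount is log2(i + 1).
    return sum(s / math.log2(i + 1) for i, s in enumerate(scores[:k], start=1))


def ndcg(pred: list[str], gold: list[str], k: int) -> float:
    ideal = dcg([1.0] * len(gold), k)  # assumption: the gold list is the ideal ranking
    return dcg([relevance(p, gold) for p in pred], k) / ideal if ideal else 0.0


gold = ["dbo:BasketballPlayer", "dbo:Athlete", "dbo:Person"]
print(ndcg(gold, gold, 5), ndcg(["dbo:Agent"], gold, 5))
```

With these gold types, predicting only dbo:Agent scores relevance 0.75, since it is one step above dbo:Person with MaxDepth = 4. This gives NDCG@5 of roughly 0.35, while an unrelated type such as dbo:Film scores 0.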
7. Timeline
Date                     Description
6th of May, 2020         Release of training sets.
10th of August, 2020     Release of the test sets.
17th of August, 2020     Submission of system output and system description.
31st of August, 2020     Publication of results and notification of acceptance for presentation.
14th of September, 2020  Camera-ready submission.
2-6 November, 2020       Challenge session (virtual) at the ISWC 2020 conference.
Please visit https://smart-task.github.io/ for more details.
8. Thank You!
• We are looking forward to your participation.
• Please report any issues related to the dataset at
• https://github.com/smart-task/smart-dataset/issues
• Please feel free to contact us with any questions/feedback:
• Nandana Mihindukulasooriya <nandana.m@ibm.com>
• Mohnish Dubey <dubey@cs.uni-bonn.de>