JIST2019: The 9th Joint International Semantic Technology Conference
The premium Asian forum on Semantic Web, Knowledge Graph, Linked Data and AI on the Web. Nov. 25-27, 2019, Hangzhou, China.
Report on the First Knowledge Graph Reasoning Challenge 2018 -Toward the eXplainable AI System-
1. Report on the First Knowledge Graph
Reasoning Challenge 2018
– Toward the eXplainable AI System –
Takahiro Kawamura*1, Shusaku Egami*2, Koutarou Tamura*3,
Yasunori Hokazono*4, Takanori Ugai*5, Yusuke Koyanagi*5,
Fumihito Nishino*5, Seiji Okajima*5, Katsuhiko Murakami*5,
Kunihiko Takamatsu*6, Aoi Sugiura*7, Shun Shiramatsu*8,
Shawn Zhang*8, Kouji Kozaki*9
1.National Agriculture and Food Research Organization, Japan
2.National Institute of Maritime, Port and Aviation Technology, Japan
3. NRI digital, Ltd. 4.Nomura Research Institute, Ltd.
5. Fujitsu Laboratories Ltd. 6. Kobe Tokiwa University 7. Kobe City Nishi-Kobe Medical Center
8.Nagoya Institute of Technology 9. Osaka Electro-Communication University
Summary of Knowledge Graph Reasoning Challenge
A Contest to develop AI systems which have
abilities for “Reasoning” and “Explanation”
such like Sherlock Holmes.
Graph(OKG) AI system that estimate criminals
with reasonable explanations using
the OKG and other knowledge
The motive is …
Trick is …
The criminal is
XX Because …
3. Agenda of Talk
• Summary of Knowledge Graph Reasoning Challenge
by K. Kozaki
• Knowledge Graph Construction
by S. Egami
• Approach for estimation and reasoning techniques
by T. Ugai
• Evaluation / Conclusion and Current Work
by T. Kawamra
5. Knowledge Graph construction process
Add object types
• Discussion about schema design and methodology
• We first held five open workshops from November 2017 to April 2018.
• The total number of participants in the workshops was 110.
• After the preliminary experiment of knowledge graph construction cooperating with the participants, we
finally adopted the below process.
6. • Basic policy
– Focus on scenes in a novel and the relationship of those scenes, including the
characters, objects, places, etc., with related scenes
• A scene ID (IRI) has subjects, verbs, objects, etc.
• Edges mainly represent five Ws (When, Where, Who, What, and Why).
Architecture of the Knowledge Graph
Scene1 Scene2 Scene3
7. Scene ID
• Properties representing scenes
– subject (who)： A person or object representing the subject of the scene
– hasPredicate： A predicate representing the content of the scene
– 5W1H： what, where, when, whom, why, how.
– Relation among scenes： then, if, because, …etc.
– time： Absolute time（xsd:DateTime）
– source： Source text（EN/JP literal）
8. Original Sentence (EN|JA)
Property values are defined as
resources to be referred in the
Relationship to other Scene ID
- Situation： Fact
- Statement：Remark by A
- Talk： Remark by A to B
- Thought： Idea of A
Schema (Scene): Example
Unique ID (IRI)
of Each scene
• For evaluating estimation and reasoning techniques with explainability,
– metrics design for explainability, utility, novelty, and performance is required,
– but evaluation is also based on a qualitative comparison through discussion and
• DARPA XAI states that
– the current AI techniques have a trade-off between accuracy and explainability,
so both properties should be measured.
– In particular, to measure the effectiveness of the explainability,
DARPA XAI rates user satisfaction regarding its clarity and utility.
• We first share the basic information of the proposed approaches, and then discuss the
evaluation of experts and of the general public.
18. Sharing Basic Information for Preparation
• The basic info. Was investigated and shared with experts in advance,
who were 7 board members of the SIG on Semantic Web and Ontology in JSAI
– Correctness of the answer
Check if the resulting criminal was correct or not, regardless of the approach.
The criminal, in this case, is the one designated in the novel or story.
– Feasibility of the program
Check if the submitted program correctly worked and the results were reproduced
(excluding idea-only submissions).
– Performance of the program
Referential information on the system environment and performance of the submitted
program, except for the idea only.
– Amount of data/knowledge to be used
How much did the approach use the knowledge graph (the total num. of scene IDs used)?
If the approach used external knowledge and data, we noted information about them.
19. Expert Evaluation
• Over more than a week, the experts evaluated the following aspects
according to five grades (1–5).
• For Estimation and/or reasoning methods,
Novelty and technical improvement of the method.
Is the approach applicable to the other problems?
3 : applicable to the other novels and stories
5 : applicable to other domains.
Is the approach expected to have a further technical extension?
20. Expert Evaluation cont'd
• For Knowledge and data,
– Originality of knowledge/data construction
E.g., how much external knowledge and data were prepared?
– Originality of knowledge/data use
E.g., was a small set of knowledge used efficiently, or
was a large set of knowledge used to simplify the process.
– Feasibility of idea (for idea only)
Feasibility of idea including algorithms and data/knowledge construction.
– Logical explainability
Is an explanation logically persuadable?
1 : no explanation and evidence
3 : some evidence in any form is provided
5 : there is an explanation that is consistent with the estimation and reasoning process.
Amount of effort required for the submission (knowledge/data/system).
21. Public Evaluation
• Although the experts determine if a logical explanation could be held,
the public eval. focused on the psychological aspect of the explanations, that is,
the satisfaction with the explanation.
• The 45 applicants of SIGSWO meeting, Nov. 2018 answered to
after 15 min presentation
– Total score
• We added the total score to include psychological impressions other than explainability,
– such as presentation quality and entertainment aspects.
– If we only had a score for explainability, such aspects could be mixed in the explainability.
22. Public evaluation results
• In the public eval.,
ave., med., and sd
of the total scores and
• Comparing 1st and 2nd prizes, we found
– Ave. of both the total score and the explainability were higher for 1st prize.
– Med. of the total score of 1st was higher than 2nd prize, but
Med. of the explainability of 1st was the same as 2nd prize.
– Sd of both the total score and the explainability were also bigger for 1st prize
than for 2nd prize. (smaller the better)
• The paired t-test (α = 0.05) indicated that
– diff. in total score had a statistically significant difference bet. 1st and 2nd., but
– diff. in the explainability was not significantly different.
23. Expert evaluation results
• In the expert evaluation,
– Ave. of each metric in 1st prize
were higher than 2nd prize, but
the explainability was statistically
significantly higher for 2nd prize.
– We should note that sd of ave.
for each metric were less than 0.1
• Therefore, the final decision was left
to the expert peer review.
• As a result, the prize order was
– the metrics other than the explainability
of 1st prize were > or = 2nd
• The eval. including explainability was left
to the future challenge…
24. Conclusion and Current Work
• The 1st knowledge graph reasoning challenge in 2018 was summarized
for the development of AI techniques integrating ML and reasoning.
• The 2nd challenge started in June 2019 !
– 4 knowledge graphs from 4 mystery novels were added and published on
(Speckled Band + A Case Of Identity / Crooked Man / Dancing Men / Devils Foot)
– In addition to guessing criminal with explanation,
– it is better to be commonly applied to as many novels as possible.
– New tool/utility creation task is also added.
• The 3rd international challenge will open call for application in 2020!
• The 4th challenge will have knowledge graphs of real social problems, e.g., books listing
best practices of social problem solving, etc. 24