Professor of computer science at King's College London
Knowledge graph use cases in natural language generation
Sep. 15, 2023
Keynote talk at INLG (International Natural Language Generation Conference) & SIGDial (Special Interest Group on Discourse and Dialogue), September 2023
4. ORGANISING THE WORLD’S INFORMATION
Find the right thing • Get the best summary • Go deeper and broader
[Source: https://blog.google/products/search/introducing-knowledge-graph-things-not/]
7. NOT YOUR USUAL GOFAI
• Orders of magnitude larger scale than
GOFAI knowledge bases
• Simple knowledge representation, no
formal semantics
• Vocabulary reuse, networks of small
modular vocabularies
• Incomplete, inconsistent, always changing
• Built via human-AI pipelines (w/ ETL,
information extraction etc.)
• Many large open-source projects with
strong communities
• Knowledge graph services for developers
[Source: Noy et al., 2019]
8. NOT YOUR USUAL
LLM EITHER
Content-wise some overlap, but a different
paradigm: knowledge engineering,
decentralised data publishing (i.e., identifiers,
reusable schemas, interlinking)
More closely related to semantic networks, frames,
and rule-based NLP than to LLMs
Knowledge graphs are a common source of
embeddings in AI systems
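To make the last point concrete, here is a minimal sketch of TransE-style scoring (Bordes et al., 2013), one common way a knowledge graph yields embeddings. The vectors are random, untrained stand-ins and the Wikidata IDs are only examples; this is not a system discussed in the talk.

```python
# Minimal sketch: TransE-style scoring over a knowledge graph.
# The vectors below are random stand-ins; real systems learn them from the graph.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
entities = {"Q90": rng.normal(size=dim),    # Paris (example Wikidata ID)
            "Q142": rng.normal(size=dim)}   # France
relations = {"P17": rng.normal(size=dim)}   # "country"

def transe_score(head: str, relation: str, tail: str) -> float:
    """TransE plausibility: lower ||h + r - t|| suggests a more plausible triple."""
    return float(np.linalg.norm(entities[head] + relations[relation] - entities[tail]))

print(transe_score("Q90", "P17", "Q142"))   # meaningless until the vectors are trained
```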
14. NEURAL NETWORK
TRAINED ON
WIKIDATA/WIKIPEDIA
Feed-forward architecture encodes
triples from the ArticlePlaceholder
into a vector of fixed dimensionality
RNN-based decoder generates text
summaries, one token at a time
Optimisations for different entity
verbalisations, rare entities etc.
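A minimal PyTorch sketch of the shape of this architecture: a feed-forward encoder maps a (subject, predicate, object) triple to a fixed-size vector, which initialises an RNN decoder that emits one token per step. The GRU choice, sizes and names are illustrative assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class TripleEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Concatenated subject/predicate/object embeddings -> fixed-size vector
        self.ff = nn.Linear(3 * emb_dim, hid_dim)

    def forward(self, triple_ids):            # (batch, 3) token ids
        e = self.emb(triple_ids).flatten(1)   # (batch, 3 * emb_dim)
        return torch.tanh(self.ff(e))         # (batch, hid_dim)

class SummaryDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, state):     # one decoding step at a time
        h, state = self.gru(self.emb(prev_token), state)
        return self.out(h), state             # logits over the next token

vocab_size = 10_000
enc, dec = TripleEncoder(vocab_size), SummaryDecoder(vocab_size)
triple = torch.randint(0, vocab_size, (1, 3))    # one encoded triple
state = enc(triple).unsqueeze(0)                 # initial GRU hidden state
logits, state = dec(torch.tensor([[0]]), state)  # feed a <start> token id
```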
15. RESEARCH
QUESTIONS
RQ1 Can we train a neural
network to generate text from
triples in a multilingual setting?
RQ2 How do editors perceive the
generated text on the
ArticlePlaceholder page?
RQ3 How do editors use the
generated sentence in their
work?
Data, methods and participants per research question:
RQ1: Data: metrics and survey answers. Methods: metrics-based evaluation (BLEU-2, BLEU-3, BLEU-4, METEOR, ROUGE-L) and scores for perceived fluency and appropriateness. Participants: readers of the Arabic and Esperanto Wikipedias.
RQ2: Data: interviews. Methods: task-based evaluation, thematic analysis. Participants: Arabic, Persian, Indonesian, Hebrew and Swedish Wikipedia editors.
RQ3: Data: interviews and text reuse metrics. Methods: task-based evaluation, thematic analysis, metrics-based evaluation. Participants: Arabic, Persian, Indonesian, Hebrew and Swedish Wikipedia editors.
16. RQ1 METRICS-BASED EVALUATION
Trained on corpus of Wikipedia sentences
and corresponding Wikidata triples (205k
Arabic; 102k Esperanto)
Tested against three baselines: machine
translation (MT), template retrieval
(TR, TRext), and a 5-gram Kneser-Ney
language model (KN)
Using standard metrics: BLEU, METEOR,
ROUGE-L
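As an illustration of the metrics-based setup, here is a sketch computing BLEU-2/3/4 with NLTK; the sentences are invented, not drawn from the Arabic/Esperanto corpus, and METEOR and ROUGE-L would be computed analogously with their respective implementations.

```python
# Sketch: sentence-level BLEU-2/3/4, as used in metrics-based NLG evaluation.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["chandler", "is", "a", "city", "in", "arizona"]   # gold sentence (tokenised)
hypothesis = ["chandler", "is", "a", "town", "in", "arizona"]  # generated sentence

smooth = SmoothingFunction().method1
for n, weights in [(2, (0.5, 0.5)), (3, (1/3,) * 3), (4, (0.25,) * 4)]:
    score = sentence_bleu([reference], hypothesis, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```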
18. RQ1: PERCEIVED
FLUENCY AND
APPROPRIATENESS OF
TEXT
54 participants, frequent readers of
Arabic (n=27) and Esperanto (n=27)
Wikipedia
60 summaries (30 automatically
generated, 15 news, 15 Wikipedia
sentences from the training corpus)
Esperanto news from Le Monde
Diplomatique, Arabic news from BBC
Arabic
Arabic: ~400 annotations (each),
Esperanto: ~230 annotations (each)
19. RQ1: PERCEIVED FLUENCY AND
APPROPRIATENESS OF TEXT
Participants could tell the
difference between news and
Wikipedia content.
AI can produce Wikipedia-like
text
Higher standard deviation for
automatically generated text than for
genuine content
20. RQ2/3: TASK-
BASED
EVALUATION
10 experienced editors from 6 Wikipedias
(average tenure ~9 years), semi-structured
interviews; all had also edited English Wikipedia
Arabic sentences produced by the neural
network, synthetic sentences for the other
languages
Removed up to 2 words per sentence,
emulating the behaviour of the neural network
(concept in native language, related entity not
connected in Wikidata)
Editors were asked to write 2-3 sentences
21. RQ2: TASK-BASED EVALUATION
Under-resourced Wikipedia editing: Useful summaries, particularly for
non-native speakers
Provenance, transparency: Readers assumed the generated text came from a
Wikipedia in another language rather than being AI-generated
Length of text: A one-sentence article signalled that the article needs work;
longer snippets should match reading practice
Importance of text: People looked at the text first rather than at the triples.
Text added context to the triples; seeing the text reassured people that they
had landed on the right page
22. RQ3 TASK-BASED
AND METRICS-
BASED EVALUATION
The snippets were heavily used
All participants reused them at least
partially.
8 of the resulting texts were wholly derived
and the other 2 were partially derived
from the automatically generated
text
<rare> tokens led participants to
discard the whole sentence
Hallucinations remained undetected
even by participants with domain
knowledge
gstscore: disjoint longest sequences of tokens
in the edited text that exist in the source text,
informed by vandalism detection metrics in
Wikipedia
Wholly Derived (WD): gstscore >= 0.66
Partially Derived (PD): 0.33 <= gstscore < 0.66
Non-Derived (ND): gstscore < 0.33
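The derivation labels reduce to a threshold rule over the gstscore; here is a minimal sketch, with the greedy string-tiling score itself left as an assumed input:

```python
# Sketch: map a greedy string-tiling score in [0, 1] to a derivation label.
# Computing the score itself (string tiling over token sequences) is assumed.
def classify_derivation(gst_score: float) -> str:
    if gst_score >= 0.66:
        return "Wholly Derived (WD)"
    if gst_score >= 0.33:
        return "Partially Derived (PD)"
    return "Non-Derived (ND)"

print(classify_derivation(0.8))   # Wholly Derived (WD)
print(classify_derivation(0.5))   # Partially Derived (PD)
print(classify_derivation(0.1))   # Non-Derived (ND)
```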
25. PROVENANCE
MATTERS
Wikidata is widely used
beyond WikiProjects
It is a secondary source of
knowledge
References should be
accessible, relevant,
authoritative
28. CLAIM VERBALISATION
Train a model to convert Wikidata triples into
natural language phrases
• Contextualises predicates
• Formats entity labels as they would appear in sources
Wikidata provides entity labels and multiple
aliases (illustrated in the sketch below)
• Multiple verbalisations can be supported
• Preferred predicate aliases can be set according to entity types
Measure quality of verbalisation
• Crowdsourcing rather than algorithmic metrics, e.g., BLEU, METEOR,
ROUGE
• Fluency (0-5 scale) and adequacy (yes or no)
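A minimal sketch of retrieving the entity labels and aliases Wikidata exposes, via the public wbgetentities endpoint of the MediaWiki API; Q90 (Paris) is only an example entity.

```python
# Sketch: fetch the English label and aliases of a Wikidata entity.
import requests

resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetentities",
        "ids": "Q90",               # example entity: Paris
        "props": "labels|aliases",
        "languages": "en",
        "format": "json",
    },
    timeout=10,
)
entity = resp.json()["entities"]["Q90"]
label = entity["labels"]["en"]["value"]
aliases = [a["value"] for a in entity.get("aliases", {}).get("en", [])]
print(label, aliases)
```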
29. (Chandler Fashion Center, directions, Highway 101 & Highway 202)
[Screenshot: Wikivoyage article on Chandler (Arizona); its listing for Chandler Fashion Center, 3111 W Chandler Blvd, gives the directions "Highway 101 & Highway 202", the source formatting the verbalised triple should match]
37. CROWDSOURCING RESULTS
Fluency: resembles text written by humans
Adequacy: text keeps meaning of triples
590 workers
• Most did 1 task; some did up to 168 tasks
Inter-annotator reliability (Krippendorff’s Alpha)
• Fluency: 0.427 (Moderate)
• Adequacy: 0.458 (Moderate)
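For reference, a sketch of computing such agreement figures with the krippendorff Python package; the rating matrix below is fabricated for illustration.

```python
# Sketch: Krippendorff's alpha over ordinal fluency ratings.
import numpy as np
import krippendorff

# Rows = annotators, columns = items; np.nan marks items a worker skipped.
fluency_ratings = np.array([
    [3, 4, 5, np.nan, 2],
    [3, 4, 4, 1, np.nan],
    [2, 5, 4, 1, 2],
], dtype=float)

alpha = krippendorff.alpha(reliability_data=fluency_ratings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```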
38. [Results table: mean fluency, median fluency and adequacy scores]
Fluency scores
0. Incomprehensible text
1. Barely understandable text with significant grammatical errors
2. Understandable text with moderate grammatical errors
3. Comprehensible text with minor grammatical errors
4. Comprehensible and grammatically correct text that still reads as artificial
5. Comprehensible and grammatically correct text that feels natural
39. BAD
FLUENCY
Information syntactically hard to understand
1-[(3S,9S,10S)-12-[(2R)-1-hydroxypr... → stereoisomer of → 1-
[(3R,9R,10S)-12-[(2R)-1-hydroxypr…
1-[(3R,9R,10S)-12-[(2R)-1-hydroxypr... is
Redundant information
Bydgoszcz → flag → flag of Bydgoszcz
The flag of Bydgoszcz is the flag of Bydgoszcz
Loosely defined predicates
(15976) 1998 FY119 → time of discovery or invention →
20/03/1998 | 1.6
(15976) 1998 FY119 was invented on 20/03/1998
Predicates reliant on qualifiers
Pseudochaete → different from → Pseudochaete
Pseudochaete is different from Pseudochaete
40. BAD
ADEQUACY
Information syntactically hard to
understand
(182176) 2000 SM250 → time of discovery or invention
→ 24/09/2000
The invention of the SM250 was made on 24/09/2000
and was discovered on 182176.
Redundant information
Gru → discography → Gru discography
The discography of Gru is extensive.
Predicate labels too broad
barrel wine → facet of → barrel
The facet of barrel wine is the same.
Predicates lacking specificity
Decius → child → Hostilian
Decius is a child of Hostilian.
43. SUMMARY
Wikidata is the data backbone of
Wikipedia
Verbalising it can help bootstrap
articles in under-resourced languages
More empirical research is needed to
understand the level of assistance
needed when writing with AI, the
level of transparency required, etc.
44. SUMMARY
Knowledge graphs are curated, trusted
sources of knowledge, which can augment
LLMs to reduce hallucinations, facilitate
answer attribution, support ethical alignment
Their knowledge integrity must be
guaranteed
Natural language generation can help verify
knowledge claims against diverse sources
45. WHAT’S
NEXT
Conversational generative AI
as a tool to create, curate,
access knowledge graphs’
content… responsibly?
Thanks to: Gabriel Amaral, Jonathan Hare, Lucie
Kaffee, Odinaldo Rodrigues, Pavlos Vougiouklis
46. Kaffee, L. A., Vougiouklis, P., & Simperl, E. (2022). Using
natural language generation to bootstrap missing
Wikipedia articles: A human-centric perspective. Semantic
Web, 13(2), 163-194.
47. Amaral, G., Rodrigues, O., & Simperl, E. (2022,
October). WDV: A Broad Data Verbalisation Dataset
Built from Wikidata. In International Semantic Web
Conference (pp. 556-574). Cham: Springer
International Publishing.
Amaral, G., Rodrigues, O., & Simperl, E. ProVe: A
Pipeline for Automated Provenance Verification of
Knowledge Graphs Against Textual Sources. To appear
in Semantic Web.