ISWC2023-McGuinnessTWC16x9FinalShort.pdf

Semantic Web Research in the Age of
Generative Artificial Intelligence:
Semantic Web Renaissance?
Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer, Cognitive, and Web Sciences and
Industrial and Systems Engineering
Rensselaer Polytechnic Institute, Troy, NY, USA
November 9, 2023
International Semantic Web Conference
Athens, Greece

DLM Background: Ontologies, Knowledge Graphs,
Provenance, Knowledge Portals, Hybrid AI Solutions…
Formal Foundations with application focus; often
hybrid, highly interdisciplinary, methodology focused

A Few Questions to Ponder
and 4 Areas of Interest
• Are we entering a Renaissance period for the semantic web? “an incredible
time of beauty, blossoming with creativity and curiosity”?
• How might the “Hype Cycle” perspective inform our work?
1. How might we leverage the evolving generative AI tools and services?
What kind of new collaborative processes may be useful?
2. How might we create/use “sandboxes” for exploration and evaluation?
3. What hybrid architectures should we explore?
Are Knowledge Graphs a new (provenance-aware) notebook?
4. How is emerging legislation poised to impact our work?
Is semantics for explanation, provenance, bias detection, evaluation,
reporting, analysis, … something we should explore?

AI from a
Gartner
Perspective
Note KG place
Note GenAI place

1. Hybrid Processes
How might we leverage the evolving generative AI tools and
services?
What kind of collaborative processes may be useful?

Process Exploration
Collaborative KG Generation
What if we use
McGuinness’ and
Aasman’s explorations on
building a Wines Ontology
and the associated
Knowledge graph, without
and with LLMs as a
thought experiment

From Living with CLASSIC (1989) and
Ontologies Come of Age (2003)
McGuinness’ Base ontology is very simple – initially written in 1985 as an expert
system. Converted to a KL-ONE ontology in 1989
– Wine, Winery, Grape, Flavor, Body, Color, Sugar
– Stood the test of time: Living with CLASSIC (1991), CLASSIC tutorials,
Ontologies 101 (2001), OWL Guide (2004), … numerous translations, >8k
citations
– To scale however, needs to be compatible with WIDE range of menus, wine
lists, vocabularies. Not hard to obtain but significant enhancement required.
– Needs more ecosystem support – explanation, provenance, validation,
inconsistency detection, prioritization scheme, UI considerations, additional
social connections, citizen-oriented maintenance and evolution schemes,
scale, partitioning, …
www.ksl.stanford.edu/people/dlm/papers/living-with-classic-abstract.html

What if LLMs are used (cont)?

LLMs and refinement
• It appears that it is essentially
my ontology (from Living with
CLASSIC (1991 book from the 1989
workshop on Foundational
Aspects of Semantic
Networks) and the expanded
version of the Stanford Tech
report)
• Jans wanted more information
about flavors

Population with LLMs
• Find important wine grapes

Population with LLMs continued
• Find grapes for each type of wine – example Bordeaux

Population with LLMs continued
• Find
particular
wines

Human-LLM Collaboration
• Leverage ontologies as starting points
• Human driven refinement (example adding flavors)
• Good use of LLMs for population – both for value
restrictions (grape type per region) and instances (example
prominent Bordeaux wines)
• It continued to populate values (for flavors)
• (Some) validation (using e.g., webchatGPT)
• Overall nice use for GUIDED completion – more testing
needed
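The guided-completion idea above can be sketched in a few lines. This is only an illustration, not the process used in the talk: `SLOTS`, `build_prompt`, and `validate_candidates` are hypothetical names, the allowed slot values are assumed, and no LLM is called. The "response" is a hard-coded stand-in that a skilled human would still review.

```python
import json

# Hypothetical skeleton: slot names follow concepts in the base wine ontology.
# The allowed values are illustrative assumptions, not the ontology's own.
SLOTS = {
    "color": {"red", "white", "rose"},
    "body": {"light", "medium", "full"},
    "sugar": {"dry", "off-dry", "sweet"},
}

def build_prompt(region: str) -> str:
    """Ask for instances that fit the ontology's slots, returned as JSON."""
    slot_text = "; ".join(f"{k} in {sorted(v)}" for k, v in SLOTS.items())
    return (
        f"List prominent wines from {region} as a JSON array of objects "
        f"with keys name, grape, {', '.join(SLOTS)}. Constraints: {slot_text}."
    )

def validate_candidates(raw_json: str):
    """Split candidate rows into accepted rows and rows needing human review."""
    accepted, review = [], []
    for row in json.loads(raw_json):
        bad = [k for k, allowed in SLOTS.items() if row.get(k) not in allowed]
        (review if bad else accepted).append((row, bad))
    return accepted, review

# Stand-in for a model response (invented for the example, not real output).
fake_response = json.dumps([
    {"name": "Wine A", "grape": "Merlot", "color": "red", "body": "full", "sugar": "dry"},
    {"name": "Wine B", "grape": "Semillon", "color": "golden", "body": "full", "sugar": "sweet"},
])

accepted, review = validate_candidates(fake_response)
print(build_prompt("Bordeaux"))
print(len(accepted), "accepted;", len(review), "flagged for review")
```

The ontology constrains what the model is asked for, and anything that violates a value restriction is routed to a person instead of being silently added to the graph.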

Lessons? Discussion
• LLMs can greatly enhance aspects of KG design/population
• **Having the (skilled) human collaborate with the LLM is likely
critical
• We may discuss other ways to leverage LLMs – e.g., we might
consider pain points in ontology design and ontology-enabled
application build out (e.g. Andrea Nuzzolese’s keynote in Deep
learning for KG workshop with competency questions)
• Designing and building collaborative processes I think is in our
future.
• Next “sandbox” for exploration

2. “Sandbox” for
Exploration/Evaluation
How might we create/use “sandboxes”
for exploration and evaluation?

ChatBS: A Context-aware LLM Exploratory Sandbox
ChatBS version: 2023-10-04 15:52:00 (GPT-4-0613)
Submit questions
to OpenAI
Completion API:
Custom "System"
and "User" prompts
Up to 10 prompt
re-submissions
Up to 10 unique,
independent
responses to user's
prompt
Downloadable JSON
file with complete
prompts and results
https://inciteprojects.idea.rpi.edu/chatbs/app/chatbs/
ChatBS, the Context-aware LLM Exploratory Sandbox, uses the OpenAI Completion API
service (GPT-4-0613 model) to answer questions. Each sentence in a ChatBS result is
automatically linked to a Google query to facilitate fact-checking. ChatBS can then use the
OpenAI API to construct an entity/relation graph of these results in the form ['entity1',
'relationship', 'entity2']. ChatBS then uses entity linking to look up entities and relationships
against Wikidata, constructing a JSON-LD graph as it proceeds.
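As a rough illustration of the sentence-to-query linking described above, the sketch below pairs each sentence of an answer with a Google search URL. How ChatBS itself splits sentences or builds its links is not stated in the slides; the naive regex splitter and the `sentence_links` helper are assumptions made for the example.

```python
import re
from urllib.parse import quote_plus

def sentence_links(text: str):
    """Pair each sentence with a Google search URL for quick manual checking."""
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, "https://www.google.com/search?q=" + quote_plus(s)) for s in sentences]

answer = "Deborah McGuinness is an American computer scientist. She works on ontologies."
links = sentence_links(answer)
for sentence, url in links:
    print(sentence, "->", url)
```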
RDF generation is
optional, to facilitate
question answering
Each sentence in
results is automatically
linked to a Google
query, providing a quick
"sanity-check"
Option to append
user prompts for
"chain-of-thought"
reasoning
evaluation
Truncated table of triples,
relevant to OpenAI's
answers (assertions)
RDF generated by LLM-
based ChatBS-REST entity
linking service
Highlighting when Subject
or Object entities are bnodes
(i.e., when entities named by
ChatGPT are not found in
Wikidata)
Downloadable JSON-LD
of complete results graph
Embedded
Wikidata entity
summary (using
AutoDesc API) of
selected entity…to
aid in human fact
checking!
Table of RDF triples,
relevant to OpenAI's
answers (assertions)
RDF generated by
LLM-based ChatBS-
REST entity linking
service
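The blank-node behavior noted in the callouts above can be sketched as follows. This is not the ChatBS-REST service: `LOOKUP` is a stand-in for entity linking, the `example.org` URIs are placeholders (not real Wikidata identifiers), and `to_jsonld` is a hypothetical helper.

```python
import json

# Stand-in for an entity-linking lookup; example.org URIs are placeholders.
LOOKUP = {
    "Deborah McGuinness": "http://example.org/entity/E1",
    "Rensselaer Polytechnic Institute": "http://example.org/entity/E2",
}

def to_jsonld(triples):
    """Build a JSON-LD graph; entities missing from LOOKUP become blank nodes."""
    bnodes = {}

    def node_id(label):
        if label in LOOKUP:
            return LOOKUP[label]
        # Reuse a single blank node per unlinked label.
        return bnodes.setdefault(label, f"_:b{len(bnodes)}")

    nodes = {}
    for subj, rel, obj in triples:
        s = nodes.setdefault(node_id(subj), {"@id": node_id(subj), "label": subj})
        prop = "rel:" + rel.strip().replace(" ", "_")
        s.setdefault(prop, []).append({"@id": node_id(obj)})
        nodes.setdefault(node_id(obj), {"@id": node_id(obj), "label": obj})
    context = {
        "rel": "http://example.org/rel/",
        "label": "http://www.w3.org/2000/01/rdf-schema#label",
    }
    return {"@context": context, "@graph": list(nodes.values())}, bnodes

triples = [
    ["Deborah McGuinness", "works at", "Rensselaer Polytechnic Institute"],
    ["Deborah McGuinness", "received", "Some Unlisted Award"],
]
graph, bnodes = to_jsonld(triples)
print(json.dumps(graph, indent=2))
print("Unlinked entities (candidates for highlighting):", sorted(bnodes))
```

Returning the unlinked labels separately makes it easy to highlight exactly those entities for human fact checking.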

NOTE: ChatBS may have been updated since these screenshots were captured…
Each of these responses is
essentially correct, yet each is
distinct.
This demonstrates why it is
important to submit questions
multiple times when studying
GPT-4-generated
"explanations."

Simple question, widely varying
(and all imprecise) answers!!
Correct answer: 1998 - 2007
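One simple way to put a number on how much repeated responses to the same prompt differ is a mean pairwise word-overlap (Jaccard) score. The metric choice and the helper names are assumptions for illustration, and the three responses are invented placeholders, not ChatBS output.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two responses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0

def mean_pairwise_similarity(responses):
    """Average similarity over all pairs; lower values mean more variation."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Placeholder responses standing in for several independent completions.
responses = [
    "the sky is blue",
    "the sky is blue",
    "the sky appears blue",
]
score = mean_pairwise_similarity(responses)
print(f"mean pairwise similarity: {score:.3f}")
```

A low score flags prompts whose answers deserve closer reading. It says nothing about which answer, if any, is correct.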

Deborah's honors and awards are
listed on her Wikipedia page and
personal web site(s), but GPT-4's
training appears not to have caught
those… so it tap-dances!

Potential Architecture for a
KG-powered Fact Checker
Example: A Wikidata-based
GPT fact checker
[Architecture diagram; labels recovered from the figure]
• User prompt (question), e.g., "Who is Deborah McGuinness?"
• Submitted prompt sent to the OpenAI continuation API* (selected GPT model: e.g. GPT-4-0613)
• Continuation response, e.g., "Deborah McGuinness is an American computer scientist…"
• Relationship discovery and entity linking service: build labeled entity graph; link entities and properties (labels to URIs)
• GPT-asserted relationships (RDF triples); generate RDF
• Graph comparison: Wikidata (reference) vs. GPT (test)
• Supporting semantics (context)
* https://platform.openai.com/docs/api-reference/completions
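The graph-comparison step in the architecture above could look roughly like this: treat the reference graph and the GPT-derived graph as sets of triples and partition them. The `compare` helper and the `ex:` identifiers are hypothetical. A real pipeline would compare URIs produced by the entity-linking step.

```python
def compare(reference, test):
    """Partition test triples by whether the reference graph contains them."""
    ref = set(map(tuple, reference))
    tst = set(map(tuple, test))
    return {
        "supported": sorted(tst & ref),    # asserted and present in reference
        "unsupported": sorted(tst - ref),  # asserted but absent from reference
        "missing": sorted(ref - tst),      # in reference but never asserted
    }

# Placeholder triples; the identifiers are invented for the example.
reference = [
    ("ex:E1", "ex:occupation", "ex:ComputerScientist"),
    ("ex:E1", "ex:employer", "ex:E2"),
]
test = [
    ("ex:E1", "ex:occupation", "ex:ComputerScientist"),
    ("ex:E1", "ex:award", "ex:E9"),
]
report = compare(reference, test)
for label, triples in report.items():
    print(label, triples)
```

"Unsupported" only means the reference graph lacks the triple. The claim may still be true, so these are candidates for human review, not automatic rejections.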

Assessment of commonsense proficiency in
conversational systems
● Context-aware evaluation
● Different problem solving modalities
● Presentation formats
● Research questions:
○ How does the above affect machine
problem solving?
○ What is the benefit of the human-in-
the-loop? Can we adapt it to human-
in-the-loop instructable systems?
Drive multi-hop machine reasoning,
provide feedback on improvements to the
dataset [2]
[1] Kejriwal, Santos, Mulvehill, McGuinness. 2022.
Designing a strong test for measuring true common-
sense reasoning. Nature Machine Intelligence.
[2] Kejriwal, Santos, Mulvehill, McGuinness. 2023.
Context-Rich Evaluation of Machine Common
Sense. Artificial General Intelligence (AGI-23).
Partially DARPA MCS

Lessons? Discussion
• Our Context-aware LLM exploratory sandbox is just one
example of an architecture
• Evaluation “Harnesses” may be an area ripe for
exploration.
• More sandboxes and evaluation harnesses may benefit
our field

What are unsolved problems in formal
ontology research? (by ChatGPT)
• Ontology alignment: Integrating multiple ontologies or aligning different parts of a single
ontology is a challenging task. Developing techniques to automatically align ontologies,
resolve semantic heterogeneity, and handle inconsistencies is …needed
• Ontology evolution: Ontologies are dynamic and need to evolve over time to reflect changes in
the domain they represent. Managing ontology evolution, including identifying and propagating
changes, maintaining consistency, and handling versioning and compatibility… is needed
• Ontology modularization: Large ontologies can become complex and difficult to manage.
Techniques for decomposing ontologies into smaller, modular components while preserving
important relationships and ensuring coherence are still an active area of research.
• Ontology evaluation: Evaluating the quality and effectiveness of ontologies is essential.
Developing objective metrics and evaluation frameworks to assess ontology quality, usability,
completeness, and correctness is an ongoing challenge.

What are unsolved problems in formal
ontology research? (by ChatGPT)
• Ontology learning and population: Automatically constructing ontologies from unstructured or
semi-structured data is …needed. Techniques for ontology learning, including concept extraction,
relationship identification, and population of ontology instances … are active research areas.
• Reasoning scalability: Efficient reasoning is crucial for utilizing ontologies effectively. Developing
scalable reasoning algorithms that can handle large-scale ontologies and complex reasoning
tasks while maintaining acceptable performance is a significant challenge.
• Uncertainty and vagueness in ontologies: Ontologies often deal with imperfect or uncertain
information. Capturing and representing uncertainty and vagueness in ontologies and
developing reasoning techniques that can handle such knowledge …is needed.
• Ontology visualization and user interfaces: Presenting ontologies in a human-readable and
intuitive manner is essential for effective knowledge representation and user interaction.
Developing visualization techniques and user interfaces that facilitate ontology exploration,
editing, and understanding is an area of active research.

Brainstorming: Let's Leverage LLMs
• Ontology alignment
• Ontology evolution
• Ontology modularization
• Ontology evaluation
• Ontology learning and population
• Reasoning scalability
• Uncertainty and vagueness in ontologies
• Ontology visualization and user interfaces
• Remember Gerhard’s talk on KG gaps
• What are your pain points?
• Good competency question generation with adequate depth and breadth
• Good benchmark generation
• Use case generation in general

3. Knowledge Graph-
Centric Hybrid Architectures
What hybrid architectures should we explore?
Are Knowledge Graphs a new (provenance-aware)
notebook?

Evolving Architectures (DARPA)

Boston Fusion / RPI ECOLE Example
Automated Clustering Curriculum Learning
Guided by Human Training (ARCLIGHT)
Knowledge graph-centric
Knowledge graph includes symbolic and
neural representations
Agents can asynchronously contribute to
graph
PIs
Jay Miller (System integration, multi-
disciplinary coalitions)
Tomek Strzalkowski (Cognitive sciences,
NLP, dialogue, AI)
Deborah McGuinness (Semantic
technologies and knowledge representation)
Qiang Ji (Computer vision and probabilistic
ML methods)

Whyis Knowledge Graph-based Orchestration
Whyis is a
knowledge graph
publishing,
management, and
analysis
framework
ARCLIGHT uses
Whyis as
orchestrator /
agent base
Agents make API
calls to ARCLIGHT
modules

Evolving Architectures (NSF
Harnessing the Data Revolution)
MaterialsMine
Knowledge Graph
Curation / Interaction / Augmentation
Materials Systems
• Polymer nanocomposites
• Metamaterials
Information Types
• Composition metadata
• Processing metadata
• Properties (mechanical, thermal, electrical, viscoelastic…)
• Characterization metadata
• Images (SEM, TEM)
• Bibliographic metadata
Data Storage
• MongoDB
• Blazegraph (Fuseki)
Search
• SPARQL endpoint
• Faceted browser
• Parameterized queries
Visualization
• Gallery of Interactive Charts
• Vega-Lite Editor
• Data Voyager
• Extract microstructure descriptors (MCR)
• Chemical name classifier (ChemProps API)
• RDF data model
• OWL ontology
• ETL from Excel, XML
• Semantic data dictionary
Partially supported by NSF ACI-1640840 Joint with Duke, RPI, Vermont, Northwestern, Caltech
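To illustrate what a "parameterized query" against a SPARQL endpoint might involve, here is a minimal sketch that fills a query template safely. The `ex:` vocabulary, the property names, and `build_query` are hypothetical and do not reflect the actual MaterialsMine schema.

```python
from string import Template

# Hypothetical vocabulary; not the MaterialsMine ontology.
QUERY = Template("""\
PREFIX ex: <http://example.org/materials#>
SELECT ?sample ?value WHERE {
  ?sample ex:hasMatrix "$matrix" ;
          ex:$prop ?value .
} LIMIT $limit
""")

def build_query(matrix: str, prop: str, limit: int = 10) -> str:
    """Fill the template, rejecting or escaping input that could alter the query."""
    if not prop.isidentifier():
        raise ValueError("property must be a simple local name")
    # Escape backslashes and quotes so the literal cannot close the string early.
    safe = matrix.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY.substitute(matrix=safe, prop=prop, limit=int(limit))

query = build_query("epoxy", "glassTransition", 5)
print(query)
```

Keeping the query shape fixed and exposing only a few validated parameters lets a portal offer useful searches without requiring users to write SPARQL.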

Evolving Architectures (NIEHS
Human Health Exposure Analysis)
Provide access to
exposure and health
outcome data
Enable construction of
customized,
harmonizable datasets
Facilitate harmonization
of HHEAR data with
external sources
Support pooling of data
for analysis
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01 McGuinness Joint with MSSM, Columbia

Data Portal
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
Joint with MSSM, Columbia
McGuinness

Evolving Architectures (AI Horizons)
Health Empowerment by Analytics, Learning,
and Semantics
Personal Health Recommendations:
Empowering diabetic patients to improve their
health behaviors by tailoring health information
and recommendations to their lifestyles and
preferences.
Semantics for Health and Clinical Reasoning:
Enabling provider trust in clinical decision
support system recommendations.
Improving health equity through ML, Fairness,
and Semantics
https://idea.rpi.edu/research/projects/heals
Funded by IBM AI Horizons Network

Probability Aware Knowledge Exploration: Drug Repurposing Example
• Knowledge imported from drug, protein,
and disease interaction databases.
• Each interaction given an evidence-driven
probability.
• Find drugs that could affect melanoma,
filtered by interaction probability.
• The best hypotheses were generated
using the highest probabilities.
• Being expanded initially for genomics-aware
cancer care applications
• Display below showing interactive
drug-protein-disease interface
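The probability-filtered exploration described above can be sketched as a two-hop search over weighted edges. The data below is invented, and multiplying edge probabilities (which assumes independent interactions) is an assumption made for the example, since the slides do not state how probabilities are combined.

```python
# (source, target, probability): invented placeholder data, not real findings.
INTERACTIONS = [
    ("drugA", "protein1", 0.9),
    ("drugB", "protein1", 0.4),
    ("drugB", "protein2", 0.8),
    ("protein1", "melanoma", 0.7),
    ("protein2", "melanoma", 0.5),
]

def hypotheses(disease: str, threshold: float):
    """Rank drug -> protein -> disease paths whose combined score passes a threshold."""
    to_disease = {s: p for s, t, p in INTERACTIONS if t == disease}
    found = []
    for drug, protein, p in INTERACTIONS:
        if protein in to_disease:
            score = p * to_disease[protein]  # assumes independent interactions
            if score >= threshold:
                found.append((drug, protein, round(score, 3)))
    return sorted(found, key=lambda h: h[2], reverse=True)

result = hypotheses("melanoma", threshold=0.3)
for drug, protein, score in result:
    print(f"{drug} via {protein}: {score}")
```

Keeping the intermediate protein in each result preserves the path that justifies the hypothesis, which supports explanation and provenance.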

Lessons? Discussion
• KGs are increasingly poised to be a key (prov-aware) central repository
• KGs and Semantic Web research may be poised to be the context layer
• More exploration of hybrid architectures is in our future, e.g., KG-augmented retrieval-augmented generation
(image from AWS)
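A toy version of KG-augmented retrieval-augmented generation is sketched below: look up triples that mention terms in the question, then place them in the prompt as context. The triples reuse concept names from the base wine ontology, but the relation names, the keyword-match retriever, and both helper functions are assumptions. No model is called.

```python
# Concept names come from the base wine ontology; relation names are assumed.
KG = [
    ("Wine", "hasProperty", "Body"),
    ("Wine", "madeFrom", "Grape"),
    ("Winery", "produces", "Wine"),
]

def retrieve(question: str, kg, k: int = 5):
    """Return up to k triples whose subject or object is named in the question."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    hits = [t for t in kg if t[0].lower() in words or t[2].lower() in words]
    return hits[:k]

def build_prompt(question: str, kg):
    """Assemble a grounded prompt and return the facts used, for provenance."""
    facts = retrieve(question, kg)
    lines = [f"- {s} {p} {o}" for s, p, o in facts]
    prompt = (
        "Answer using only these facts:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )
    return prompt, facts

prompt, facts = build_prompt("What is a grape used for?", KG)
print(prompt)
```

Returning the retrieved facts alongside the prompt lets the application record which graph statements backed each answer, which is where a provenance-aware KG adds value over plain text retrieval.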

4. Emerging Policies / Requirements
and Semantic Web Research
How is emerging legislation poised to impact our work?
Is semantics for provenance, bias detection,
evaluation, reporting, analysis, … something to
explore (quickly)? Yes.

From Data Protection (GDPR) to AI
Bill of Rights to AI Exec Order…
• Safe and Effective Systems (pre-deployment testing, mitigate unsafe outcomes)
• Algorithmic Discrimination Protections (equitable use of algorithms)
• Data Privacy (protections for data collection/use)
• Notice and Explanations (know when automated system used, how it impacts
you/use)
• Human Alternatives, Consideration, and Fallback (opt out, remedies)

US Executive Order
• New Standards for AI Safety and Security (sharing safety
test results, develop standards, tools, detecting AI
generated content etc.)
• Protecting privacy (privacy preserving research, data
collection/ usage, …)
• …
• Ensuring Responsible and Effective Government Use of AI
(guidance for AI use, procurement, …)
• Extensive factsheet - https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/

Many Starting Points
• Companies such as Hugging Face have Model Cards with
light structure for reporting https://huggingface.co/docs/hub/model-card-annotated
• Ontologies for model annotation, testing, reporting, etc. are
an opportunity
• A few relevant pointers:
• Explanation Ontology (explanation types for user-centric XAI)
https://tetherless-world.github.io/explanation-ontology/
• Ontology for Fairness Metrics
https://dl.acm.org/doi/abs/10.1145/3514094.3534137
• Factsheets, Quality Labels, …
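To make the contrast with lightly structured model cards concrete, the sketch below expresses a card as machine-checkable JSON-LD-style metadata. The `ex:` terms are a hypothetical vocabulary (not the Explanation Ontology, the fairness metrics ontology, or Hugging Face's card format), and all values are placeholders.

```python
import json

def model_card(model_id: str, intended_use: str, metrics: dict) -> dict:
    """Build a small structured card using a hypothetical ex: vocabulary."""
    return {
        "@context": {"ex": "http://example.org/modelcard#"},
        "@id": f"ex:{model_id}",
        "@type": "ex:ModelCard",
        "ex:intendedUse": intended_use,
        "ex:evaluation": [
            {"@type": "ex:MetricResult", "ex:metric": name, "ex:value": value}
            for name, value in sorted(metrics.items())
        ],
    }

# Fields a reporting policy might require (an assumption for the example).
REQUIRED = ("ex:intendedUse", "ex:evaluation")

def missing_fields(card: dict):
    """List required fields that are absent or empty."""
    return [f for f in REQUIRED if not card.get(f)]

card = model_card("demo-model", "illustration only", {"accuracy": 0.0})
print(json.dumps(card, indent=2))
print("missing:", missing_fields(card))
```

Once reporting is structured like this, completeness checks and cross-model queries become simple programmatic operations instead of manual reading.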

Lessons? Discussion
• The impacts of, and needs created by, new policies and mandates are
still emerging
• Provenance, reporting, evaluation could/should be more than
just lightly structured natural language
• The Semantic Web community is well positioned to take a
leadership role here

Questions?
Many people contributed to the projects and ideas mentioned, a few include: Jans Aasman (Franz Inc), the RPI Tetherless World team particularly McCusker, Erickson, Pinheiro,
Santos, Rashid, Franklin, Chari, Shirai, Seneviratne, Hendler, … and the broader community particularly: Chorpita, Kendall, Bennett, Qi, Zaki, Stingone, Masters, Teitelbaum, Brinson,
Schadler, Daegen, Chen, Sequeda…
Funding: IBM-RPI HEALS AI Horizons, NIEHS HHEAR 0255-0236-4609 / 1U2CES026555-01, NSF HDR ACI-1640840, NIMD 75N95022C00018, DARPA MCS N660011924033
The generative AI explosion could be the start of a
Renaissance for the Semantic Web community with a
unique opportunity to shine AND a time to rethink
1. Processes / Methodologies (e.g., update books?)
2. Build and use Sandboxes
3. Provenance-Aware Knowledge Graph-Centric Architectures
as repositories and context providers
4. Metadata infrastructure in response to emerging legislation

Threat AND Opportunity
From Gizmodo:
Artists, Customer Service
Agents, Copywriters,
Lawyers, Scientists,
Computer Programmers,
Influencers and Fashion
Models, Journalists ?
From Unmudl:
Replace: Data Entry, Telemarketer, Factory Worker,
Cashier, Driver, Travel Agent, Bank Teller
Transform: Accountant, Customer Support
Representative, Paralegal, Radiologist, Librarian,
Market Research, Journalist,
Augment: Retail Salesperson, Stock Trader, Human
Resources Administrator, Customer Service Rep,
Mortgage Underwriter, Translator, …
