Keynote presentation for the International Semantic Web Conference in Athens, Greece, on November 9, 2023. The talk addresses the generative AI explosion and its potential impacts on the Semantic Web and Knowledge Graph communities, and argues that it may in fact spark a research Renaissance.
Abstract:
We are living in an age of rapidly advancing technology. History may view this period as one in which generative artificial intelligence reshaped the landscape and narrative of many technology-based fields of research and application. Times of disruption often present both opportunities and challenges. We will discuss some areas that may be ripe for consideration in the field of Semantic Web research and semantically-enabled applications. Semantic Web research has historically focused on representation and reasoning and on enabling interoperability of data and vocabularies. At the core are ontologies along with ontology-enabled (or ontology-compatible) knowledge stores such as knowledge graphs. Ontologies are often manually constructed using a process that (1) identifies existing best practice ontologies (and vocabularies) and (2) generates a plan for how to leverage these ontologies by aligning and augmenting them as needed to address requirements. While semi-automated techniques may help, there is typically a significant portion of the work that is best done by humans with domain and ontology expertise. This is an opportune time to rethink how the field generates, evolves, maintains, and evaluates ontologies. We consider how hybrid approaches, i.e., those that leverage generative AI components along with more traditional knowledge representation and reasoning approaches, can create improved processes. The effort to build a robust ontology that meets a use case can be large. Ontologies are not static, however, and they need to evolve along with knowledge evolution and expanded usage. There is potential for hybrid approaches to help identify gaps in ontologies and/or refine content. Further, ontologies need to be documented with term definitions and their provenance. Opportunities exist to consider semi-automated techniques for some types of documentation, provenance, and decision rationale capture for annotating ontologies.
The area of human-AI collaboration for population and verification presents a wide range of opportunities for research collaboration and impact. Ontologies need to be populated with class and relationship content. Knowledge graphs and other knowledge stores need to be populated with instance data in order to be used for question answering and reasoning. Population of large knowledge graphs can be time consuming. Generative AI holds the promise to create candidate knowledge graphs that are compatible with the ontology schema. The knowledge graph should contain provenance information identifying how the content was populated and its source, and its correctness and currency should be checked. A human-AI assistant approach is presented.
1. Semantic Web Research in the Age of
Generative Artificial Intelligence:
Semantic Web Renaissance?
Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer, Cognitive, and Web Sciences and
Industrial and Systems Engineering
Rensselaer Polytechnic Institute, Troy, NY, USA
November 9, 2023
International Semantic Web Conference
Athens, Greece
2. DLM Background (Ontologies, Knowledge Graphs,
Provenance, Knowledge Portals, Hybrid AI Solutions…)
Formal Foundations with application focus; often
hybrid, highly interdisciplinary, methodology focused
3. A Few Questions to Ponder
and 4 Areas of Interest
• Are we entering a Renaissance period for the semantic web? “an incredible
time of beauty, blossoming with creativity and curiosity”?
• How might the “Hype Cycle” perspective inform our work?
1. How might we leverage the evolving generative AI tools and services?
What kind of new collaborative processes may be useful?
2. How might we create/use “sandboxes” for exploration and evaluation?
3. What hybrid architectures should we explore?
Are Knowledge Graphs a new (provenance-aware) notebook?
4. How is emerging legislation poised to impact our work?
Is semantics for explanation, provenance, bias detection, evaluation,
reporting, analysis, … something we should explore?
5. 1. Hybrid Processes
How might we leverage the evolving generative AI tools and
services?
What kind of collaborative processes may be useful?
6. Process Exploration
Collaborative KG Generation
What if we use
McGuinness’ and
Aasman’s explorations on
building a Wines Ontology
and the associated
Knowledge graph, without
and with LLMs as a
thought experiment
7. From Living with CLASSIC (1989) and
Ontologies Come of Age (2003)
McGuinness’ Base ontology is very simple – initially written in 1985 as an expert
system. Converted to a KL-ONE ontology in 1989
– Wine, Winery, Grape, Flavor, Body, Color, Sugar
– Stood the test of time: Living with CLASSIC (1991), CLASSIC tutorials,
Ontologies 101 (2001), OWL Guide (2004), … numerous translations, >8k
citations
– To scale, however, it needs to be compatible with a WIDE range of menus, wine
lists, vocabularies. Not hard to obtain but significant enhancement required.
– Needs more ecosystem support – explanation, provenance, validation,
inconsistency detection, prioritization scheme, UI considerations, additional
social connections, citizen-oriented maintenance and evolution schemes,
scale, partitioning, …
www.ksl.stanford.edu/people/dlm/papers/living-with-classic-abstract.html
11. LLMs and refinement
• It appears that it is essentially
my ontology (from Living with
CLASSIC, the 1991 book from the '89
workshop on Foundational
Aspects of Semantic
Networks, and the expanded
version of the Stanford Tech
report)
• Jans wanted more information
about flavors
15. Human-LLM Collaboration
• Leverage ontologies as starting points
• Human driven refinement (example adding flavors)
• Good use of LLMs for population – both for value
restrictions (grape type per region) and instances (example
prominent Bordeaux wines)
• It continued to populate values (for flavors)
• (Some) validation (using, e.g., WebChatGPT)
• Overall nice use for GUIDED completion – more testing
needed
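One way to picture the guided-completion loop above is a small triage step that checks LLM-proposed instances against the ontology's value restrictions before a human reviews them. The following is a minimal sketch, not the process used in the talk: the property names echo the Wine ontology terms (Color, Body, Sugar), while the allowed values, the candidate wines, and the function name `triage_candidates` are all invented for illustration.

```python
# Hypothetical value restrictions. Property names follow the Wine ontology
# terms on the slides; the allowed values are invented for illustration.
VALUE_RESTRICTIONS = {
    "color": {"red", "white", "rose"},
    "body": {"light", "medium", "full"},
    "sugar": {"dry", "off-dry", "sweet"},
}


def triage_candidates(candidates):
    """Split LLM-proposed instances into schema-compatible ones and
    ones that a human should review, with the reasons for review."""
    accepted, needs_review = [], []
    for cand in candidates:
        # Values that violate a restriction.
        problems = [
            f"{prop}={value!r} not allowed"
            for prop, value in cand.items()
            if prop in VALUE_RESTRICTIONS and value not in VALUE_RESTRICTIONS[prop]
        ]
        # Restricted properties the candidate did not fill in at all.
        problems += [f"missing {prop}" for prop in VALUE_RESTRICTIONS if prop not in cand]
        if problems:
            needs_review.append((cand, problems))
        else:
            accepted.append(cand)
    return accepted, needs_review


# Invented stand-ins for instances an LLM might propose.
candidates = [
    {"name": "ExampleWineA", "color": "red", "body": "full", "sugar": "dry"},
    {"name": "ExampleWineB", "color": "purple", "body": "medium", "sugar": "dry"},
    {"name": "ExampleWineC", "color": "white", "body": "light"},
]
accepted, needs_review = triage_candidates(candidates)
for cand, problems in needs_review:
    print(cand["name"], "->", "; ".join(problems))
```

The point of the design is that the check only routes candidates: nothing is silently dropped, so the skilled human still decides whether the ontology or the candidate should change.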
16. Lessons? Discussion
• LLMs can greatly enhance aspects of KG design/population
• Having the (skilled) human collaborate with the LLM is likely
critical
• We may discuss other ways to leverage LLMs – e.g., we might
consider pain points in ontology design and ontology-enabled
application build out (e.g. Andrea Nuzzolese’s keynote in Deep
learning for KG workshop with competency questions)
• Designing and building collaborative processes I think is in our
future.
• Next “sandbox” for exploration
18. ChatBS: A Context-aware LLM Exploratory Sandbox
ChatBS version: 2023-10-04 15:52:00 (GPT-4-0613)
Submit questions
to OpenAI
Completion API:
Custom "System"
and "User" prompts
Up to 10 prompt
re-submissions
Up to 10 unique,
independent
responses to user's
prompt
Downloadable JSON
file with complete
prompts and results
https://inciteprojects.idea.rpi.edu/chatbs/app/chatbs/
ChatBS, the Context-aware LLM Exploratory Sandbox, uses the OpenAI Completion API
service (GPT-4-0613 model) to answer questions. Each sentence in a ChatBS result is
automatically linked to a Google query to facilitate fact-checking. ChatBS can then use the
OpenAI API to construct an entity/relation graph of these results in the form ['entity1',
'relationship', 'entity2']. ChatBS then uses entity linking to look up entities and relationships
against Wikidata, constructing a JSON-LD graph as it proceeds.
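The triple-to-JSON-LD step described above can be sketched roughly as follows. This is an illustrative sketch only, not ChatBS code: the link tables stand in for a real Wikidata lookup, the `example.org` IRIs are placeholders rather than Wikidata identifiers, and the function name `triples_to_jsonld` is hypothetical. Entities that cannot be linked become blank nodes, mirroring the bnode highlighting mentioned in the slide callouts.

```python
import json

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Stub link tables. A real service would query Wikidata; these IRIs are
# placeholders and NOT real Wikidata identifiers.
ENTITY_IRIS = {
    "Deborah McGuinness": "http://example.org/entity/E1",
    "computer scientist": "http://example.org/entity/E2",
}
PROPERTY_IRIS = {"occupation": "http://example.org/prop/P1"}


def triples_to_jsonld(triples):
    """Turn ['entity1', 'relationship', 'entity2'] triples into a JSON-LD
    document, plus a list of triples whose relationship could not be linked."""
    bnodes = {}
    nodes = {}
    unlinked_relations = []

    def node_id(label):
        # Linked entities get their IRI; unlinked ones get a stable blank node.
        if label in ENTITY_IRIS:
            return ENTITY_IRIS[label]
        return bnodes.setdefault(label, f"_:b{len(bnodes)}")

    for subj, rel, obj in triples:
        prop = PROPERTY_IRIS.get(rel)
        if prop is None:
            unlinked_relations.append((subj, rel, obj))
            continue
        s_id = node_id(subj)
        node = nodes.setdefault(s_id, {"@id": s_id, RDFS_LABEL: subj})
        node.setdefault(prop, []).append({"@id": node_id(obj)})
    return {"@graph": list(nodes.values())}, unlinked_relations


# Illustrative input: the first triple paraphrases the example answer on the
# architecture slide; the other two are invented to exercise the fallbacks.
triples = [
    ["Deborah McGuinness", "occupation", "computer scientist"],
    ["Deborah McGuinness", "occupation", "unlinked example"],
    ["Deborah McGuinness", "unmapped relation", "anything"],
]
doc, unlinked_relations = triples_to_jsonld(triples)
print(json.dumps(doc, indent=2))
```

Keeping unlinked relationships in a separate list, rather than guessing a property, leaves an explicit queue for human fact checking.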
RDF generation is
optional, to facilitate
question answering
Each sentence in
results is automatically
linked to a Google
query, providing a quick
"sanity-check"
Option to append
user prompts for
"chain-of-thought"
reasoning
evaluation
Truncated table of triples,
relevant to OpenAI's
answers (assertions)
RDF generated by LLM-
based ChatBS-REST entity
linking service
Highlighting when Subject
or Object entities are bnodes
(i.e., when entities named by
ChatGPT are not found in
Wikidata)
Downloadable JSON-LD
of complete results graph
Embedded
Wikidata entity
summary (using
AutoDesc API) of
selected entity…to
aid in human fact
checking!
Table of RDF triples,
relevant to OpenAI's
answers (assertions)
RDF generated by
LLM-based ChatBS-
REST entity linking
service
19. NOTE: ChatBS may have been updated since these screenshots were captured…
Each of these responses is
essentially correct, yet each is
distinct.
This demonstrates why it is
important to submit questions
multiple times when studying
GPT-4-generated
"explanations."
20. NOTE: ChatBS may have been updated since these screenshots were captured…
Simple question, widely varying
(and all imprecise) answers!!
Correct answer: 1998 - 2007
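The variation across repeated submissions noted on these slides can be made measurable with a simple tally of the answers extracted from each response. The sketch below is one possible approach under stated assumptions, not a ChatBS feature: it assumes the responses have already been collected, the response strings are invented (they are not ChatBS output), and the regular expression only covers a few simple year-range phrasings.

```python
import re
from collections import Counter

# Matches simple year ranges such as "1998-2007" or "1998 to 2007".
YEAR_RANGE = re.compile(r"\b(\d{4})\s*(?:-|to|until|through)\s*(\d{4})\b")


def tally_year_ranges(responses):
    """Count how often each (start, end) year range appears across
    independent responses; None counts responses with no range found."""
    counts = Counter()
    for text in responses:
        match = YEAR_RANGE.search(text)
        counts[match.groups() if match else None] += 1
    return counts


# Invented responses standing in for repeated submissions of one prompt.
responses = [
    "She held the role from 1998 to 2007.",
    "The position lasted 1999-2005.",
    "From 1998 to 2007, approximately.",
    "It started around 2000.",
]
counts = tally_year_ranges(responses)
top_answer, top_count = counts.most_common(1)[0]
agreement = top_count / len(responses)
print(top_answer, agreement)
```

A low agreement score does not say which answer is right; it only signals that the question deserves a check against a reference source.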
23. NOTE: ChatBS may have been updated since these screenshots were captured…
Deborah's honors and awards are
listed on her Wikipedia page and
personal web site(s), but GPT-4's
training appears not to have caught
those… so it tap-dances!
24. Graph comparison
Potential Architecture for a
KG-powered Fact Checker
Example: A Wikidata-based
GPT fact checker
Diagram components:
• User prompt (question), e.g., "Who is Deborah McGuinness?"
• Submitted prompt to the OpenAI continuation API*
• Selected GPT model: e.g. GPT-4-0613
• Continuation response, e.g., "Deborah McGuinness is an American
computer scientist…"
• Relationship discovery and entity linking service
• Build labeled entity graph (labels)
• Link entities and properties (URIs)
• Generate RDF: GPT-asserted relationships (RDF triples)
• Supporting semantics (context)
• Wikidata (reference) graph compared with GPT (test) graph
* https://platform.openai.com/docs/api-reference/completions
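The graph comparison between the Wikidata (reference) graph and the GPT (test) graph can be illustrated with plain set operations over triples. This is a minimal sketch, assuming both graphs have already been entity-linked to the same identifiers; the `ex:` identifiers and the function name `compare_graphs` are placeholders, not part of the architecture shown.

```python
def compare_graphs(reference, test):
    """Classify test triples against a reference graph.

    Both arguments are iterables of (subject, predicate, object) tuples
    that use the same identifiers."""
    reference, test = set(reference), set(test)
    return {
        "supported": test & reference,      # asserted and found in reference
        "unsupported": test - reference,    # asserted but absent from reference
        "not_mentioned": reference - test,  # in reference but not asserted
    }


# Placeholder triples; identifiers are invented for illustration.
reference_graph = [
    ("ex:E1", "ex:occupation", "ex:E2"),
    ("ex:E1", "ex:employer", "ex:E3"),
]
test_graph = [
    ("ex:E1", "ex:occupation", "ex:E2"),
    ("ex:E1", "ex:award", "ex:E9"),
]
report = compare_graphs(reference_graph, test_graph)
for category, found in report.items():
    print(category, sorted(found))
```

Note that "unsupported" only means the reference graph does not contain the triple; under an open-world reading it flags a claim for human checking rather than proving it false.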
25. Assessment of commonsense proficiency in
conversational systems
● Context-aware evaluation
● Different problem solving modalities
● Presentation formats
● Research questions:
○ How does the above affect machine
problem solving?
○ What is the benefit of the human-in-
the-loop? Can we adapt it to human-
in-the-loop instructable systems?
Drive multi-hop machine reasoning,
provide feedback on improvements to the
dataset [2]
[1] Kejriwal, Santos, Mulvehill, McGuinness. 2022.
Designing a strong test for measuring true common-
sense reasoning. Nature Machine Intelligence.
[2] Kejriwal, Santos, Mulvehill, McGuinness. 2023.
Context-Rich Evaluation of Machine Common
Sense. Artificial General Intelligence (AGI-23).
Partially DARPA MCS
26. Lessons? Discussion
• Our Context-aware LLM exploratory sandbox is just one
example of an architecture
• Evaluation “Harnesses” may be an area ripe for
exploration.
• More sandboxes and evaluation harnesses may benefit
our field
27. What are unsolved problems in formal
ontology research? (by ChatGPT)
• Ontology alignment: Integrating multiple ontologies or aligning different parts of a single
ontology is a challenging task. Developing techniques to automatically align ontologies,
resolve semantic heterogeneity, and handle inconsistencies is …needed
• Ontology evolution: Ontologies are dynamic and need to evolve over time to reflect changes in
the domain they represent. Managing ontology evolution, including identifying and propagating
changes, maintaining consistency, and handling versioning and compatibility… is needed
• Ontology modularization: Large ontologies can become complex and difficult to manage.
Techniques for decomposing ontologies into smaller, modular components while preserving
important relationships and ensuring coherence are still an active area of research.
• Ontology evaluation: Evaluating the quality and effectiveness of ontologies is essential.
Developing objective metrics and evaluation frameworks to assess ontology quality, usability,
completeness, and correctness is an ongoing challenge.
28. What are unsolved problems in formal
ontology research? (by ChatGPT)
• Ontology learning and population: Automatically constructing ontologies from unstructured or
semi-structured data is …needed. Techniques for ontology learning, including concept extraction,
relationship identification, and population of ontology instances … are active research areas.
• Reasoning scalability: Efficient reasoning is crucial for utilizing ontologies effectively. Developing
scalable reasoning algorithms that can handle large-scale ontologies and complex reasoning
tasks while maintaining acceptable performance is a significant challenge.
• Uncertainty and vagueness in ontologies: Ontologies often deal with imperfect or uncertain
information. Capturing and representing uncertainty and vagueness in ontologies and
developing reasoning techniques that can handle such knowledge …is needed.
• Ontology visualization and user interfaces: Presenting ontologies in a human-readable and
intuitive manner is essential for effective knowledge representation and user interaction.
Developing visualization techniques and user interfaces that facilitate ontology exploration,
editing, and understanding is an area of active research.
29. Brainstorming: Let's Leverage LLMs
• Ontology alignment
• Ontology evolution
• Ontology modularization
• Ontology evaluation
• Ontology learning and population
• Reasoning scalability
• Uncertainty and vagueness in ontologies
• Ontology visualization and user interfaces
• Remember Gerhard’s talk on KG gaps
• What are your pain points?
• Good competency question generation with adequate depth and breadth
• Good benchmark generation
• Use case generation in general
30. 3. Knowledge Graph-
Centric Hybrid Architectures
What hybrid architectures should we explore?
Are Knowledge Graphs a new (provenance-aware)
notebook?
32. Boston Fusion / RPI ECOLE Example
Automated Clustering Curriculum Learning
Guided by Human Training (ARCLIGHT)
Knowledge graph-centric
Knowledge graph includes symbolic and
neural representations
Agents can asynchronously contribute to
graph
PIs
Jay Miller (System integration, multi-disciplinary coalitions)
Tomek Strzalkowski (Cognitive sciences,
NLP, dialogue, AI)
Deborah McGuinness (Semantic
technologies and knowledge representation)
Qiang Ji (Computer vision and probabilistic
ML methods)
33. Whyis Knowledge Graph-based Orchestration
Whyis is a
knowledge graph
publishing,
management, and
analysis
framework
ARCLIGHT uses
Whyis as
orchestrator /
agent base
Agents make API
calls to ARCLIGHT
modules
34. MaterialsMine
Knowledge Graph
Curation Interaction
Augmentation
Materials Systems
• Polymer
nanocomposites
• Metamaterials
Information Types
• Composition metadata
• Processing metadata
• Properties
(mechanical, thermal,
electrical,
viscoelastic…)
• Characterization
metadata
• Images (SEM, TEM)
• Bibliographic
metadata
Data Storage
• MongoDB
• Blazegraph (Fuseki)
Search
• SPARQL endpoint
• Faceted browser
• Parameterized queries
Visualization
• Gallery of Interactive
Charts
• Vega-Lite Editor
• Data Voyager
• Extract microstructure descriptors (MCR)
• Chemical name classifier (ChemProps API)
• RDF data model
• OWL ontology
• ETL from Excel, XML
• Semantic data dictionary
Evolving Architectures (NSF
Harnessing the Data Revolution)
Partially supported by NSF ACI-1640840 Joint with Duke, RPI, Vermont, Northwestern, Caltech
35. Evolving Architectures (NIEHS
Human Health Exposure Analysis)
Provide access to
exposure and health
outcome data
Enable construction of
customized,
harmonizable datasets
Facilitate harmonization
of HHEAR data with
external sources
Support pooling of data
for analysis
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01 McGuinness Joint with MSSM, Columbia
37. Evolving Architectures (AI Horizons)
Health Empowerment by Analytics, Learning,
and Semantics
Personal Health Recommendations:
Empowering diabetic patients to improve their
health behaviors by tailoring health information
and recommendations to their lifestyles and
preferences.
Semantics for Health and Clinical Reasoning:
Enabling provider trust in clinical decision
support system recommendations.
Improving health equity through ML, Fairness,
and Semantics
https://idea.rpi.edu/research/projects/heals
Funded by IBM AI Horizons Network
38. Probability Aware Knowledge Exploration: Drug Repurposing Example
• Knowledge imported from drug, protein,
and disease interaction databases.
• Each interaction given an
evidence-driven probability.
• Find drugs that could affect melanoma,
filtered by interaction probability.
• The best hypotheses were generated
using the highest probabilities.
• Being expanded initially for
genomics-aware cancer care applications
• Display below showing interactive
drug-protein-disease interface
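The filtering and ranking idea on this slide can be sketched as a search over drug-protein-disease paths. This is an illustrative sketch only: the drug and protein names and all probabilities are invented, the function name `rank_hypotheses` is hypothetical, and scoring a path by multiplying its interaction probabilities is an assumption (it treats the interactions as independent), not necessarily the scoring used in the project.

```python
# Invented interaction probabilities; names are placeholders.
DRUG_PROTEIN = {
    ("drugA", "proteinX"): 0.9,
    ("drugA", "proteinY"): 0.4,
    ("drugB", "proteinX"): 0.6,
}
PROTEIN_DISEASE = {
    ("proteinX", "melanoma"): 0.8,
    ("proteinY", "melanoma"): 0.9,
    ("proteinX", "otherDisease"): 0.5,
}


def rank_hypotheses(disease, min_prob):
    """Return (drug, protein, score) paths to the disease, keeping only
    interactions at or above min_prob, ranked by descending score."""
    hypotheses = []
    for (drug, protein), p_dp in DRUG_PROTEIN.items():
        p_pd = PROTEIN_DISEASE.get((protein, disease))
        if p_pd is None or p_dp < min_prob or p_pd < min_prob:
            continue
        # Assumption: interactions are independent, so the path score is
        # the product of the two interaction probabilities.
        hypotheses.append((drug, protein, round(p_dp * p_pd, 4)))
    return sorted(hypotheses, key=lambda h: h[2], reverse=True)


ranked = rank_hypotheses("melanoma", min_prob=0.5)
for drug, protein, score in ranked:
    print(f"{drug} via {protein}: {score}")
```

Raising `min_prob` trades recall for confidence, which is the kind of knob an interactive drug-protein-disease interface could expose to the user.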
39. Lessons? Discussion
• KGs are increasingly
poised to be a key (prov-
aware) central repository
• KGs and Semantic Web
research may be poised
to be the context layer
• More exploration of
hybrid architectures is in
our future, e.g., KG-augmented
retrieval-augmented
generation
(image from AWS)
40. 4. Emerging Policies / Requirements
and Semantic Web Research
How is emerging legislation poised to impact our work?
Is semantics for provenance, bias detection,
evaluation, reporting, analysis, … something to
explore (quickly)? Yes
41. From Data Protection (GDPR) to AI
Bill of Rights to AI Exec Order…
• Safe and Effective Systems (pre-deployment testing, mitigate unsafe outcomes)
• Algorithmic Discrimination Protections (equitable use of algorithms)
• Data Privacy (protections for data collection/use)
• Notice and Explanations (know when automated system used, how it impacts
you/use)
• Human Alternatives, Consideration, and Fallback (opt out, remedies)
42. US Executive Order
• New Standards for AI Safety and Security (sharing safety
test results, develop standards, tools, detecting AI
generated content etc.)
• Protecting privacy (privacy preserving research, data
collection/ usage, …)
• …
• Ensuring Responsible and Effective Government Use of AI
(guidance for AI use, procurement, …)
• Extensive factsheet - https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/
43. Many Starting Points
• Companies such as Hugging Face have Model Cards with
light structure for reporting https://huggingface.co/docs/hub/model-card-annotated
• Ontologies for model annotation, testing, reporting, etc. are
an opportunity
• A few relevant pointers:
• Explanation Ontology (explanation types for user-centric XAI)
https://tetherless-world.github.io/explanation-ontology/
• Ontology for Fairness Metrics
https://dl.acm.org/doi/abs/10.1145/3514094.3534137
• Factsheets, Quality Labels, …
44. Lessons? Discussion
• Impacts of, and needs arising from, the new policies and mandates are
emerging
• Provenance, reporting, evaluation could/should be more than
just lightly structured natural language
• The Semantic Web community is well positioned to take a
leadership role here
45. Questions?
Many people contributed to the projects and ideas mentioned, a few include: Jans Aasman (Franz Inc), the RPI Tetherless World team particularly McCusker, Erickson, Pinheiro,
Santos, Rashid, Franklin, Chari, Shirai, Seneviratne, Hendler, … and the broader community particularly: Chorpita, Kendall, Bennett, Qi, Zaki, Stingone, Masters, Teitelbaum, Brinson,
Schadler, Daegen, Chen, Sequeda…
Funding IBM-RPI HEALS AI Horizons, NIEHS HHEAR 0255-0236-4609/ 1U2CES026555-01, NSF HDR ACI-1640840, NIMD 75N95022C00018, DARPA MCS N660011924033,
The generative AI explosion could be the start of a
Renaissance for the Semantic Web community with a
unique opportunity to shine AND a time to rethink
1. Processes / Methodologies (e.g., update books?)
2. Build and use Sandboxes
3. Provenance-Aware Knowledge Graph-Centric Architectures
as repositories and context providers
4. Metadata infrastructure in response to emerging legislation