Prof. Antonio Origlia – Università degli Studi di Napoli Federico II
Dip. di Ingegneria Elettrica e Tecnologie dell’Informazione & Centro di Ricerca URBAN/ECO
UNI DI NAPOLI FEDERICO II - The role of graphs in Hybrid Conversational AI
1. The role of graphs in Hybrid Conversational AI
2. The team
Dr. Maria Di Maro – Post-doc in computational linguistics, expertise in Common Ground management and Clarification Requests
Martina Di Bratto – PhD student in computational linguistics, expertise in argumentation-based dialogue
Sabrina Mennella – PhD student in computational linguistics, expertise in common sense representation
Roberto Basile Giannini – Master's student in computer science, working on Influence Diagrams for Conversation Management
Danilo Esposito – Master's student in computer science, working on instruction sequence generation based on common sense reasoning
Marco Galantino – Bachelor's student in computer science, working on expressive speech synthesis
Yegor Napadystyy – Bachelor's student in computer science, working on Nvidia Audio2Face REST control
3. Modelling language vs modelling communication
Large Language Models can capture conversation dynamics but not the motives behind them
Being based on machine learning, they capture regularities and correlations but cannot model causation
Predicting the most probable utterance, given what people usually say, and presenting it as the AI’s position is wrong
Choosing to talk (or not to talk) serves purposes: language is not an abstract sequence of symbols to be predicted
Being able to consider, if not understand, the consequences of its own actions is a necessary step for machines to produce language with actual intents and avoid harmful content
Machine Learning is only one of many powerful tools to understand language and human behaviour in general
4. Modelling language vs modelling communication
01 Theoretical Model: theoretical framework built upon the observation of communicative behaviour patterns
02 Computational Model: formal representation of the theoretical model
03 Evaluation: on-the-field deployment of the computational model in human-centered tasks for evaluation purposes
5. It is a tale told by an idiot,
full of sound and fury,
signifying nothing.
Macbeth
8. Dialogue flow
ASR → LLM → TTS
ASR → Common Ground Representation → Decision Making → LLM → TTS
Generative AI is good for Natural Language Generation but not for Dialogue Management
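The second flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every function and class name below (`asr`, `update_common_ground`, `decide`, `generate`, `tts`, `CommonGround`) is hypothetical and stands in for a real component. What it is meant to show is that the dialogue move is chosen before the language model is called, so the LLM only verbalises a decision taken elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class CommonGround:
    """Toy belief store (hypothetical): triples plus a list of open questions."""
    triples: set = field(default_factory=set)
    open_questions: list = field(default_factory=list)


def asr(audio: str) -> str:
    # Placeholder: a real system would transcribe audio; here the input is already text.
    return audio.strip().lower()


def update_common_ground(cg: CommonGround, utterance: str) -> CommonGround:
    # Placeholder interpretation: questions become open issues,
    # three-word statements become (subject, relation, object) triples.
    if utterance.endswith("?"):
        cg.open_questions.append(utterance)
    else:
        words = utterance.split()
        if len(words) == 3:
            cg.triples.add(tuple(words))
    return cg


def decide(cg: CommonGround) -> dict:
    # Dialogue management happens here, outside the language model.
    if cg.open_questions:
        return {"move": "answer", "about": cg.open_questions.pop(0)}
    return {"move": "acknowledge", "about": None}


def generate(decision: dict) -> str:
    # Placeholder for the LLM: it only puts an already chosen move into words.
    if decision["move"] == "answer":
        return f"Let me answer: {decision['about']}"
    return "Understood."


def tts(text: str) -> str:
    # Placeholder: a real system would synthesise speech.
    return f"<speech>{text}</speech>"


def hybrid_turn(cg: CommonGround, audio: str) -> str:
    utterance = asr(audio)
    cg = update_common_ground(cg, utterance)
    decision = decide(cg)
    return tts(generate(decision))
```

In the shorter ASR → LLM → TTS flow, `decide` would not exist and the model's most probable continuation would play the role of the decision.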
9. Illusion of intelligence
Braitenberg, V. (1986). Vehicles: Experiments in synthetic psychology. MIT Press
Simple wired vehicles can lead humans to attribute intelligent behaviour to machines that have only reactive capabilities
Machine Learning is very good at cleaning data by recognising underlying patterns inside noisy channels
Are Neural Networks very complex Braitenberg vehicles? They may appear intelligent without having actual intentionality
[…] when we look at these machines or vehicles as if they were animals in a natural environment ... we will be tempted, then, to use psychological language in describing their behavior. And yet we know very well that there is nothing in these vehicles that we have not put in ourselves
12. Conflict Search Graph
Di Maro, M., Origlia, A., & Cutugno, F. (2021). Cutting melted butter? Common Ground inconsistencies management in dialogue systems using graph databases. IJCoL. Italian Journal of Computational Linguistics, 7(7-1, 2), 157-190.
14. The ladder of causation
Pearl, J., & Mackenzie, D. (2018). The book of why: the new science of cause and effect. Basic Books.
Understanding one’s own role in the world gives meaning to producing linguistic acts
Machine learning based agents are passive observers of the world and can only react to it, not act in it
Modelling the effects of acting in the world, speaking included, and confronting the consequences makes it possible to generate language for its intended purpose
Retrospective reasoning is a fundamental ability that relies on causal modelling capabilities
15. Doing
To say something is to do something; or in which by saying or in saying something we are doing something
Austin (Language Philosophy - 1962)
The first cognitive ability, seeing or observation, is the detection of regularities in our environment, and it is shared by many animals as well as early humans before the Cognitive Revolution.
The second ability, doing, stands for predicting the effect(s) of deliberate alterations of the environment, and choosing among these alterations to produce a desired outcome.
Usage of tools, provided they are designed for a purpose and not just picked up by accident or copied from one’s ancestors, could be taken as a sign of reaching this second level.
Pearl (Computer Science - 2018)
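The gap between seeing and doing can be made concrete with a toy calculation. The sketch below assumes a hypothetical three-variable world in which a hidden factor Z influences both X and Y, while X has no effect on Y at all. All probabilities are invented for illustration. Observing X=1 changes what we expect about Y, whereas forcing X=1 does not.

```python
# Hypothetical toy model: Z -> X and Z -> Y, with no arrow from X to Y.
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z)
P_X1_GIVEN_Z = {0: 0.1, 1: 0.9}   # P(X = 1 | Z = z)
P_Y1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(Y = 1 | Z = z)


def p_y1_seeing_x1() -> float:
    """Seeing: P(Y=1 | X=1), computed by conditioning on the observation."""
    joint = sum(P_Z[z] * P_X1_GIVEN_Z[z] * P_Y1_GIVEN_Z[z] for z in P_Z)
    evidence = sum(P_Z[z] * P_X1_GIVEN_Z[z] for z in P_Z)
    return joint / evidence


def p_y1_doing_x1() -> float:
    """Doing: P(Y=1 | do(X=1)). Setting X by hand cuts the Z -> X link,
    so Z keeps its own distribution and X carries no information about it."""
    return sum(P_Z[z] * P_Y1_GIVEN_Z[z] for z in P_Z)


if __name__ == "__main__":
    print(round(p_y1_seeing_x1(), 2))  # about 0.74
    print(round(p_y1_doing_x1(), 2))   # 0.5
```

A system that only learns correlations would report the first number for both questions. Telling them apart requires a model of how the variables influence each other.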
16. Language as a tool
I have a dream
But it is not this day
When diplomacy ends, war begins
The first step in the development of taste is to be willing to credit your own opinion
Blessed are those who are persecuted because of righteousness
Does he look like a bitch?
18. The role of graphs
Good to represent data in a human interpretable way
Backed by considerable math
Can be projected in numerical spaces for the machine to work with
A modern way to recover inference engines
Support grounding and explainability
19. The role of graphs
Network analysis algorithms help study the problem from a mathematical point of view.
The Hyperlink-Induced Topic Search (HITS) algorithm scores nodes in terms of
Authority: how important the node is in the network
Hub: how important the node’s relationships are
Paglieri, F., & Castelfranchi, C. (2005). Revising beliefs through arguments: Bridging the gap between argumentation and belief revision in mas. Argumentation in multi-agent systems: First international workshop
Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5), 604-632.
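A minimal sketch of HITS as a power iteration, in plain Python. A node's authority is the sum of the hub scores of the nodes pointing to it, and a node's hub score is the sum of the authority scores of the nodes it points to. Both vectors are renormalised at every step. The example graph and node names are invented for illustration.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dictionaries for a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: collect hub scores along incoming edges.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: collect authority scores along outgoing edges.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


if __name__ == "__main__":
    # Hypothetical graph: "a" and "b" both point to "c"; "a" also points to "d".
    hub, auth = hits([("a", "c"), ("b", "c"), ("a", "d")])
    print(max(auth, key=auth.get))  # c
    print(max(hub, key=hub.get))    # a
```

Graph libraries and graph databases usually ship their own HITS implementation, which would normally be preferred over a hand-written loop like this one.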
20. The role of graphs
Text embeddings are numeric vectors that capture the general topics found in textual content (movie plots)
Node embeddings are numeric vectors that capture the role of the node in the network, in terms of neighbourhood
Weighted embeddings can be computed using text similarities
Chen, H., Sultan, S. F., Tian, Y., Chen, M., & Skiena, S. (2019). Fast and accurate network embeddings via very sparse random projection. In Proceedings of the 28th ACM international conference on information and knowledge management (pp. 399-408).
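To give an idea of how neighbourhood-based node embeddings can be obtained, here is a simplified sketch in the spirit of the random projection approach cited above. It is not the published algorithm and leaves out several of its details. Each node starts from a very sparse random vector, then repeatedly takes the weighted average of its neighbours' vectors. The edge weights stand for text similarities, and the node names, weights and parameter names are all hypothetical.

```python
import math
import random


def sparse_random_matrix(n_rows, dim, s=3.0, seed=0):
    """Very sparse random projection: entries are +sqrt(s), -sqrt(s) or 0."""
    rng = random.Random(seed)
    scale = math.sqrt(s)
    matrix = []
    for _ in range(n_rows):
        row = []
        for _ in range(dim):
            u = rng.random()
            if u < 1.0 / (2.0 * s):
                row.append(scale)
            elif u < 1.0 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def l2_normalise(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def node_embeddings(weighted_edges, dim=8, step_weights=(1.0, 1.0), seed=0):
    """weighted_edges: (node, node, weight) triples of an undirected graph."""
    nodes = sorted({n for u, v, _ in weighted_edges for n in (u, v)})
    index = {n: i for i, n in enumerate(nodes)}
    neighbours = {n: [] for n in nodes}
    for u, v, w in weighted_edges:
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))

    current = sparse_random_matrix(len(nodes), dim, seed=seed)
    result = [[0.0] * dim for _ in nodes]
    for alpha in step_weights:
        propagated = []
        for n in nodes:
            # One propagation step: weighted average of the neighbours' vectors.
            total = sum(w for _, w in neighbours[n])
            vec = [0.0] * dim
            if total > 0:
                for m, w in neighbours[n]:
                    row = current[index[m]]
                    for d in range(dim):
                        vec[d] += (w / total) * row[d]
            propagated.append(l2_normalise(vec))
        current = propagated
        # The final embedding is a weighted sum of the propagation steps.
        for i in range(len(nodes)):
            for d in range(dim):
                result[i][d] += alpha * current[i][d]
    return {n: result[index[n]] for n in nodes}


if __name__ == "__main__":
    # Hypothetical movie graph; weights play the role of plot similarities.
    edges = [("m1", "m2", 0.9), ("m2", "m3", 0.4), ("m3", "m4", 0.7)]
    for node, vector in node_embeddings(edges).items():
        print(node, [round(x, 2) for x in vector])
```

Nodes with similar weighted neighbourhoods end up with similar vectors, which is the sense in which the embedding captures the role of a node in the network.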
21. The role of graphs
Interpretability: a graph is uninterpretable if any of the graph patterns describing each of the foreseen communication problems is activated. In these cases, a clarification request is produced;
Completeness: an interpretable graph is incomplete if information needed to respond to the user intent is missing. In these cases, a request for information is produced;
Coherence: a complete graph is incoherent if logical conflicts are found in the belief graph. In these cases, the adequate disambiguation question is produced;
Stability: a coherent graph is unstable if there are open issues, like unanswered questions. In these cases, a question answering strategy is activated;
Desirability: a stable graph is undesirable if it does not exhibit the goal pattern. In these cases, the most useful dialogue move to create the goal pattern is produced, like an exploration or exploitation move.
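The five checks form an ordered cascade: each level is examined only if the previous one raised no problem. A minimal sketch, assuming the results of the graph pattern queries are already available as plain lists, could look like this. The `BeliefGraph` fields and the move labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefGraph:
    """Hypothetical summary of what the pattern queries found in the graph."""
    problem_patterns: list = field(default_factory=list)  # matched communication problems
    missing_information: list = field(default_factory=list)
    conflicts: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)
    goal_pattern_present: bool = False


def next_move(graph: BeliefGraph) -> str:
    """Walk the checks in order and return the first dialogue move that applies."""
    if graph.problem_patterns:        # uninterpretable
        return "clarification_request"
    if graph.missing_information:     # incomplete
        return "information_request"
    if graph.conflicts:               # incoherent
        return "disambiguation_question"
    if graph.open_questions:          # unstable
        return "question_answering"
    if not graph.goal_pattern_present:  # undesirable
        return "goal_directed_move"
    return "no_move_needed"


if __name__ == "__main__":
    graph = BeliefGraph(conflicts=["A contradicts B"], open_questions=["Which film?"])
    print(next_move(graph))  # disambiguation_question
```

Because the order is fixed, a conflict is always handled before an open question, even when both are present.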
24. Decision making
Bayesian models take into account all the data that is needed to take a decision
Can consider machine learning performance on test sets as priors
Separate priors from evidence
Consider the utility of graph configurations given the possibility to take decisions
Where do the models come from?
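A small worked example may help to see how these points fit together. It assumes a classifier whose test-set accuracy is used as the prior probability that its output is correct, one piece of evidence observed later in the dialogue, and a utility for each available action. All numbers, action names and function names are hypothetical.

```python
def posterior(prior: float, likelihood_if_true: float, likelihood_if_false: float) -> float:
    """Bayes' rule for a binary hypothesis: the prior and the evidence stay separate inputs."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1.0 - prior) * likelihood_if_false)


def best_action(p_correct: float, utilities: dict):
    """Pick the action with the highest expected utility.

    utilities maps an action to (utility if the hypothesis is true,
    utility if the hypothesis is false).
    """
    expected = {
        action: p_correct * u_true + (1.0 - p_correct) * u_false
        for action, (u_true, u_false) in utilities.items()
    }
    return max(expected, key=expected.get), expected


if __name__ == "__main__":
    prior = 0.8  # assumed test-set accuracy of the intent classifier
    # Evidence: the user hesitates, which is assumed to be more likely
    # when the classifier got the intent wrong.
    p = posterior(prior, likelihood_if_true=0.2, likelihood_if_false=0.7)
    utilities = {"act": (10.0, -20.0), "ask": (3.0, 2.0)}
    print(best_action(prior, utilities)[0])  # act
    print(best_action(p, utilities)[0])      # ask
```

With the prior alone, acting directly has the higher expected utility. Once the evidence lowers the belief to about 0.53, asking for confirmation becomes the better move.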
26. Why (not) to speak?
Desire
The graph representing the situation I am part of is not what I want it to be. What is the linguistic act that maximises the probability that the desired graph configuration will appear?
Competence
I am not able to confidently predict what is going to happen given the current graph configuration. Can I ask a question to collect more information and improve my prediction capabilities?
Curiosity
The current graph configuration probably contains missing information. Can I ask a question to complete it?
Hypothesizing
Things went differently from what I expected. What would have been the graph configuration if I did something different?
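One possible way to operationalise the Competence driver is to measure how uncertain the system's prediction is and to ask a question only when that uncertainty is high. The sketch below uses Shannon entropy for this purpose. The threshold and the example distributions are assumptions made for illustration and are not taken from the system described here.

```python
import math


def entropy(distribution: dict) -> float:
    """Shannon entropy, in bits, of a distribution over predicted outcomes."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


def should_ask(distribution: dict, threshold_bits: float = 1.0) -> bool:
    """Ask for more information when the prediction is too uncertain."""
    return entropy(distribution) > threshold_bits


if __name__ == "__main__":
    confident = {"accepts": 0.9, "rejects": 0.1}
    unsure = {"accepts": 0.25, "rejects": 0.25, "asks_back": 0.25, "leaves": 0.25}
    print(should_ask(confident))  # False
    print(should_ask(unsure))     # True
```

The same measure could be recomputed after the answer arrives to check whether the question actually improved the prediction.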
27. Illocutionary forces
[Figure: illocutionary strength against problem complexity for five problem classes: Interpretability, Incompleteness, Incoherence, Instability, Undesirability. Labels shown: Unfilled core FE; Perception; Syntactic; Lexical; Reference; Unfilled non-core FE; Information optionality; Belief strength; Answer complexity; Structured QA; Entropy; Deliberate Information sharing; Explanation; RAG; Neutral conflict; CS conflict; -IAR conflict; +IAR conflict]
29. FANTASIA
Origlia, A., Cutugno, F., Rodà, A., Cosi, P., & Zmarich, C. (2019). FANTASIA: a framework for advanced natural tools and applications in social, interactive approaches. Multimedia Tools and Applications, 78(10), 13613-13648.
Origlia, A., Di Bratto, M., Di Maro, M., & Mennella, S. (2022). Developing Embodied Conversational Agents in the Unreal Engine: The FANTASIA Plugin. In Proceedings of the 30th ACM International Conference on Multimedia (pp. 6950-6951).
31. The Metafamily – Bastian
2020 – 2023
Common ground inconsistencies
32. The Metafamily – Jason
2021 – 2023
User profiling and recommendation
34. Desideratum | Symbolic AI | Statistical AI | Markov Random Fields | Embedding | Logical Neural Networks
Neural nets as universal solvers X X X
Allow specialised sub-units X X X X
Meta-learning/Multi-task X - X
Modular X - - X
Can use prior knowledge X X X
True reasoning X - - X
Variables X X X X
Symbol manipulation X ---
Generic models X X X X X
Causality - - ---
Planning capabilities X X -
Perception/reasoning blending - - X
Perform true NLU with novel interpretation generation -
Acquire knowledge through Natural Language ---
Learn with less data ---
35. Conclusions
Modern AI provides great tools to project physical observations into n-dimensional numerical spaces, which can represent the mind of a machine
A machine can only communicate with intention if it is given the chance to explore and learn about the consequences of the action of speaking
Psychological theories about motivation and emotion provide drivers for the machine to evaluate what speaking or not speaking implies
Research is too focused on depleting the mine of every new technology: many papers show applications of new models, but little research investigates why these models work
A commercial push towards brute force approaches is present: only the big players have the strength to follow it, so democratization is a problem
Computer science is maybe the most introspective among hard sciences. Developing machines that talk can help model intelligence and build tools to stimulate thought through unbiased dialogues