6. [1] Wahlster, W. SmartKom: Foundations of Multimodal Dialogue Systems. Springer, 2006. Vol. 12.
[2] Raisamo, R. Multimodal Human-Computer Interaction: A Constructive and Empirical Study. Tampere University Press, 1999.
Relevant modalities [1]:
● Speech
● Graphical User Interfaces
● Gestures
● Facial Expressions
● Physical Interaction
● Biometrics
A distinction between input and output modalities is also common [2].
7. Some Scenarios:
● AI Assistant in a smart speaker device (e.g., F2 Robot by Tim Kotov)
  • People Detection
  • Object Detection
  • Attention Detection
  • Emotions
  • Gestures
  • Facial Expressions
● AI Assistant in a robot (e.g., Amazon Astro, robot from Panov’s Lab @ MIPT)
  • People Detection
  • Object Detection
  • Attention Detection
  • Emotions
  • Gestures
  • Facial Expressions
  • Physical Interaction
  • Task Execution
● Interaction w/ AI assistant in a chat (e.g., images, videos, audio, etc.)
  • Object Detection in Images/Videos
  • Event Detection in Images/Videos
  • Image Generation
  • Video Generation
● Interaction w/ AI assistant in the metaverse (e.g., in Minecraft)
  • User & NPC Detection
  • Object Detection
  • Attention Detection
  • Physical Interaction
  • Task Execution
11. ▪ Aimless
• Bot isn’t aware of its own goals (dialog length,
user’s mood, understanding and addressing user’s
goals), and doesn’t take them into account
▪ Mostly Tactical
• Dialog Management is mostly single-turn-based (though
we give priority to multi-turn scenario-driven skills)
▪ Mostly Reactive
• Response to Dialog Acts is reactive
• Topic Switching is reactive
• Link_to is mostly random (unless we have manual
transitions)
▪ Mostly Selfless
• Little to no opinion is expressed by our bot in
conversations with users
▪ Mostly Careless
• Bot mostly doesn’t relate to the user’s mood or
discuss user’s emotions
▪ Goal-Aware
• Bot should be aware of its goals (dialog length, user’s mood,
understanding and addressing user’s goals) and actively drive them
▪ Strategic
• Dialog Management should be focused on reaching the Bot’s goals,
anticipate every possible user step, and each of its actions should
complement the Bot’s strategy
▪ Proactive
• Bot should know which Speech Function is an appropriate response
to the user’s Speech Function, and then pick the best one to
complement its strategy
• Topic Switching should be used by the Bot from a strategic
perspective
• Link_to should utilize relationships between entities within topics
and between topics, and should be used by the Bot from a strategic
perspective
▪ Be Opinionated
• Bot should be able to express its opinion, be able to explain it, and be
coherent (not contradict itself, except in minor things)
▪ Be Caring
• Bot should relate to the user’s mood and be able to discuss user’s
emotions
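The target behaviors above presuppose that the bot tracks its goals explicitly. A minimal sketch of such a goal state, using hypothetical names (`BotGoals`, `unaddressed_user_goals`) that are illustrative, not from any existing codebase:

```python
from dataclasses import dataclass, field

@dataclass
class BotGoals:
    """Illustrative goal state a goal-aware Dialog Manager could consult."""
    target_dialog_length: int = 20          # desired number of turns
    keep_user_mood_positive: bool = True
    user_goals: list = field(default_factory=list)       # inferred from the user
    addressed_user_goals: set = field(default_factory=set)

    def unaddressed_user_goals(self) -> list:
        """User goals the bot has recognized but not yet addressed."""
        return [g for g in self.user_goals if g not in self.addressed_user_goals]

goals = BotGoals(user_goals=["recommend a movie", "chat about actors"])
goals.addressed_user_goals.add("recommend a movie")
print(goals.unaddressed_user_goals())  # → ['chat about actors']
```

With a state like this, "Goal-Aware" and "Strategic" become checkable properties: every move the manager picks can be scored against the goals still open.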
12. In multi-turn conversations the bot should plan strategically, across turns.
Single-Turn Management is our tactics! To become strategic we need a
higher-level abstraction that acts across turns.
13. Eggins and Martin (1997)
Discourse structure patterns operate across turns: they are thus overtly interactional and sequential.
Discourse Management is a basis for acting across turns, and thus for
becoming strategic.
14. Eggins and Slade (1997)
Speech Functions control Discourse: they combine Speech Acts (give
information, demand information) with Discourse Moves.
Speech Function example: open:initiate:give_opinion
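A label like `open:initiate:give_opinion` encodes a path through the Speech Function hierarchy. A trivial sketch, assuming colon-separated levels ordered from most general to most specific:

```python
def parse_speech_function(label: str) -> list:
    """Split a colon-separated Speech Function label into its hierarchy levels."""
    return label.lower().split(":")

levels = parse_speech_function("open:initiate:give_opinion")
print(levels)      # → ['open', 'initiate', 'give_opinion']
print(levels[-1])  # the most specific function: 'give_opinion'
```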
15. Eggins and Slade (1997)
Speech Functions form a hierarchy based on their role in Discourse:
Move
● Open
  • Attend
  • Initiate
    – Give: Fact, Opinion
    – Demand
      Open: Fact, Opinion
      Closed: Fact, Opinion
● Sustain
  • Continue
    – Monitor
    – Prolong: Elaborate, Extend, Enhance
    – Append: Elaborate, Extend, Enhance
  • React
    – Respond
      Support
        Develop: Elaborate, Extend, Enhance
        Engage, Register
        Reply: Accept, Comply, Agree, Answer, Acknowledge, Affirm
      Confront
        Disengage
        Reply: Decline, Non-comply, Disagree, Withhold, Disavow, Contradict
    – Rejoinder
      Support
        Track: Check, Confirm, Clarify, Probe
        Response: Resolve, Repair, Acquiesce
      Confront
        Challenge: Detach, Rebound, Counter
        Response: Unresolve, Refute, Re-challenge
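Such a hierarchy is naturally represented as a tree. A sketch of the Open branch as a nested dict (leaves map to `None`), with a helper that enumerates colon-joined leaf labels; the dict layout and names are illustrative:

```python
# Abridged to the "open" branch of the Eggins & Slade hierarchy for brevity.
SPEECH_FUNCTIONS = {
    "open": {
        "attend": None,
        "initiate": {
            "give": {"fact": None, "opinion": None},
            "demand": {
                "open": {"fact": None, "opinion": None},
                "closed": {"fact": None, "opinion": None},
            },
        },
    },
}

def leaf_labels(tree: dict, prefix: str = ""):
    """Yield a colon-joined path for every leaf of the hierarchy."""
    for name, sub in tree.items():
        path = f"{prefix}:{name}" if prefix else name
        if sub is None:
            yield path
        else:
            yield from leaf_labels(sub, path)

print(list(leaf_labels(SPEECH_FUNCTIONS)))
# → ['open:attend', 'open:initiate:give:fact', 'open:initiate:give:opinion',
#    'open:initiate:demand:open:fact', 'open:initiate:demand:open:opinion',
#    'open:initiate:demand:closed:fact', 'open:initiate:demand:closed:opinion']
```

Enumerating leaves like this yields exactly the label set a Speech Function classifier would have to predict over.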
16. Eggins and Slade (1997)
Removing the SFs we don’t have to classify from users’ utterances in the
Alexa Prize:
(the same Speech Function hierarchy as on the previous slide)
18. Example
A Discourse is a combination of a key entity (subject), related entities (with the user’s & bot’s relation to them), and topic(s).
Discourse #1
• Topics: Entertainment_Movies, Actors
• Key Entity (Subject): Science Fiction Movies
• Related Entities:
  movies: Aliens, Terminator
  actors: Sigourney Weaver, Arnold Schwarzenegger
Pros: We don’t limit ourselves to one topic (~10 topics, as in CoBot DialogAct
Topics) but gain flexibility within each topic, because a single topic can
contain myriads of entities to discuss. When what is being discussed drifts too
far from the current Discourse, our (or the user’s) move is a change to a new
Discourse.
But: why should the bot propose a change of Discourse?
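The Discourse structure above can be sketched as a small dataclass; the names (`Discourse`, `mentions`) are illustrative assumptions, not an existing API:

```python
from dataclasses import dataclass, field

@dataclass
class Discourse:
    """A key entity plus related entities, grouped under one or more topics."""
    topics: list
    key_entity: str
    related_entities: dict = field(default_factory=dict)  # kind -> entity list

    def mentions(self, entity: str) -> bool:
        """True if the entity belongs to this Discourse (subject or related)."""
        return entity == self.key_entity or any(
            entity in group for group in self.related_entities.values()
        )

d1 = Discourse(
    topics=["Entertainment_Movies", "Actors"],
    key_entity="Science Fiction Movies",
    related_entities={
        "movies": ["Aliens", "Terminator"],
        "actors": ["Sigourney Weaver", "Arnold Schwarzenegger"],
    },
)
print(d1.mentions("Aliens"))  # → True
print(d1.mentions("Avatar"))  # → False: too far, a candidate for a new Discourse
```

A check like `mentions` is the simplest possible drift test; a real system would measure entity relatedness rather than exact membership.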
19. Dialog Manager should act based on these 3 levels, where each higher level influences the level below it:
Level 1 — Dialog: Bot Goals
• Understand User Interests & Conversation Goal(s)
• Address User Goal(s)
• Prolong Conversation
• Keep or improve user’s mood
• Address Bot Interests
Level 2 — Discourse: Discourse Management
• Maintain existing or change Discourse
Level 3 — Conversation Turn: Speech Function Management
• Pick the most appropriate Speech Function within the chosen Discourse
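One way to read the three levels as code: goal state (Level 1) informs the Discourse decision (Level 2), which constrains the Speech Function choice (Level 3). A toy sketch with illustrative rules and labels, not a real Dialog Manager:

```python
def manage_turn(goals: dict, discourse: dict, user_utterance: str):
    """Pick a Discourse action and a Speech Function, top level down."""
    # Level 1 (Dialog): consult bot goals.
    want_longer_dialog = goals["turns_so_far"] < goals["target_length"]
    # Level 2 (Discourse): maintain the Discourse if the user is still on topic.
    on_topic = any(e.lower() in user_utterance.lower()
                   for e in discourse["entities"])
    discourse_action = "maintain" if on_topic else "change"
    # Level 3 (Turn): pick a Speech Function consistent with the levels above.
    if discourse_action == "change":
        sf = "open:initiate:demand:open:opinion"   # steer toward a new subject
    elif want_longer_dialog:
        sf = "sustain:continue:prolong:extend"     # keep the thread going
    else:
        sf = "react:respond:support:reply:agree"   # wind down politely
    return discourse_action, sf

goals = {"turns_so_far": 3, "target_length": 20}
discourse = {"entities": ["Aliens", "Terminator", "Sigourney Weaver"]}
print(manage_turn(goals, discourse, "I loved Terminator!"))
# → ('maintain', 'sustain:continue:prolong:extend')
```

The point of the sketch is the direction of influence: the Speech Function is never chosen in isolation, it is the last step of a goal-driven, Discourse-aware decision.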